Planet Linux Australia


sthbrx (a POWER technical blog): Dumb bugs: When a date breaks booting the kernel

The setup

I've recently been working on internal CI infrastructure for testing kernels before sending them to the mailing list. As part of this effort, I became interested in reproducible builds. Minimising the changing parts outside of the source tree itself could improve consistency and ccache hits, which is great for making the CI faster and more reproducible across different machines. This means removing 'external' factors like timestamps from the build process: the time changes every build, so builds of the same tree no longer produce identical binaries. A changing timestamp also prevents previously cached results from being reused, potentially slowing down builds (though it turns out the kernel does a good job of limiting the scope of where timestamps appear in the build).

As part of this effort, I came across the KBUILD_BUILD_TIMESTAMP environment variable. This variable is used to set the kernel timestamp, which is primarily for any users who want to know when their kernel was built. That's mostly irrelevant for our work, so an easy KBUILD_BUILD_TIMESTAMP=0 later and... it still uses the current date.

Ok, checking the documentation it says

Setting this to a date string overrides the timestamp used in the UTS_VERSION definition (uname -v in the running kernel). The value has to be a string that can be passed to date -d. The default value is the output of the date command at one point during build.

So it looks like the timestamp variable is actually expected to be a date format. To make it obvious that it's not a 'real' date, let's set KBUILD_BUILD_TIMESTAMP=0000-01-01. A bunch of zeroes (and the ones to make it a valid month and day) should tip off anyone to the fact it's invalid.

As an aside, this is a different date to what I tried to set it to earlier; a 'timestamp' typically refers to the number of seconds since the UNIX epoch (1970), so my first attempt would have corresponded to 1970-01-01. But given we're passing a date, not a timestamp, there should be no problem setting it back to the year 0. And I like the aesthetics of 0000 over 1970.

Building and booting the kernel, we see #1 SMP 0000-01-01 printed as the build timestamp. Success! After confirming everything works, I set the environment variable in the CI jobs and call it a day.

An unexpected error

A few days later I need to run the CI to test my patches, and something strange happens. It builds fine, but the boot tests that load a root disk image fail inexplicably: there is a kernel panic saying "VFS: Unable to mount root fs on unknown-block(253,2)".

[    0.909648][    T1] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(253,2)
[    0.909797][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc2-g065ffaee7389 #8
[    0.909880][    T1] Hardware name: IBM pSeries (emulated by qemu) POWER8 (raw) 0x4d0200 0xf000004 of:SLOF,HEAD pSeries
[    0.910044][    T1] Call Trace:
[    0.910107][    T1] [c000000003643b00] [c000000000fb6f9c] dump_stack_lvl+0x70/0xa0 (unreliable)
[    0.910378][    T1] [c000000003643b30] [c000000000144e34] panic+0x178/0x424
[    0.910423][    T1] [c000000003643bd0] [c000000002005144] mount_block_root+0x1d0/0x2bc
[    0.910457][    T1] [c000000003643ca0] [c000000002005720] prepare_namespace+0x1d4/0x22c
[    0.910487][    T1] [c000000003643d20] [c000000002004b04] kernel_init_freeable+0x36c/0x3bc
[    0.910517][    T1] [c000000003643df0] [c000000000013830] kernel_init+0x30/0x1a0
[    0.910549][    T1] [c000000003643e50] [c00000000000df94] ret_from_kernel_thread+0x5c/0x64
[    0.910587][    T1] --- interrupt: 0 at 0x0
[    0.910794][    T1] NIP:  0000000000000000 LR: 0000000000000000 CTR: 0000000000000000
[    0.910828][    T1] REGS: c000000003643e80 TRAP: 0000   Not tainted  (6.3.0-rc2-g065ffaee7389)
[    0.910883][    T1] MSR:  0000000000000000 <>  CR: 00000000  XER: 00000000
[    0.910990][    T1] CFAR: 0000000000000000 IRQMASK: 0
[    0.910990][    T1] GPR00: 0000000000000000 c000000003644000 0000000000000000 0000000000000000
[    0.910990][    T1] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.910990][    T1] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.910990][    T1] GPR12: 0000000000000000 0000000000000000 c000000000013808 0000000000000000
[    0.910990][    T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.910990][    T1] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.910990][    T1] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.910990][    T1] GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.911371][    T1] NIP [0000000000000000] 0x0
[    0.911397][    T1] LR [0000000000000000] 0x0
[    0.911427][    T1] --- interrupt: 0
qemu-system-ppc64: OS terminated: OS panic: VFS: Unable to mount root fs on unknown-block(253,2)

Above the panic was some more context, saying

[    0.906194][    T1] Warning: unable to open an initial console.
...
[    0.908321][    T1] VFS: Cannot open root device "vda2" or unknown-block(253,2): error -2
[    0.908356][    T1] Please append a correct "root=" boot option; here are the available partitions:
[    0.908528][    T1] 0100           65536 ram0
[    0.908657][    T1]  (driver?)
[    0.908735][    T1] 0101           65536 ram1
[    0.908744][    T1]  (driver?)
...
[    0.909216][    T1] 010f           65536 ram15
[    0.909226][    T1]  (driver?)
[    0.909265][    T1] fd00         5242880 vda
[    0.909282][    T1]  driver: virtio_blk
[    0.909335][    T1]   fd01            4096 vda1 d1f35394-01
[    0.909364][    T1]
[    0.909401][    T1]   fd02         5237760 vda2 d1f35394-02
[    0.909408][    T1]
[    0.909441][    T1] fd10             366 vdb
[    0.909446][    T1]  driver: virtio_blk
[    0.909479][    T1] 0b00         1048575 sr0
[    0.909486][    T1]  driver: sr

This is even more baffling: if it's unable to open a console, then what am I reading these messages on? And error -2, or ENOENT, on opening 'vda2' implies that no such file or directory exists. But it then lists vda2 as a present drive with a known driver? So is vda2 missing or not?

Living in denial

Since you've read the title of this article, you can probably guess what changed to cause this error. But at the time I had no idea what could have been the cause. I'd already confirmed that a kernel with a set timestamp can boot to userspace, and there was another (seemingly) far more likely candidate for the failure: as part of the CI design, patches are extracted from the submitted branch and rebased onto the maintainer's tree. This is great from a convenience perspective, because you don't need to worry about forgetting to rebase your patches before testing and submission. But if the maintainer has synced their branch with Linus' tree it means there could be a lot of changes in the source tree between runs, even if they were only a few days apart.

So, when you're faced with a working test on one commit and a broken test on another commit, it's time to break out the git bisect. Downloading the kernel images from the relevant CI jobs, I confirmed that indeed one was working while the other was broken. So I bisected the relevant commits, and... everything kept working. Each step I would build and boot the kernel, and each step would reach userspace just fine. I was getting suspicious at this point, so skipped ahead to the known bad commit and built and tested it locally. It also worked.

This was highly confusing, because it meant there was something fishy going on. Some kind of state outside of the kernel tree. Could it be... surely not...

Comparing the boot logs of the two CI kernels, I see that the working one indeed uses an actual timestamp, and the broken one uses the 0000-01-01 fixed date. Oh no. Setting the timestamp with a local build, I can now reproduce the boot panic with a kernel I built myself.

But... why?

OK, so it's obvious at this point that the timestamp is affecting loading a root disk somehow. But why? The obvious answer is that it's before the UNIX epoch. Something in the build process is turning the date into an actual timestamp, and going wrong when that timestamp gets used for something.

But it's not like there was a build error complaining about it. As best I could tell, the kernel doesn't try to parse the date anywhere, besides passing it to date during the build. And if date had an issue with it, it would have broken the build. Not booting the kernel. There's no date utility being invoked during kernel boot!

Regardless, I set about tracing the usage of KBUILD_BUILD_TIMESTAMP inside the kernel. The stacktrace in the panic gave the end point of the search; the function mount_block_root() wasn't happy. So all I had to do was work out at which point mount_block_root() tried to access the KBUILD_BUILD_TIMESTAMP value.

In short, that went nowhere.

mount_block_root() effectively just tries to open a file in the filesystem. There are massive amounts of code handling this, and any part of it could have had an undocumented dependency on KBUILD_BUILD_TIMESTAMP. Approaching from the other direction, KBUILD_BUILD_TIMESTAMP is turned into build-timestamp inside a Makefile, which is in turn related to a file include/generated/utsversion.h. This file #defines UTS_VERSION equal to the KBUILD_BUILD_TIMESTAMP value. Searching the kernel for UTS_VERSION, we hit init/version-timestamp.c which stores it in a struct alongside other build information:

struct uts_namespace init_uts_ns = {
    .ns.count = REFCOUNT_INIT(2),
    .name = {
        .sysname    = UTS_SYSNAME,
        .nodename   = UTS_NODENAME,
        .release    = UTS_RELEASE,
        .version    = UTS_VERSION,
        .machine    = UTS_MACHINE,
        .domainname = UTS_DOMAINNAME,
    },
    .user_ns = &init_user_ns,
    .ns.inum = PROC_UTS_INIT_INO,
#ifdef CONFIG_UTS_NS
    .ns.ops = &utsns_operations,
#endif
};

This is where the trail goes cold: I don't know if you've ever tried this, but searching for .version in the kernel's codebase is not a very fruitful endeavor when you're interested in a specific kind of version.

$ rg "(\.|\->)version\b" | wc -l
5718

I tried tracing the usage of init_uts_ns, but didn't get very far.

By now I'd already posted this in chat and another developer, Joel Stanley, was also investigating this bizarre bug. They had been testing different timestamp values and made the horrifying discovery that the bug sticks around after a rebuild. So you could start with a broken build, set the timestamp back to the correct value, rebuild, and the resulting kernel would still be broken. The boot log would report the correct time, but the root disk mounter panicked all the same.

Getting sidetracked

I wasn't prepared to investigate the boot panic directly until the persistence bug was fixed. Having to run make clean and rebuild everything would take an annoyingly long time, even with ccache. Fortunately, I had a plan. All I had to do was work out which generated files are different between a broken and working build, and binary search by deleting half of them until deleting only one made the difference between the bug persisting or not. We can use diff for this. Running the initial diff we get

$ diff -q --exclude System.map --exclude .tmp_vmlinux* --exclude tools broken/ working/
Common subdirectories: broken/arch and working/arch
Common subdirectories: broken/block and working/block
Files broken/built-in.a and working/built-in.a differ
Common subdirectories: broken/certs and working/certs
Common subdirectories: broken/crypto and working/crypto
Common subdirectories: broken/drivers and working/drivers
Common subdirectories: broken/fs and working/fs
Common subdirectories: broken/include and working/include
Common subdirectories: broken/init and working/init
Common subdirectories: broken/io_uring and working/io_uring
Common subdirectories: broken/ipc and working/ipc
Common subdirectories: broken/kernel and working/kernel
Common subdirectories: broken/lib and working/lib
Common subdirectories: broken/mm and working/mm
Common subdirectories: broken/net and working/net
Common subdirectories: broken/scripts and working/scripts
Common subdirectories: broken/security and working/security
Common subdirectories: broken/sound and working/sound
Common subdirectories: broken/usr and working/usr
Files broken/.version and working/.version differ
Common subdirectories: broken/virt and working/virt
Files broken/vmlinux and working/vmlinux differ
Files broken/vmlinux.a and working/vmlinux.a differ
Files broken/vmlinux.o and working/vmlinux.o differ
Files broken/vmlinux.strip.gz and working/vmlinux.strip.gz differ

Hmm, OK so only some top level files are different. Deleting all the different files doesn't fix the persistence bug though, and I know that a proper make clean does fix it, so what could possibly be the difference when all the remaining files are identical?

Oh wait. man diff reports that diff only compares the top level folder entries by default. So it was literally just telling me "yes, both the broken and working builds have a folder named X". How GNU of it. Re-running the diff command with actually useful options, we get a more promising story

$ diff -qr --exclude System.map --exclude .tmp_vmlinux* --exclude tools build/broken/ build/working/
Files build/broken/arch/powerpc/boot/zImage and build/working/arch/powerpc/boot/zImage differ
Files build/broken/arch/powerpc/boot/zImage.epapr and build/working/arch/powerpc/boot/zImage.epapr differ
Files build/broken/arch/powerpc/boot/zImage.pseries and build/working/arch/powerpc/boot/zImage.pseries differ
Files build/broken/built-in.a and build/working/built-in.a differ
Files build/broken/include/generated/utsversion.h and build/working/include/generated/utsversion.h differ
Files build/broken/init/built-in.a and build/working/init/built-in.a differ
Files build/broken/init/utsversion-tmp.h and build/working/init/utsversion-tmp.h differ
Files build/broken/init/version.o and build/working/init/version.o differ
Files build/broken/init/version-timestamp.o and build/working/init/version-timestamp.o differ
Files build/broken/usr/built-in.a and build/working/usr/built-in.a differ
Files build/broken/usr/initramfs_data.cpio and build/working/usr/initramfs_data.cpio differ
Files build/broken/usr/initramfs_data.o and build/working/usr/initramfs_data.o differ
Files build/broken/usr/initramfs_inc_data and build/working/usr/initramfs_inc_data differ
Files build/broken/.version and build/working/.version differ
Files build/broken/vmlinux and build/working/vmlinux differ
Files build/broken/vmlinux.a and build/working/vmlinux.a differ
Files build/broken/vmlinux.o and build/working/vmlinux.o differ
Files build/broken/vmlinux.strip.gz and build/working/vmlinux.strip.gz differ

There are some new entries here: notably init/version* and usr/initramfs*. Binary searching these files results in a single culprit: usr/initramfs_data.cpio. This is quite fitting, as the .cpio file is an archive defining a filesystem layout, much like .tar files. This file is actually embedded into the kernel image, and loaded as a bare-bones shim filesystem when the user doesn't provide their own initramfs [1].

So it would make sense that if the CPIO archive wasn't being rebuilt, then the initial filesystem wouldn't change. And it would make sense for the initial filesystem to be causing mount issues of the proper root disk filesystem.

This just leaves the question of how KBUILD_BUILD_TIMESTAMP is breaking the CPIO archive. And it's around this time that a third developer, Andrew, who I'd roped into this bug hunt for having the (mis)fortune to sit next to me, pointed out that the generator script for this CPIO archive was passing the KBUILD_BUILD_TIMESTAMP to date. Whoop, we've found the murder weapon [2]!

The persistence bug could be explained now: because the script was only using KBUILD_BUILD_TIMESTAMP internally, make had no way of knowing that the archive generation depended on this variable. So even when I changed the variable to a valid value, make didn't know to rebuild the corrupt archive. Let's now get back to the main issue: why boot panics.

Solving the case

Following the CPIO generation script along, we see that the KBUILD_BUILD_TIMESTAMP variable is turned into a timestamp by date -d"$KBUILD_BUILD_TIMESTAMP" +%s. Testing this in the shell with 0000-01-01 we get this (somewhat amusing, but also painful) result

$ date -d"$KBUILD_BUILD_TIMESTAMP" +%s
-62167255492

This timestamp is then passed to a C program that assigns it to a variable default_mtime. Looking over the source, it seems this variable is used to set the mtime field on the files in the CPIO archive. The timestamp is stored as a time_t, which is an alias for int64_t. That's 64 bits of data, up to 16 hexadecimal characters. And yes, that's relevant: CPIO stores the mtime (and all other numerical fields) as 32 bit unsigned integers represented by ASCII hexadecimal characters. The sprintf() call that ultimately embeds the timestamp uses the %08lX format specifier. This formats a long as hexadecimal, padded to at least 8 characters. Hang on... at least 8 characters? What if our timestamp happens to be more?
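For reference, this is the fixed-width layout being written: a sketch of the standard 'newc' CPIO header, shown here for illustration rather than copied from the kernel source. Every numeric field is exactly 8 ASCII hex characters, so a 16 character value simply cannot fit:

struct cpio_newc_header {
    char c_magic[6];      /* "070701" */
    char c_ino[8];
    char c_mode[8];
    char c_uid[8];
    char c_gid[8];
    char c_nlink[8];
    char c_mtime[8];      /* the field our timestamp lands in */
    char c_filesize[8];
    char c_devmajor[8];
    char c_devminor[8];
    char c_rdevmajor[8];
    char c_rdevminor[8];
    char c_namesize[8];
    char c_check[8];
};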

It turns out that large timestamps are already guarded against. The program will error during build if the date is later than 2106-02-07 (maximum unsigned 8 hex digit timestamp).

/*
 * Timestamps after 2106-02-07 06:28:15 UTC have an ascii hex time_t
 * representation that exceeds 8 chars and breaks the cpio header
 * specification.
 */
if (default_mtime > 0xffffffff) {
    fprintf(stderr, "ERROR: Timestamp too large for cpio format\n");
    exit(1);
}

But we are using an int64_t. What would happen if one were to provide a negative timestamp?

Well, sprintf() happily spits out FFFFFFF1868AF63C when we pass in our negative timestamp representing 0000-01-01. That's 16 characters, 8 too many for the CPIO header [3].
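A minimal standalone sketch (not the kernel's actual generator code) shows the formatting behaviour, assuming a platform where long is 64 bits; the negative number is just the date -d output from above:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    int64_t ok_mtime  = 1679356800;    /* an ordinary 2023 timestamp */
    int64_t bad_mtime = -62167255492;  /* date -d"0000-01-01" +%s from above */
    char buf[32];

    /* A post-1970 timestamp fits in the 8 hex characters the header expects. */
    snprintf(buf, sizeof(buf), "%08lX", (long)ok_mtime);
    printf("%s (%zu chars)\n", buf, strlen(buf));   /* 6418F380 (8 chars) */

    /* The negative value is reinterpreted as a huge unsigned long and
     * overflows the fixed-width field. */
    snprintf(buf, sizeof(buf), "%08lX", (long)bad_mtime);
    printf("%s (%zu chars)\n", buf, strlen(buf));   /* FFFFFFF1868AF63C (16 chars) */

    return 0;
}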

So at last we've found the cause of the panic: the timestamp is being formatted too long, which breaks the CPIO header and means the kernel doesn't create an initial filesystem correctly. This includes the /dev folder (which surprisingly is not hardcoded into the kernel, but must be declared by the initramfs). So when the root disk mounter tries to open /dev/vda2, it correctly complains that it failed to create a device in the non-existent /dev.

Postmortem

After discovering all this, I sent in a couple of patches to fix the CPIO generation and rebuild logic. They were not complicated fixes, but wow were they time consuming to track down. I didn't see the error initially because I typically only boot with my own initramfs over the embedded one, and not with the intent to load a root disk. Then the panic itself was quite far away from the real issue, and there were many dead ends to explore.
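For a rough idea of the shape of such a fix (an illustrative sketch only; the actual patches may differ), a guard in the same style as the existing 2106 check could reject pre-1970 dates before they ever reach the header formatting:

/* Illustrative only: extend the existing range check so a negative
 * time_t can never be formatted as a 16-character hex value. */
if (default_mtime < 0 || default_mtime > 0xffffffff) {
    fprintf(stderr, "ERROR: Timestamp out of range for cpio format\n");
    exit(1);
}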

I also got curious as to why the kernel didn't complain about a corrupt initramfs earlier. A brief investigation showed a streaming parser that is extremely fault tolerant, silently skipping invalid entries (such as ones with a missing or overly long name). The corrupted header was being interpreted as an entry with an empty name and 2 gigabytes of body contents, which meant that (1) the kernel skipped inserting it due to the empty name, and (2) the kernel skipped the rest of the initramfs because it thought that up to 2 GB of the remaining content was part of that first entry.

Perhaps this could be improved to require that all input is consumed without unexpected EOF, such as how the userspace cpio tool works (which, by the way, recognises the corrupt archive as such and refuses to decompress it). The parsing logic is mostly from the before-times though (i.e., pre initial git commit), so it's difficult to distinguish intentional leniency and bugs.

Afterword

Incidentally, in investigating this I came across another bug. There is a helper function panic_show_mem() in the initramfs code that's meant to dump memory information and then call panic(). It takes a standard printf()-style format string and arguments, and tries to forward them to panic(), which ultimately prints them.

static void panic_show_mem(const char *fmt, ...)
{
    va_list args;

    show_mem(0, NULL);
    va_start(args, fmt);
    panic(fmt, args);
    va_end(args);
}

void panic(const char *fmt, ...);

But variadic arguments don't quite work this way: instead of forwarding the list args as intended, panic() will instead interpret args as a single argument for the format string fmt. Standard library functions address this by providing v* variants of printf() and friends. For example,

int printf(char *fmt, ...);

int vprintf(char *fmt, va_list args);
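To make the difference concrete, here's a hedged userspace sketch (plain libc, not kernel code) of that forwarding pattern: the wrapper collects its variadic arguments into a va_list and hands the whole list to the v-variant, which knows how to consume it.

#include <stdarg.h>
#include <stdio.h>

/* A wrapper that forwards its variadic arguments correctly by calling the
 * v-variant of the underlying function. */
static void log_msg(const char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    vprintf(fmt, args);   /* vprintf consumes the va_list itself */
    va_end(args);
}

int main(void)
{
    log_msg("mounting %s on %s\n", "/dev/vda2", "/");
    return 0;
}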

We might create a vpanic() function in the kernel that follows this style, but it seems easier to just make panic_show_mem() a macro and 'forward' the arguments in the source code

#define panic_show_mem(fmt, ...) \
    ({ show_mem(0, NULL); panic(fmt, ##__VA_ARGS__); })

Patch sent.

And that's where I've left things. Big thanks to Joel and Andrew for helping me with this bug. It was certainly a trip.


  1. initramfs, or initrd for the older format, are specific kinds of CPIO archives. The initramfs is intended to be loaded as the initial filesystem of a booted kernel, typically in preparation for loading your normal root filesystem. It might contain modules necessary to mount the disk for example. 

  2. Hindsight again would suggest it was obvious to look here because it shows up when searching for KBUILD_BUILD_TIMESTAMP. I unfortunately wasn't familiar with the usr/ source folder initially, and focused too much on the core kernel components early on. Oh well, we found it eventually. 

  3. I almost missed this initially. Thanks to the ASCII header format, strings was able to print the headers without any CPIO specific tooling. I did a double take when I noticed the headers for the broken CPIO were a little longer than the headers in the working one. 

Michael Still: Minor questions in Linux file semantics

I’ve known for a long time that if you delete a file on Unix / Linux but that file is open somewhere, the blocks used by the file aren’t freed until that user closes the file (or is terminated), but I was left wondering about some other edge cases.

Shaken Fist has a distributed blob store. It also has a cache of images that virtual machines are using. If the blob store and the image cache are on the same filesystem, sometimes the image cache entry can be a hard link to an entry in the blob store (for example, if the entry in the blob store doesn’t need to be transcoded before use by the virtual machine). However, if they are on different file systems, I instead use a symbolic link.

This raises questions — what happens if you rename a file which is open for writing in a program? What happens if you change a symbolic link to point somewhere else while it is open? I suspect in both cases the right thing happens, but I decided I should test these theories out.

First off, let’s cover the moving a file which is being written to case. Specifically, moving the file on the same filesystem. I wrote this little test program:

#!/usr/bin/python3

import datetime
import time

with open('a', 'w') as f:
    try:
        while True:
            f.write('%s\n' % datetime.datetime.now())
            time.sleep(1)

    except KeyboardInterrupt:
        f.close()

In one terminal I set it running. In another I then renamed ‘a’ to ‘b’ and waited a bit. The short answer? The newer writes from my script ended up in ‘b’ correctly. This makes sense when you remember that files don’t have names in most Unix filesystems — a directory has dirents with names, and they point to an inode. The open program is changing the content of an inode and associated blocks, and that’s quite separate from changing the dirent that points to that inode.
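The same experiment can be sketched in C from a single process (a minimal sketch, assuming the current directory is writable, with the file names 'a' and 'b' as above): the file descriptor keeps referring to the same inode across the rename, so the later write lands in 'b'.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("a", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return 1;

    (void)write(fd, "before rename\n", 14);

    /* Same-filesystem rename: only the dirent changes; the inode and the
     * open file description are untouched. */
    if (rename("a", "b") != 0)
        return 1;

    (void)write(fd, "after rename\n", 13);
    close(fd);

    /* "b" now contains both lines; "a" no longer exists. */
    return 0;
}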

Secondly, what happens if I have a symlink to a different filesystem, move the file on that other filesystem and then update the symlink? All of course while the file is in use?

Unsurprisingly it works just like the previous example — the open file continues to be updated regardless of the move and the change of symlink.

This is good, because it makes re-sharding the blob store in Shaken Fist much easier. So there you go.


Michael Still: Malware Analyst's Cookbook and DVD

Another technical book, this time because my employer lets me buy random technical books as long as I pinky swear to read them and this one sounded interesting and got good reviews.

First off, the book is a bit dated given it's from 2011 — there are lots of references to Ubuntu 10.10 for example and they say to avoid Python 3, which has its historical charm. This is unfortunate given the first section of the book talks about setting up honeypots to collect malware to examine, but Dionaea for example had its last commit in 2021. I am left wondering if there are more modern honeypot systems that people use these days.

Secondly the book is definitely a cookbook and that's on me for not noticing this about the book before buying it — it's a series of recipes / scripts that do interesting things with malware. That said, it isn't really teaching a cohesive set of skills, it's more of a series of stepping stones along the path you might follow. I think that's an unintended piece of important learning — books with "cookbook" or "recipes" in their title probably aren't very good as an overview of a topic area. My bad.

That said, some parts of the book are very good — the discussion of whois, DNS, and Real Time Black Lists (RTBLs) is helpful and less focussed on providing scripts you could run. The discussion of how to log changes to a Windows system, detect attempts to hide files in NTFS filesystems, and detect changes to registry hives were interesting in an abstract way, but perhaps obvious to someone who actually uses Windows.

Overall, I’m a bit disappointed in this book and it will be exhiled to a shelf at the office as a punishment.

Malware Analyst's Cookbook and DVD
Authors: Michael Ligh, Steven Adair, Blake Hartstein, Matthew Richard
Category: Computers
Publisher: John Wiley & Sons
Published: November 2, 2010
Pages: 747

A computer forensics "how-to" for fighting malicious code and analyzing incidents. With our ever-increasing reliance on computers comes an ever-growing risk of malware. Security professionals will find plenty of solutions in this book to the problems posed by viruses, Trojan horses, worms, spyware, rootkits, adware, and other invasive software. Written by well-known malware experts, this guide reveals solutions to numerous problems and includes a DVD of custom programs and tools that illustrate the concepts, enhancing your skills. Security professionals face a constant battle against malicious software; this practical manual will improve your analytical capabilities and provide dozens of valuable and innovative solutions Covers classifying malware, packing and unpacking, dynamic malware analysis, decoding and decrypting, rootkit detection, memory forensics, open source malware research, and much more.


Michael Still: The BeyondCorp papers

Google’s BeyondCorp effort would probably be what we would now call Zero Trust, although I am surprised by how little name recognition BeyondCorp has when I talk to security people about Zero Trust. Perhaps there are subtle differences between the two, but if there are they aren’t obvious to me. I find myself reading the relevant Usenix papers for BeyondCorp, so I figure I’ll post a summary of what I got from each paper here.

The earliest of these papers are quite old now (2014), especially for something the rest of the industry is only starting to talk a lot about at the moment. I wonder if there is a viable business model in watching what papers megacorps like Google publish, and then implementing them as commercialized products before the rest of the market catches on?

Either way, here’s a summary of the various papers from the perspective of an interested bystander…

BeyondCorp: a new approach to enterprise security is an introductory paper that introduces the idea of what we would now call Zero Trust networks. That is, that the internal corporate network is not categorized as especially trusted, but instead serves as an access mechanism to services which define their own trust of an end user. This trust is enforced by access gateways, and derived from metrics such as how recently OS updates have been installed on the requesting device. This is a good introduction to the concept, especially given its age.

BeyondCorp: design to deployment at Google — unfortunately this paper was less useful I think. It is higher level than the first paper, and provides fewer actionable insights for someone thinking of implementing Zero Trust.

BeyondCorp: the access proxy describes the high level architecture of the access proxy, which is the frontend which takes requests from clients and authenticates / authorizes them before passing them onto the protected services. There aren’t a lot of surprises here, but it is a good overview of what you might encounter along the way (non-HTTP protocols requiring a client side helper for example).

Migrating to BeyondCorp: Maintaining Productivity While Improving Security is a discussion of the process of transitioning the Google network to the new zero trust access methodology while not breaking users’ ability to get things done. This was implemented by partitioning the problem space into smaller, more tractable problems, and then transitioning clients to the new non-privileged VLAN as these problems were solved. A key component of this was an enterprise wide rollout of 802.1x to ensure device identity was well understood. This paper is largely descriptive — while it might provide inspiration to other implementations, it does not provide a complete roadmap, largely because every organization’s legacy applications will differ.

That said, one interesting idea is that the network rules to control traffic were implemented in two places — in the network layer for the new VLAN, but also in an iptables implementation on client machines. This meant that it was easy to add clients in test mode (with the local implementation), but turn it off again if things didn’t work out. It also meant that they could add enforcement in locations where the new VLAN had not yet been deployed.

Another interesting idea is the provisioning of micro-VPNs for harder to convert applications such as those requiring non-HTTP access to network resources. This looks to my modern eyes as a lot like what tailscale does — exposing a single application via a micro-VPN accessed from the client routing table.

BeyondCorp: The User Experience details the gradual reduction in the demand for “traditional” VPN connectivity as users were moved to BeyondCorp, even as users initially expected a more traditional approach. It covers other user support scenarios as well, but most of them are quite Google-specific (for example their loaner laptop program).

BeyondCorp: Building a Healthy Fleet is the final paper in the series and discusses defining the threats you are mitigating by undertaking a Zero Trust approach to network security. In the case of BeyondCorp a large amount of the benefit is derived from enforcing regular updates on the user endpoint fleet, as well as controlling who can access what service based on their business needs.


Russell Coker: Firebuild

After reading Bálint’s blog post about Firebuild (a compile cache) [1] I decided to give it a go. It’s non-free, the project web site [2] says that it’s free for non-commercial use or commercial trials.

My first attempt at building a Debian package failed due to man-recode using a seccomp() sandbox, I filed Debian bug #1032619 [3] about this (thanks for the quick response Bálint). The solution for me was to edit /etc/firebuild.conf and add man-recode to the dont_intercept list. The new version that’s just been uploaded to Debian fixes it by disabling seccomp() and will presumably allow slightly better performance.

Here are the results of building the refpolicy package: a regular build, the first build with Firebuild (about 30% slower), and a rebuild with Firebuild that reduced the time by almost 42%.

Regular build:

real    1m32.026s
user    4m20.200s
sys     2m33.324s

First build with Firebuild:

real    2m4.111s
user    6m31.769s
sys     3m53.681s

Rebuild with Firebuild:

real    0m53.632s
user    1m41.334s
sys     3m36.227s

Next I did a test of building a Linux 6.1.10 kernel with “make bzImage -j18“, here are the results from a normal build, first build with firebuild, and second build. The real time is worse with firebuild for this on my machine. I think that the relative speeds of my CPU (reasonably fast 18 core) and storage (two of the slower NVMe devices in a BTRFS RAID-1) are the cause of the first build being relatively so much slower for “make bzImage” than for building the refpolicy, as the kernel build process involves a lot more data. For the final build I moved ~/.cache/firebuild to a tmpfs (I have 128G of RAM and not much running on my machine at the time of the tests), even then building with firebuild was slightly slower in real time but took significantly less CPU time (user+sys being about 20 minutes instead of 36). I also ran several tests with the kernel source tree on a tmpfs but for unknown reasons those tests each took about 6 minutes. Does firebuild or the Linux kernel build process dislike tmpfs for some reason?

Normal build:

real    2m43.020s
user    31m30.551s
sys     5m15.279s

First build with Firebuild:

real    8m49.675s
user    64m11.258s
sys     19m39.016s

Second build with Firebuild:

real    3m6.858s
user    7m47.556s
sys     9m22.513s

Firebuild rebuild with ~/.cache/firebuild on tmpfs:

real    2m51.910s
user    10m53.870s
sys     9m21.307s

One thing I noticed from the kernel build tests is that the total CPU time taken by the firebuild process (as reported by ps) was more than 2/3 of the run time and top usually reported it as taking around 75% of a CPU core. It seems to me that the firebuild process itself is a bottleneck on build speed. Building refpolicy without firebuild has an average of 4.5 cores in use while building the kernel has 13.5. Unless they make a multi-threaded version of firebuild it seems that it won’t give the performance one would hope for from a CPU with 18+ cores. I presume that if I had been running with hyper-threading enabled then firebuild would have been even worse for kernel builds as it would sometimes get on the second thread of a core. It looks like firebuild would perform better on AMD CPUs as they tend to have fewer CPU cores with greater average performance per core so a single CPU core for firebuild will be less limited. I presume that the firebuild developers will make it perform better with large numbers of cores in future, the latest Intel laptop CPUs have 16+ cores and servers with 2*40core CPUs are common.

The performance improvement for refpolicy is significant as a portion of build time, but insignificant in terms of real time. A full build of refpolicy doesn’t take enough time to get a Coke and reducing it doesn’t offer a huge benefit, if Firebuild was available in past years when refpolicy took 20 minutes to build (when DDR2 was the best RAM available) then it would be a different story.

There is some potential to optimise the build of refpolicy for the non-firebuild case. Getting it to average more than 4.5 cores in use when there’s 18 available should be possible, there are a number of shell for loops in the main Makefile and maybe some of them can be replaced by make constructs to allow running in parallel. If it used 7 cores on average then it would be faster in a regular build than it currently is with firebuild and a hot cache. Any advice from make experts would be appreciated.

Russell Coker: Xmpp Tools

For a while I’ve had my monitoring systems alert me via XMPP (Jabber). To do that I used the sendxmpp command-line program which worked well for it’s basic tasks. I recently noticed that my laptop and workstation which I had upgraded to Debian/Testing weren’t sending messages, I’m not sure when it started as my main monitoring of such machines is to touch a key and see if there’s a response – if I’m not at the keyboard then a failure doesn’t bother me too much.

I’ve filed Debian bug #1032868 [1] about this. As sendxmpp is apparently not supported upstream and we are preparing for a release it could be that the next version of Debian is released without this working (if it’s specific to talking to Prosody) or without sendxmpp (if it fails on all Jabber servers).

I next tested xmppc which doesn’t send messages (gives no error when I have apparently correct parameters and just doesn’t send anything) and doesn’t display any text output for info related commands while not giving error messages or an error return code. I filed Debian bug #1032869 [2] about this.

Currently the only success I’ve found with Debian/Testing for this is with go-sendxmpp. To configure that you setup a file named ~/.config/go-sendxmpp/config with the following contents:

username: JABBER-ID
password: PASSWORD

Go-sendxmpp can take a username and password on the command-line but that’s bad for security as in the absence of SE Linux or other advanced security systems the password can be seen by any user on the same system who runs ps. To send a message run “echo $MESSAGE | go-sendxmpp $ADDR” to send $MESSAGE to $ADDR. It also has the option “go-sendxmpp -l” to listen for incoming messages. I don’t have an immediate need to receive messages from the command-line but it’s handy to have the option.

I probably won’t be able to get a new version of etbemon in Debian for the Bookworm release. So to get go-sendxmpp to work with etbemon you need to edit /usr/lib/mon/alert.d/mailxmpp.alert and change this sendxmpp line to this go-sendxmpp line:

open (XMPP, "| /usr/bin/sendxmpp -a /etc/ssl/certs -t @xmpprec -r $host") ||

open (XMPP, "| /usr/bin/go-sendxmpp @xmpprec") ||


Simon Lyall: Audiobooks – February 2023

The Rules of the Game: Jutland and British Naval Command by Andrew Gordon

A very detailed account of the battle of Jutland and British naval culture. So detailed that I gave up trying to follow the audiobook; it would work better in print. 3/5

Wings of War: The World War II Fighter Plane that Saved the Allies and the Believers Who Made It Fly by David Fairbank White

The history of the P-51 Mustang told through three people: designer Edgar Schmued; Tommy Hitchcock, the man who fought for its adoption; and Don Blakeslee, an ace who flew it. 3/5

The Tombs of Atuan by Ursula K. Le Guin

The 2nd Earthsea book. A girl grows up as a high priestess until one day Sparrowhawk comes to rob her temple. 3/5

1491: New Revelations of the Americas Before Columbus by Charles C. Mann

Nominally a history of the peoples of the pre-Columbian Americas. Covers the population, age and sophistication of the civilizations based on recent discoveries. 4/5

Geniuses at War: Bletchley Park, Colossus, and the Dawn of the Digital Age by David A. Price

A short book on the Bletchley Park code-breaking efforts of WW2. A general overview concentrating on a few characters with much left under-covered. 4/5

An Unfinished Life: John F. Kennedy, 1917–1963 by Robert Dallek

A good account of his life and Presidency, although only a single volume and the audiobook is further abridged. Well worth it as a first JFK biography. 4/5

Tomorrow’s People: The Future of Humanity in Ten Numbers by Paul Morland

A review of some demographic trends and what they tell us about how the world will look in the future. 4/5

Countdown to Pearl Harbor: The Twelve Days to the Attack by Steve Twomey

A chronicle of why America was unprepared for the Japanese Attack on Pearl Harbor. Detailed but a nice and interesting read. 4/5

My Audiobook Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all



Russell Coker: Hyper Threading on the E5-2696v3

I just did some quick tests of hyper-threading on my new E5-2696v3 CPU. I compiled the Linux 6.0.10 kernel with and without hyper-threading enabled. Here are the times for “make -j36 bzImage” and “make -j36 modules” with HT enabled:

bzImage:

real    2m26.540s
user    55m25.121s
sys     9m56.443s

modules:

real    10m57.374s
user    309m21.531s
sys     58m1.070s

Here’s the times for “make -j18 bzImage” and “make -j18 modules” with HT disabled:

real    2m40.501s
user    31m35.295s
sys     5m43.523s

real    11m39.313s
user    170m46.840s
sys     31m37.756s

That’s 9.6% faster for bzImage and 6.4% faster for modules.

So for a performance boost that’s between 5% and 10% I get greater exposure to kernel security issues and more difficulty tracking CPU time. That doesn’t seem like a good trade-off so I’ve put the “nosmt” kernel command-line option back.


Linux Australia: Council Meeting 1st March 2023 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (President)
  • Wil Brown (Vice-President)
  • Neill Cox (Secretary)
  • Russell Stuart (Treasurer)
  • Sae Ra Germaine (Council)
  • Jonathan Woithe (Council)

Apologies 

  • Marcus Herstik (Council)

Not Present

Meeting opened at 20:02 AEDT by Joel Addison  and quorum was achieved.

Minutes taken by Neill Cox

 

2. Log of correspondence

  • 22 Feb Enquiry from Bryce Torcello re linux.conf.au – Wil has responded
  • 25 Feb Miles Goodhew re LCA2022 close out. Miles has delivered the LCA gear to Neill and has now completed his tasks for LCA2022. He wishes the EverythingOpen organisers well but won’t be able to attend the conference. 
  • 27 Feb Russell Keith-Magee re Update to PyCon AU 2023 budget. 
  • 27 Feb Michael Richardson re DrupalSouth Wellington – Budget and Onboarding
  • PO Box: 5x Westpac statements 3x WWCC for EverythingOpen Conference

 

3. Items for discussion

  • Revised PyconAU budget:
    Council has concerns about the lower capacity budgets having online ticket sales but no platform to provide streaming.
    Council will discuss this further with the PyConAU organisers.  
  • DrupalSouth doing their conference in Wellington triggered several support queries from me to Stripe as all our banking links to anz.co.nz had been deleted, and I couldn’t figure out how to recreate them.  Stripe’s response said it’s no longer possible for LA to deposit funds into a NZ bank.  Depositing NZ$ card payments into an Australia bank seems prohibitively expensive, so Stripe is no longer a viable way of handling credit card payments in NZ$.  No immediate action required now as DrupalSouth is accepting payments via Lil’ Regie, but that won’t be an option for Symposium should we hold Pycon / EO / LCA in NZ.  Other options appear to be: use a different payment processor, or only accept AUD$ payments for NZ conferences.
    Finding a suitable payment processor may be difficult. PayPal have anti money laundering triggers that may not work for conferences that are quiet most of the year and then suddenly busy. Local (Australian) payment options can be expensive. Square seems to have stringent authentication requirements which can bite organisations which have no fixed address. Investigation will be required.
  • DrupalSouth have completed their financial induction, and have given us a budget and a list of team members, and agreed to abide by LA Policies. The next steps are:
    • Consider their proposal (conference web site), and in particular their budget.
    • Pass a motion accepting them as a subcommittee.

Motion by Russell Stuart: That we accept the Drupal South budget and approve them as an LA subcommittee.
Seconded: Sae Ra Germaine

Passed unanimously

4. Items for noting

  • DrupalSouth financial induction done. We are now waiting on them to provide event and team details, as described at the end of the induction.
  • EverythingOpen will run a “friends of LA” session (similar to ghosts sessions for LCA). There is a budget item for ghosts that will cover this option.
  • EverythingOpen Budget
    Sponsorship is going well, but at this stage ticket sales are lower than expected, so a loss is likely.
  • EverythingOpen Charity fund matching
    Motion by Sae Ra Germaine: That LA matches up to $5,000 fundraising for Everybody’s Home (EverythingOpen’s chosen charity)

Seconded: Wil Brown

Passed: Unanimously

5. Other business

  • Admin team – Steve Walsh

Election held and as far as Steve knows it was fine, waiting for information from Julien re the possibility of identifying votes in the election.

Renewed opensource.org.au as a hold on opensource.au. Not delegated to anywhere, would perhaps be a good idea to sort that out so that it actually serves content. The other contender is opensource.net.au which is now held by a Fairfax company. 

Steve will fix the secretary@linux.org.au address to forward to Neill (neill.cox.la@gmail.com). There are issues with spam filtering on the office bearer email addresses. We have to be very careful of false positives.



Russell Coker: Links February 2023

Vox has an insightful interview with the author of “Slouching Towards Utopia: An Economic History of the Twentieth Century” [1]. The main claim of that book is that “The 140 years from 1870 to 2010 of the long twentieth century were, I strongly believe, the most consequential years of all humanity’s centuries”. A claim that seems well supported.

PostMarketOS is an interesting OS for hardware designed for Android [2]. It is based on Alpine Linux, is small, and modular. If you want to change something just change that package not the entire image. Also an aim is to have as much commonality between devices as possible, all phones with the same CPU family can run the same packages apart from the kernel and maybe some utilities related to hardware. Abhijithpa blogged about getting started with pmOS, it seems easy to do [3].

Interesting article about gay samurai [4]. Regarding sex with men or women “an elderly arbiter, after hearing the impassioned arguments of the two sides, counsels that the wisest course is to follow both paths in moderation, thereby helping to prevent overindulgence in either”. Wow.

The SCP project is an interesting collaborative SciFi/horror fiction project [5] based on an organisation that aims to Secure and Contain dangerous objects and beings and Protect the world from them. The series of stories about the Anti-Memetics Division [6] is a good place to start reading.


Robert Collins: Rustup CI / test suite performance

Rustup (the community package manager for the Rust language) was starting to really suffer: CI times were up at around one hour.

We’ve made some strides in bringing this down.

Caching factory for test scenarios

The first thing, which achieved about a 30% reduction in test time, was to stop recreating all the test context every time.

Rustup tests the download/installation/upgrade of distributions of Rust. To avoid downloading gigabytes in the test suite, the suite creates mocks of the published Rust artifacts. These mocks are GPG signed and compressed with multiple compression methods, both of which are quite heavyweight operations to perform – and not actually the interesting code under test to execute.

Previously, every test was entirely hermetic, and usually the server state was also unmodified.

There were two cases where the state was modified. One, a small number of tests testing error conditions such as GPG signature failures. And two, quite a number of tests that were testing temporal behaviour: for instance, install nightly at time A, then with a newer server state, perform a rustup update and check a new version is downloaded and installed.

We’re partway through this migration, but compare these two tests:

fn check_updates_some() {
    check_update_setup(&|config| {
        set_current_dist_date(config, "2015-01-01");
        config.expect_ok(&["rustup", "update", "stable"]);
        config.expect_ok(&["rustup", "update", "beta"]);
        config.expect_ok(&["rustup", "update", "nightly"]);
        set_current_dist_date(config, "2015-01-02");
        config.expect_stdout_ok(
            &["rustup", "check"],
            for_host!(
                r"stable-{0} - Update available : 1.0.0 (hash-stable-1.0.0) -> 1.1.0 (hash-stable-1.1.0)
beta-{0} - Update available : 1.1.0 (hash-beta-1.1.0) -> 1.2.0 (hash-beta-1.2.0)
nightly-{0} - Update available : 1.2.0 (hash-nightly-1) -> 1.3.0 (hash-nightly-2)
"
            ),
        );
    })
}

fn check_updates_some() {
    test(&|config| {
        config.with_scenario(Scenario::ArchivesV2_2015_01_01, &|config| {
            config.expect_ok(&["rustup", "toolchain", "add", "stable", "beta", "nightly"]);
        });
        config.with_scenario(Scenario::SimpleV2, &|config| {
            config.expect_stdout_ok(
                &["rustup", "check"],
                for_host!(
                    r"stable-{0} - Update available : 1.0.0 (hash-stable-1.0.0) -> 1.1.0 (hash-stable-1.1.0)
beta-{0} - Update available : 1.1.0 (hash-beta-1.1.0) -> 1.2.0 (hash-beta-1.2.0)
nightly-{0} - Update available : 1.2.0 (hash-nightly-1) -> 1.3.0 (hash-nightly-2)
"
                ),
            );
        })
    })
}

The former version mutates the date with set_current_dist_date; the new version uses two scenarios, one for the earlier time, and one for the later time. This permits the server state to be constructed only once. On a per-test basis it can move as much as 50% of the time out of the test.

Single binary for the integration test suite

The next major gain was moving from having 14 separate integration test binaries to just one. This reduces the cost of linking the test binaries, all of which link in the same library. It also permits us to see unused functions in our test support library, which helps with cleaning up cruft rather than having it accumulate.

Hard linking rather than copying ‘rustup-init’

Part of the test suite for each test is setting up an installed rustup environment. Why not start from scratch every time? Well, we obviously have tests that do that, but most tests are focused on steps beyond the new-user case. Setting up an installed rustup environment has a few steps, but particular ones are copying a binary of rustup into the test sandbox, and hard linking it under various names: cargo, rustc, rustup etc.

A debug build of rustup is ~20MB. Running 400 tests means about 8GB of IO; on some platforms most of that IO won’t hit disk, on others it will.

In review now is a PR that changes the initial copy to a hardlink: we hardlink the rustup-init built by cargo into each test, and then hardlink that to the various binaries. That saves 8GB of IO, which isn’t much from some perspectives, but it adds pressure on the page cache, and is wasted work. One wrinkle is a very low max-links limit on NTFS of 1023; to mitigate that we count the links made to rustup-init and generate a new inode for the original to avoid failures happening.
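As a hedged POSIX illustration of the idea (the actual change lives in the Rust test support code; the file names below are hypothetical), hard linking one binary under several names costs no extra data blocks, and stat() exposes the resulting link count, which is the sort of count the suite needs to track to stay under filesystem link limits:

#include <stdio.h>
#include <stdint.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    /* Assumes a file named "rustup-init" exists in the current directory. */
    const char *names[] = { "cargo", "rustc", "rustup" };
    struct stat st;

    for (int i = 0; i < 3; i++) {
        /* Each link() adds a dirent pointing at the same inode; no data
         * is copied. */
        if (link("rustup-init", names[i]) != 0) {
            perror("link");
            return 1;
        }
    }

    /* st_nlink shows how many names the inode now has. */
    if (stat("rustup-init", &st) == 0)
        printf("rustup-init now has %ju links\n", (uintmax_t)st.st_nlink);

    return 0;
}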

Future work

In GitHub actions this lowers our test time to 19m for Linux, 24m for Windows, which is a lot better but not great.

I plan on experimenting with separate actions for building release artifacts and doing CI tests – at the moment we have the same action do both, but they don’t share artifacts in the cache in any meaningful way, so we can probably gain parallelism there, as well as turning off release builds entirely for CI.

We should finish the cached test context work and use it everywhere.

Also we’re looking at having less integration tests and more narrow close to the code tests.

Michael Still: Cisco CyberOps Associate: Official Cert Guide

I don’t think I’ve really reviewed a technical book here before, but I read the thing so I guess I should. This book is the certification guide for a “Cisco CyberOps Associate” certification, which is what they now call the CCNA Security qualification. Its a relatively junior certification, qualifying you to be a level one operator in a Security Operations Centre (SOC).

I read this book because I took a Cisco NetAcad course for the associated certification in the second half of 2022 (although it has continued to be a thing I plug away at in 2023). That was mainly motivated by a desire to learn more about a field that is clearly important, but hasn’t been core to my personal career.

This book is reasonably well written and readable — I’d read a chapter in the evening after work and it wasn’t a huge chore to churn through. I certainly learned things along the way, even if the certification seems to suffer from a desire to have everyone rote learn a lot of acronyms, which seems like a common ailment in the industry (AWS Certified Cloud Practitioner, I’m looking at you).

My main criticism is of the qualification itself, which is that it is quite Cisco centric — almost all examples of the implementation of a technology are a Cisco product, which is great if you’re trying to demonstrate the depth of Cisco’s portfolio, but isn’t great if you’re competing with less vendor centric certification options. This is in contrast to the CCNA content, which feels more vendor neutral to me because it’s more fundamental.

That said, this book wasn’t a waste of my time and I learned stuff — which I guess is mission accomplished for a technical book?

Cisco CyberOps Associate CBROPS 200-201 Official Cert Guide
Author: Omar Santos
Category: Computers
Publisher: Cisco Press
Published: August 6, 2020
Pages: 900

Modern organizations rely on Security Operations Center (SOC) teams to vigilantly watch security systems, rapidly detect breaches, and respond quickly and effectively. To succeed, SOCs desperately need more qualified cybersecurity professionals. Cisco's new Cisco Certified CyberOps Associate certification prepares candidates to begin a career working as associate-level cybersecurity analysts within SOCs.


Lev Lafayette: 2022 HPC Training Utilisation and Results

Unique identifiers for 263 users who received HPC training in 2022 were determined from collected attendee records. Note that users may enrol in multiple courses (e.g., Introduction to Spartan, Advanced Spartan, Parallel Processing, etc.) and may return for revision. All these users are counted once only.

From the unique users a total of 212 usernames could be determined from email addresses. When enrolling for training users do not include their Spartan username or their university ID; sometimes they don't even use a university email address, despite requests.

There were 97 users who established an account but did not use Spartan (compute hours = 0). For the remaining 115 users, the total job hours accrued after they received training was 6280454. This calculation ensured that job hours run on Spartan prior to receiving training were not counted, e.g.,

$ sreport cluster AccountUtilizationByUser cluster=spartan user=$username start=2022-11-01 end=2022-12-31 -t hours

The total allocated hours of cluster utilisation was 11597951, from the command:

$ sreport cluster Utilization cluster=spartan start=2022-11-01 end=2022-12-31 -t hours

This means that at least 54.14% of cluster utilisation in 2022 was conducted by users after receiving training.

The following steps are recommended to improve record-keeping and utilisation.

1) Emphasising the need for enrollees to use University of Melbourne email addresses only, and rejecting applications that do not do this.

2) Contacting those who attended training but did not use Spartan to ascertain why this was the case.


Russell Coker: New 18 Core CPU and NVMe

I just got a E5-2696 v3 CPU for my ML110 Gen9 home workstation, this has a Passmark score of 23326 which is almost 3 times faster than the E5-2620 v4 which rated 9224. Previously it took over 40 minutes real time to compile a 6.10 kernel that was based on the Debian kernel configuration, now it takes 14 minutes of real time, 202 minutes of user time, and 37 minutes of system CPU time. That’s a definite benefit of having a faster CPU, I don’t often compile kernels but when I do I don’t want to wait 40+ minutes for a result. I also expanded the system from 96G of RAM to 128G, most of the time I don’t need so much RAM but it’s better to have too much than too little, particularly as my friend got me a good deal on RAM. The extra RAM might have helped improve performance too, going from 6/8 DIMM slots full to 8/8 might help the CPU balance access.

That series of HP machines has a plastic mounting bracket for the CPU, see this video about the HP Proliant Smart Socket for details [1]. I was working on this with a friend who has the same model of HP server as I do, after buying myself a system I was so happy with it that I bought another the same when I saw it going for a good price and then sold it to my friend when I realised that I had too many tower servers at home. It turns out that getting the same model of computer as a friend is a really good strategy so then you can work together to solve problems with it. My friend’s first idea was to try and buy new clips for the new CPUs (which would have delayed things and cost more money), but Reddit and some blog posts suggested that you can just skip the smart-socket guide clip and when the chip was resting in the socket it felt secure as the protrusions on the sides of the socket fit firmly enough into the notches in the CPU to prevent it moving far enough to short a connection. Testing on 2 systems showed that you don’t need the clip. As an aside it would be nice if Intel made every CPU that fits a particular socket have the same physical dimensions so clips and heatsinks can work well on all CPUs.

The TDP of the new CPU is 145W and the old one was 85W. One would hope that in a server class system that wouldn’t make a lot of difference but unfortunately the difference was significant. Previously I could have the system running 7/8 cores with BOINC 24*7 and I wouldn’t notice the fans being louder. It is possible that 100% CPU use on a hot day might make the fans sound louder if I didn’t have an air-conditioner on that was loud enough to drown them out, but the noteworthy fact is that with the previous CPU the system fans were a minor annoyance. Now if I have 16 cores running BOINC it’s quite loud, the sort of noise that makes most people avoid using tower servers as workstations! I’ve found that if I limit it to 4 or 5 cores then the system is about as quiet as it was before. As a rough approximation I can use as much CPU power as before without making the fans louder but if I use more CPU power than was previously available it gets noisy.

I also got some new NVMe devices. I was previously using 2*Crucial 1TB P1 NVMes in a BTRFS RAID-1 and now I have 2*Crucial 1TB P3 NVMes (where P1 is the slowest Crucial offering, P3 is better and more expensive, P5 is even better, etc). When doing the BTRFS migrations to move my workstation to the new NVMe devices and my server to the old NVMe devices I found that the P3 series seem to have a limit of about 70MB/s for sustained random writes and the P1 series is about 35MB/s. Apparently the cheaper NVMe devices slow down if you do lots of random writes; it's a pity that all the review articles talking about GB/s speeds don’t mention this. To see how bad reviews are, Google some reviews of these SSDs: you will find a couple of comment threads on places like Reddit about them slowing down with lots of writes, and lots of review articles on well known sites that don’t mention it. Generally I’d recommend not upgrading from P1 to P3 NVMe devices, as the benefit isn’t enough to cover the effort. For every capacity of NVMe devices the most expensive devices cost more than twice as much as the cheapest devices, and sometimes it will be worth the money. Getting the most expensive device won’t guarantee great performance but getting cheap devices will guarantee that it’s slow.

It seems that CPU development isn’t progressing as well as it used to; the CPU I just bought was released in 2015 and scored 23,343 according to Passmark [2]. The most expensive Intel CPU on offer at my local computer store is the i9-13900K which was released this year and scores 62,914 [3]. One might say that CPUs designed for servers are different from ones designed for desktop PCs, but the i9 in question has a “TDP Up” of 253W which is too big for the PSU I have! According to the HP web site the new ML110 Gen10 servers aren’t sold with a CPU as fast as the E5-2696 v3! In the period from 1988 to about 2015 every year there were new CPUs with new capabilities that were worth an upgrade. Now for the last 8 years or so there hasn’t been much improvement at all. Buy a new PC for better USB ports or something, not for a faster CPU!

,

Michael Still: Exploring more efficient remote large file storage

My primary personal project is a thing called Shaken Fist these days — it is an infrastructure as a service cloud akin to OpenStack Compute, but smaller and simpler. Shaken Fist doesn’t have an equivalent to the OpenStack Image service, instead letting you describe your instance images by a standard URL. One of the things Shaken Fist does to be easier to use is it maintains an official repository of common images, which allows users to refer to those images with a shorthand syntax instead of a complete URL. The images also contain small customizations (mainly including the Shaken Fist in-guest agent), which means I can’t just use the official upstream cloud images like OpenStack does.

The images were stored at DreamHost until this week, when a robot decided that they looked like offline backups, despite being served to the Internet via HTTP and being used regularly (although admittedly not frequently). DreamHost unilaterally decided to delete the web site, so now I am looking for new image hosting services, and thinking about better ways to build an image store.

(Oh, and recommending to anyone who asks that they consider using someone less capricious than DreamHost for their hosting needs).

All of this got me thinking. What would be the requirements for a next-generation image store for Shaken Fist? Given Shaken Fist is a personal project with not a lot of delivery pressure, I’ve been trying for the last year or two to “take the tangent” when one appears. In general I try to develop those ancillary things as separate sub-projects, in the hope that they’ll be useful to other people one day. Examples of tangents include Occystrap (OCI image support in python), and Clingwrap (a tool to build “field service dumps” of machine state to aid in debugging).

What would a better image store look like if I took that tangent? Here’s what I came up with:

  • Efficient storage of images where the changes between the images are small: I do daily builds for the images, and theorize that therefore the images should be fairly similar day to day. It would be nice to only store the delta somehow when adding a new image.
  • Improved cachability for clients: if they possess an older image which overlaps with a newer one, they should only have to download the delta.
  • Support for cloud native object stores: there are a variety of inexpensive cloud object stores these days, and it would be nice to harness one of those for storage. The big limitation here is that in general these object stores do not allow for directory listing, so they’re good for key / value lookups where you already know the key, but not good for iterating all keys.

A specific non-goal is being fast to encode into the new format — that only happens once on a single build machine, and I am willing to accept some additional computation in return for reduced storage costs and faster downloads for my users.

My initial naive implementation uses the following algorithm:

  • Split the file into chunks and calculate a sha512 checksum for each chunk.
  • If the remote store already has a chunk with that checksum, do not upload.
  • If it does not, then compress the chunk using gzip and upload it.
  • Add the sha512 checksum to the list of chunks the file is composed of.
  • After the entire file has been processed, emit a file containing metadata about the file, including the list of chunks.

My thinking is that the metadata file is the one you’d refer to when downloading an image, and the chunks would then be fetched as a second stage. I note that this approach has similarities to how Docker layers are stored in container image repositories.
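
To make that concrete, here is a minimal Python sketch of the encoding step. The store object and its exists() and put() methods are hypothetical stand-ins for whatever object store ends up being used; this is not the actual implementation.

import gzip
import hashlib
import json

CHUNK_SIZE = 4 * 1024 * 1024  # 4mb, one of the chunk sizes experimented with below

def encode(path, store):
    # Walk the file chunk by chunk, uploading only chunks the store hasn't seen before
    chunks = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            digest = hashlib.sha512(chunk).hexdigest()
            if not store.exists(digest):
                store.put(digest, gzip.compress(chunk))
            chunks.append(digest)
    # The metadata document is what a client would fetch first
    return json.dumps({"name": path, "chunk_size": CHUNK_SIZE, "chunks": chunks})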

There are some obvious experiments possible here. For example, what is the optimal chunk size? I should note that the chunk size can’t be too small, because each unique chunk becomes an object, and for some object stores (such as Linux filesystems), there are limits on the number of objects we can reasonably store.

I therefore took my 113gb collection of CentOS 8 images, and tried a variety of chunk sizes. I know I said above that I don’t particularly care about processing time, but I do think it’s good to keep track of. Therefore, I also measured the “no-write-IO” time to process the repository — that is I encoded the repository and then encoded it again to the same destination. This gives me a measure of processing time while not including any writes to the destination, which hopefully minimises the noise from a busy spinning rust disk array as much as possible. Here are the numbers:

Chunk size (mb) | Chunks  | Repository size (gb) | Processing time (minutes, no writes) | Repository size (%age of original size)
1               | 96,699  | 91                   | 17                                   | 81%
2               | 50,824  | 96                   | 13                                   | 85%
3               | 34,780  | 98                   | 11                                   | 87%
4               | 26,503  | 100                  | 16                                   | 88%
5               | 21,484  | 101                  | 11                                   | 89%

Or graphically:

[Graph: BlockStash performance on compressed images with varying chunk sizes]

But wait! The default in my previous image store used compressed qcow2 images — that is they have been compressed with the DEFLATE algorithm. These images are still sparse regardless of the compression used, so we are unlikely to see large blocks of zeros in the data.

The compression in the source images means we’re trying to chunk up files with a streaming compression algorithm that likes to use references to previous data seen earlier in the file. It therefore seems likely that performance will change with uncompressed input images. Here’s the same experiment, but with 241gb of uncompressed images passed (the same images as before, just decompressed):

Chunk size (mb) | Chunks  | Repository size (gb) | Processing time (minutes, no writes) | Repository size (%age of original size)
1               | 175,205 | 72                   | 42                                   | 64%
2               | 99,458  | 82                   | 34                                   | 73%
3               | 70,520  | 87                   | 31                                   | 77%
4               | 54,494  | 90                   | 29                                   | 80%
5               | 44,835  | 92                   | 28                                   | 81%

Or graphically:

[Graph: BlockStash performance on uncompressed images]

As an aside, there’s a notable flaw in this first chunking approach, because a change early in a file which offsets the chunks will result in each chunk being new. I haven’t thought of a computationally reasonable fix for that, so such is life for now. In theory, it could be fixed in this proposed format by inserting a small “shim” chunk which has that early change, but computing all possible hashes for all possible sliding blocks sounds super expensive to me. Rabin-Karp rolling hashes look promising for helping with this issue, but I haven’t pursued it further because my target data is disk images which are block edited, not insertion edited.
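
For the curious, here is a rough sketch of what a rolling hash boundary detector could look like. The window size, modulus and boundary mask are arbitrary choices for illustration, and none of this is in my implementation today.

WINDOW = 48            # bytes of context the rolling hash considers
MASK = (1 << 20) - 1   # on average one boundary per ~1mb of input
BASE = 257
MOD = (1 << 61) - 1

def chunk_boundaries(data):
    # Yield offsets where a chunk should end, based on content rather than position,
    # so an insertion early in the file only disturbs the chunks near it
    h = 0
    power = pow(BASE, WINDOW - 1, MOD)
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Slide the window: remove the byte leaving it before adding the new one
            h = (h - data[i - WINDOW] * power) % MOD
        h = (h * BASE + byte) % MOD
        if i >= WINDOW - 1 and (h & MASK) == 0:
            yield i + 1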

So what option should I pick? My reading of the data is that there are significant server side storage wins (and an associated reduction in the need for client downloads) if I go for a smaller chunk size starting with uncompressed data, but I have concerns about the number of chunks a file download might incur — is downloading 1,024 1mb chunks actually faster than downloading a single 1gb file? I think that’s perhaps a matter for another post, as I’ll need to do some more benchmarking there.

,

Linux Australia: Council Meeting 15th February 2023 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (President)
  • Wil Brown (Vice-President)
  • Neill Cox (Secretary)
  • Russell Stuart (Treasurer)
  • Sae Ra Germaine  (Council)
  • Marcus Herstik (Council)
  • Jonathan Woithe (Council)

 

Apologies 

  • None

 

Not Present

  • None

 

Meeting opened at 20:08 AEDT by Joel Addison  and quorum was achieved.

Minutes taken by Neill Cox

 

2. Log of correspondence

  • Email from Dwight Walker re CiviCRM install notes. Joel and Steve Walsh have already responded. Further emails from Dwight received Feb 13. Steve has been in Fiji for work for the last two weeks. Dwight is attempting to setup a development environment rather than waiting any longer.
  • Email from Richard Jones “Kicking off the PyCon AU 2023 financials”. Joel has responded. 
  • Two emails from YouTube re “New YouTube Partner Program terms coming this week”. Joel has actioned(TBC)
  • Email from Russell Stuart re “NSW Fair trading annual return”. Russell has actioned this. Money has been paid.
  • Email from Zoom re change to account billing information (Russell). New credit card has been recorded on the account.
  • Email thread re Linux Australia Activity Statement Oct..Dec 2022 (Russell and Sonny Bonang). Activity Statement lodged and accountant’s invoice paid.
  • Email from Kathy Reid re Nat Torkington’s son William who sadly took his own life. Wil has responded.
  • Email from Miles Goodhew re “Moving the LCA stuff to Brisbane”. Miles has a quote for $1,145 (from Dawson Removals), insurance would be $35/$1,000 of value. Miles has two other quotes from Kent Moving ($745 plus insurance) and Grace Removals for $1,228.57. Miles would like his shed back.
  • Email from Kathy Reid re “Seeking to allocate $AUD 1000 from budgeted LA funds to Media & Communications Subctte for rewards and recognition”
  • PO Box payment due soon
  • Richard Jones has requested an update on PyConAU 2023. Joel has responded, basically saying we will reply as soon as possible after the first meeting of the new council.
  • Betsy Waliszewski  re OSI’s 25th Anniversary and Everything Open
  • Steve Walsh re APNIC Membership Renewal. Renewal invoice has been paid.
  • Russell Stuart, Michael Richardson & Owen Lansbury re Xero and Westpac Setup for Drupal South
  • PO Box relocated to Kent Street
  • Westpac and ANZ bank statements
  • Volunteer WWCC notice received
  • secretary@ emails still going to Clinton

3. Items for discussion

  • Shipping LCA stuff
    • Much of the LCA gear is now probably of limited value. Marcus has offered to transport from Canberra if it will fit.
    • We have pictures but they lack scale.
    • Steve Walsh may be able to ship things via AARNet.
    • Joel will look through the list of gear from Miles to decide what to keep and what we can get rid of.
    • Russell suggests that we make it Everything Open’s problem.
    • Sae Ra points out that there is nowhere to put it in Melbourne.
    • Action Item: Joel to go through the list to see what can be culled. Then we’ll decide how to move the remainder.
  • Media and Communications Subcommittee funding request
    • We currently allow the admin team to manage their own budget, as do the various event subcommittees for events.
    • $2,000 is already budgeted for Rewards and Recognition and $1,000 for Advertising and Marketing, so effectively this is us giving permission to Kathy to spend money that she has already been authorised to use.
    • Motion by Joel: That we allocate $1,000 to the media subcommittee from the media budget. Carried unanimously
  • PO Box Payment due soon
    • Email was sent to Neill who has now forwarded it to Council and Treasurer for payment.
    • Joel will ask Steve to update the secretary@ address to go to Neill
  • OSI 25th Anniversary
    • OSI would like to celebrate this at EO2023. Possibly cupcakes/afternoon tea. This will be handled by the EO2023 organising committee.
  • APNIC Renewal
    • Has been paid.
  • Drupal South
    • Drupal South will be held in New Zealand this year, dates to be confirmed.  Proposal is yet to be formally accepted by the LA council. Most recent correspondence was that they were still finalising their budget. Looks to be very similar to last year’s conference, including the use of an external conference organiser.
  • Conflict of interest register.
    • Joel will share a link and we should each fill out as appropriate
  • William Torkington
    • Wil to add an in memoriam page to the website and include William. Sae Ra to send a list of other people to be included.
  • Working With Children
    • These are needed for organisers of EO2023, but also probably LA council members.
    • Takes some time to process. Registered mail seems to slow the process down. Unfortunately, sending 100+ points of ID through the mail unregistered is not something that makes people happy.
  • PyCon AU 2023 – Budget 2023 – ACC
    • PyConAU were able to use a deposit paid for the venue when it was originally planned to be held in Adelaide pre-covid.
    • The attendance numbers do not seem to reflect the current economic climate. However, given that the choice is between losing $1,000 at the bare bones level or $100,000 if we forfeit the deposit, the best option is to go ahead with the conference.
    • Joel will respond to Richard giving approval for the conference to be announced, but asking that the budget be revisited with expectations of fewer attendees.

 

4. Items for noting

  • Russell is holding a financial induction for Drupal South. Drupal can make it at UTC 20:00..23:00 on Feb 20, 21, or 22. Do any committee members want to join, and if so let's settle on a time.
    • Russell will finalise a time and send zoom invites.

5. Other business

  • None

The post Council Meeting 15th February 2023 – Minutes appeared first on Linux Australia.

,

Michael Still: This is going to hurt

This book is lots of things: honest, funny, and ultimately heart breaking. I don’t remember how I came across it, but it’s a good read for when travelling, as the diary format means you can put it down whenever you need to do something else.

I’m left wondering how the Australian medical system compares to the NHS — I know we have more patient choice and flexibility — but I wonder what it’s like for those working within the system.

Either way I definitely recommend this book.

This is Going to Hurt
Adam Kay
Biography & Autobiography
Pan Macmillan
2018
279

As soon as Adam Kay set foot on a hospital ward for the first time, he realised there's quite a lot they don't teach you at medical school ... His diaries from the NHS front line - scribbled in secret after long nights, endless days and missed weekends - are hilarious, horrifying and heartbreaking by turns. This Is Going to Hurt is everything you wanted to know about being a junior doctor, and more than a few things you really didn't. And yes, it may leave a scar.

,

Russell Coker: Intel vs AMD

In response to a post about my latest laptop I had someone ask why I chose an Intel CPU. I’ve been a fan of the Thinkpad series of laptops since the 90s. They have always seemed well constructed (given the constraints of being light etc) and had a good feature set. Also I really like the TrackPoint. I’ve been a fan of the smaller Thinkpads since I got an X-301 from e-waste [1] and the X1-Carbon series is the latest and greatest line of small Thinkpads.

AMD makes some nice laptop CPUs which appear to have low power use and good performance, particularly for smaller numbers of threads; it seems that generally AMD CPUs are designed for fewer cores with higher performance per core, which is good for laptops. But Lenovo only makes the Thinkpad Carbon X1 series with Intel CPUs, so choosing that model of laptop means choosing Intel. It could be that for some combination of size, TDP, speed, etc Intel just happens to beat AMD for all the times when Lenovo was designing a new motherboard for the Carbon X1. But it seems more likely that Intel has been lobbying Lenovo for this. It would be nice if there was an anti-trust investigation into Intel; everyone who’s involved in the computer industry knows of some of the anti-competitive things that they have done.

Also it would be nice if Lenovo started shipping laptops with ARM CPUs across their entire range. But for the moment I guess I have to keep buying laptops with Intel CPUs.

,

Simon Lyall: AudioBooks – January 2023

Colditz Prisoners of the Castle by Ben Macintyre

A good contrast to the “Boys Own” versions by Pat Reid I read as a kid. Covers lots of other viewpoints including from the Germans. Recommended 4/5

Project Hail Mary by Andy Weir

Last read July 2021. A semi-repeat of The Martian where a lone astronaut has to science the shit out of a bad situation. This time to save humanity. 4/5

Seven Games: A Human History by Oliver Roeder

Working through the increasing complexity of checkers, backgammon, chess, Go, poker, Scrabble, and bridge, the author looks at how humans and computers play them. 4/5

Daily Rituals: How Artists Work by Mason Curry

161 short articles about the work habits of authors, artists, composers and the like. Interesting with some ideas one can potentially adopt. 3/5

The Extraordinary Life of an Ordinary Man: A Memoir by Paul Newman

Based on tape recordings made by the actor and those who knew him. Honest and deep rather than broad, concentrating on his early life and career. 4/5

A Wizard of Earthsea by Ursula K. Le Guin

Classic Children’s Fantasy story that I haven’t read since I was a kid. Told in a very epic tone and language. Good although I missed the map on audiobook. 4/5

My Audiobook Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average; in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


,

Tim Serong: Hack Week 22: An Art Project

Back in 2012, I received a box of eight hundred openSUSE 12.1 promo DVDs, which I then set out to distribute to local Linux users’ groups, tech conferences, other SUSE crew in Australia, and so forth. I didn’t manage to shift all 800 DVDs at the time, and I recently rediscovered the remaining three hundred and eighty four while installing some new shelves. As openSUSE 12.1 went end of life in May 2013, it seemed likely the DVDs were now useless, but I couldn’t bring myself to toss them in landfill. Instead, given last week was Hack Week, I decided to use them for an art project. Here’s the end result:

Geeko mosaic made of cut up openSUSE DVDs, on a 900mm x 600mm piece of plywood

Making that mosaic was extremely fiddly. It’s possibly the most annoying Hack Week project I’ve ever done, but I’m very happy with the outcome 🙂

The backing is a piece of 900mm x 600mm x 6mm plywood, primed with some leftover kitchen and bathroom undercoat, then spray painted black. I’d forgotten how bad spray paint smells, but it makes for a nice finish. To get the Geeko shape, I took the official openSUSE logo, then turned it into an outline in Inkscape, saved that as a PNG, opened it in GIMP, and cut it into nine 300mm x 200mm pieces which I then printed on A4 paper, stuck together with tape, and cut out to make a stencil. Of course, the first time I did that, nothing quite lined up, so I had to reprint it but with “Ignore page margins” turned off and “Draw crop marks” turned on, then cut the pages down along the crop marks before sticking them together the second time. Then I placed the stencil on the backing, glued the eye down (that just had to be made from the centre of a DVD!) and started laying out cut up DVD shards.

Geeko mosaic work in progress

I initially tried cutting the DVDs with tin snips, which is easy on the hands, but had a tendency to sometimes warp the DVD pieces and/or cause them to delaminate, so I reverted to a large pair of scissors which was more effort but ultimately less problematic.

After placing the pieces that made up the head, tail, feet and spine, and deciding I was happy with how they looked, I glued each piece down with superglue. Think: carefully pick up DVD shard without moving too many other shards, turn over, dab on a few tiny globs of superglue, lower into place, press for a few seconds, move to next piece. Do not get any superglue on your fingers, or you’ll risk sticking your fingers together and/or make a gluey mess on the shiny visible side of the DVD shards.

It was another three sessions of layout-then-glue-down to fill in the body. I think I stuck my fingers together about six, or eight, or maybe twenty times. Also, despite my best efforts to get superglue absolutely nowhere near the stencil at all, when I removed the stencil, it had stuck to the backing in several places. I managed to scrape/cut that off with a combination of fingernails, tweezers, and the very sharp knife in my SLE 12 commemorative Leatherman tool, then touched up the remaining white bits with a fine point black Sharpie.

SLE 12 commemorative Leatherman tool (it seemed appropriate to use this)

Judging from the leftover DVD centre pieces, this mosaic used about 12 DVDs in all, which isn’t very many considering my initial stash. I had a few other ideas for the remainder, mostly involving hanging them up somehow, which I messed around with earlier on while waiting for the paint to dry on the plywood.

One (failed) idea was to use a cutting wheel on my Dremel tool to slice half way through a few DVDs, then slot them into each other to make a hanging thingy that would spin in the wind. I was unable to make a smooth/straight enough cut for this to work, and superglue doesn’t bridge gaps. You can maybe get an idea of what I was aiming at from this photo:

Four DVDs slotted into each other vertically, kinda, one with nasty superglue smear

My wife had an idea for a better way to do this, which is to take a piece of dowel, cut slots in the sides, and glue DVD halves into the slots using Araldite (that’s an epoxy resin, in case you didn’t grow up with that brand name). I didn’t get around to trying this, but I reckon she’s onto something. Next time I’m at the hardware store, I’ll try to remember to pick up some suitably sized dowel.

I did make one somewhat simpler hanging thingy, which I call “Geeko’s Tail (Uncurled)”. It’s just DVDs superglued together on the flat, hanging from fishing line, but I think it’s kinda cool:

No, it’s not an upside down question mark, it’s “Geeko’s Tail (Uncurled)”

Also, I’ve discovered that Officeworks has an e-waste recycling program, so any DVDs I don’t use in future projects needn’t go to landfill.

Update 2023-02-20: For photos of the mosaic, plus wallpapers made from the photos, see https://github.com/tserong/hackweek22

,

Lev Lafayette: The Importance of Supercomputing

Most people use their computers (which includes mobile phones) for communication, social media, games, entertainment, office applications, and the like. Most of the time these activities are not particularly onerous in terms of computing as such or do not lead to enormous benefits in productivity, inventions, and discovery. There is one field, however, rarely discussed, that does do this - and that is supercomputing. It is through supercomputing that we are witnessing the most important technological advances of our day, including astronomy, weather and climate forecasting, materials science and engineering, molecular modeling, genomics, neurology, geoscience, and finance - all with numerous success stories.

Usually, I draw a distinction between supercomputing and high-performance computing. Specifically, a supercomputer is any computer system that has exceptional computational power at a particular point in time, many (but not all) of which are measured in the bi-annual Top500 list. Once upon a time dominated by monolithic mainframes, supercomputers, in a contemporary sense, are a subset of high-performance computing, which is typically arranged as a cluster of commodity-grade servers with a high-speed interconnect and message-passing software that allows the entire unit to be treated as a whole. One can even put together a "supercomputer" from Raspberry Pi systems, as the University of Southampton illustrates.

How important is this? For many years now we've known that there is a strong association between research output and access to such systems. Macroeconomic analysis shows that for every dollar invested in supercomputing, there is a return of forty-four dollars in profits or cost-savings. Both these metrics are almost certainly going to increase in time; datasets and problem complexity are growing at a rate greater than the computational performance of personal systems. More researchers need access to supercomputers.

However, researchers do require training to use such systems. The environment, the interface, the use of schedulers on a shared system, the location of data: these all need to be learned. This is a big part of my life; in the last week, I spent three days teaching researchers everything from the basics of using a supercomputer system, to scripting jobs, to using Australia's most powerful system, Gadi at NCI, along with contributions at a board meeting of the international HPC Certification Forum. It is often a challenging vocation, but I feel confident that it is making a real difference to our shared lives. For that, I am very grateful.

,

Francois Marier: Upgrading from Ubuntu 20.04 focal to 22.04 jammy

A few weeks ago, I upgraded a few machines from Ubuntu 20.04 (focal) to 22.04 (jammy). Here are the things that needed fixing after the upgrade.

Network problems

Firstly, I had to fix the resolution of .local domains the same way as I did when I upgraded a different machine from 18.04 (bionic) to 20.04 (focal).

ssh agent problems

Then, I found that ssh-add no longer worked and instead returned this error:

Could not open connection to your authentication agent

While this appears to be a known issue, the work-around suggested in the i3 forum didn't work for me. What did work was the solution described in this blog post:

  1. Add this to my ~/.bash_profile:

    eval $(systemctl --user show-environment | grep SSH_AUTH_SOCK)
    export SSH_AUTH_SOCK
    
  2. Add this to my startup script:

    /usr/bin/systemctl --user start ssh-agent.service
    

I'm not sure why ED25519 keys don't work in gnome-keyring since that bug was supposedly fixed a while back, but starting gnome-keyring-ssh.service instead of ssh-agent.service didn't work for me.

Packages

When it comes to specific packages, I removed these two obsolete packages:

  • popularity-contest
  • unattended-upgrades

The former appears to be no longer used by Ubuntu, while the latter is subsumed by built-in systemd functionality:

systemctl enable apt-daily-upgrade.timer
systemctl start apt-daily-upgrade.timer

I also installed these two new packages:

As always, I put any packages I backport from Debian unstable into my PPA. So far with jammy, I only had to update tiger to silence some bogus warnings.

,

Francois Marier: Using a Streamzap remote control with MythTV on Debian Bullseye

After upgrading my MythTV machine to Debian Bullseye and MythTV 31, my Streamzap remote control stopped working correctly: the up and down buttons were working, but the OK button wasn't.

Here's the complete solution that made it work with the built-in kernel support (i.e. without LIRC).

Button re-mapping

Since some of the buttons were working, but not others, I figured that the buttons were probably not mapped to the right keys.

Inspired by these old v4l-utils-based instructions, I made my own custom keymap by copying the original keymap:

cp /lib/udev/rc_keymaps/streamzap.toml /etc/rc_keymaps/

and then modifying it to adapt it to what MythTV needs. This is what I ended up with:

[[protocols]]
name = "streamzap"
protocol = "rc-5-sz"
[protocols.scancodes]
0x28c0 = "KEY_0"
0x28c1 = "KEY_1"
0x28c2 = "KEY_2"
0x28c3 = "KEY_3"
0x28c4 = "KEY_4"
0x28c5 = "KEY_5"
0x28c6 = "KEY_6"
0x28c7 = "KEY_7"
0x28c8 = "KEY_8"
0x28c9 = "KEY_9"
0x28ca = "KEY_ESC"
0x28cb = "KEY_MUTE"
0x28cc = "KEY_UP"
0x28cd = "KEY_RIGHTBRACE"
0x28ce = "KEY_DOWN"
0x28cf = "KEY_LEFTBRACE"
0x28d0 = "KEY_UP"
0x28d1 = "KEY_LEFT"
0x28d2 = "KEY_ENTER"
0x28d3 = "KEY_RIGHT"
0x28d4 = "KEY_DOWN"
0x28d5 = "KEY_M"
0x28d6 = "KEY_ESC"
0x28d7 = "KEY_L"
0x28d8 = "KEY_P"
0x28d9 = "KEY_ESC"
0x28da = "KEY_BACK"
0x28db = "KEY_FORWARD"
0x28dc = "KEY_R"
0x28dd = "KEY_PAGEUP"
0x28de = "KEY_PAGEDOWN"
0x28e0 = "KEY_D"
0x28e1 = "KEY_I"
0x28e2 = "KEY_END"
0x28e3 = "KEY_A"

Note that the keycodes can be found in the kernel source code.

With my own keymap in place at /etc/rc_keymaps/streamzap.toml, I changed /etc/rc_maps.cfg to have the kernel driver automatically use it:

--- a/rc_maps.cfg
+++ b/rc_maps.cfg
@@ -126,7 +126,7 @@
 *      rc-real-audio-220-32-keys real_audio_220_32_keys.toml
 *      rc-reddo                 reddo.toml
 *      rc-snapstream-firefly    snapstream_firefly.toml
-*      rc-streamzap             streamzap.toml
+*      rc-streamzap             /etc/rc_keymaps/streamzap.toml
 *      rc-su3000                su3000.toml
 *      rc-tango                 tango.toml
 *      rc-tanix-tx3mini         tanix_tx3mini.toml

Button repeat delay

To adjust the delay before button presses are repeated, I followed these old out-of-date instructions on the MythTV wiki and put the following in /etc/udev/rules.d/streamzap.rules:

ACTION=="add", ATTRS{idVendor}=="0e9c", ATTRS{idProduct}=="0000", RUN+="/usr/bin/ir-keytable -s rc0 -D 1000 -P 250"

Note that the -d option has been replaced with -s in the latest version of ir-keytable.

To check that the Streamzap is indeed detected as rc0 on your system, use this command:

$ ir-keytable 
Found /sys/class/rc/rc0/ with:
    Name: Streamzap PC Remote Infrared Receiver (0e9c:0000)
    Driver: streamzap
    Default keymap: rc-streamzap
...

Make sure you don't pass the -c option to ir-keytable or else it will clear the keymap set via /etc/rc_maps.cfg, removing all of the button mappings.

,

Colin Charles: Long Malaysians, Short Malaysia

I have long said “Long Malaysians, Short Malaysia” in conversation to many. Maybe it took me a while to tweet it, but this was the first example: Dec 29, 2021. I’ve tweeted it a lot more since.

Malaysia has a 10th Prime Minister, but in general, it is a very precarious partnership. Consider it, same shit, different day?

I just have to get off the Malaysian news diet. Malaysians elsewhere, are generally very successful. Malaysians suffering by their daily doldrums, well, they just need to wake up, see the light, and succeed.

In the end, as much as people paraphrase, ask not what the country can do for you, legitimately, this is your life, and you should be taking good care of yourself and your loved ones. You succeed, despite of. Politics and the state happens, regardless of.

Me, personally? Ideas abound for how to get Malaysians who see the light to succeed elsewhere. And if I read, and get angry at something (tweet rage?), I’m going to pop RM50 into an investment account, which should help me get off this poor habit. I’ll probably also just cut subscriptions to Malaysian news things… Less exposure is actually better for you. I can’t believe that it has taken me this long to realise this.

Time to build.

The post Long Malaysians, Short Malaysia first appeared on Colin Charles Agenda.

Colin Charles: Hello 2023

I did poorly blogging last year. Oops. I think to myself when I read, This Thing Still On?, I really have to do better in 2023. Maybe the catalyst is the fact that Twitter is becoming a shit show. I doubt people will leave the platform in droves, per se, but I think we are coming back to the need for decentralised blogs again.

I have 477 days to becoming 40. I ditched the Hobonich Techo sometime in 2022, and just focused on the Field Notes, and this year, I’ve got a Monocle x Leuchtturm1917 + Field Notes combo (though it seems my subscription lapsed Winter 2022, I should really burn down the existing collection, and resubscribe).

2022 was pretty amazing. Lots of work. Lots of fun. 256 days on the road (what a number), 339,551km travelled, 49 cities, 20 countries.

The getting back into doing, and not being afraid of experimenting in public is what 2023 is all about. The Year of The Rabbit is upon us tomorrow, hence why I don’t mind a little later Hello 2023 :)

Get back into the habit of doing. And publishing by learning and doing. No fear. Not that I wasn’t doing, but it’s time to be prolific with what’s been going on.

I better remember that.

The post Hello 2023 first appeared on Colin Charles Agenda.

,

Linux Australia: Council Meeting 18th January 2023 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (President)
  • Wil Brown (Vice-President)
  • Clinton Roy (Secretary)
  • Jonathan Woithe (Council)
  • Neil Cox (Council)
  • Russell Stuart (Treasurer) (Arrived at twenty past eight)

Apologies 

Not Present

  • Lilac Kapul (Council)

 

Meeting opened at 20:04 AEDT by Joel  and quorum was achieved.

Minutes taken by Clinton Roy

 

2. Log of correspondence

  • 09/01/2023 Email from Dave Sparks Re: DrupalSouth 2023. They are keen to organise their next conference, and we have given them the standard starting points of budget requirements and account holders. A short response has been given, a longer response will be given post AGM.
  • Membership details requests.
  • AGM voting notification information request. Pointed to archives.
  • Web page links, some of our page links are out of date. Sent via the web form. ACTION ITEM Clinton will have a go at updating the links, and respond.

 

3. Items for discussion

  • AGM setup – zoom code mail? Any other tasks required.
  • We have a number of registrations, we need to cross reference them with our membership db, once we’ve done that, then the zoom codes can go out.  ACTION ITEM Wil and Joel to check registration numbers.
  • Two motions were moved on the list, ACTION ITEM Clinton to add them to the agenda
  • ACTION ITEM Joel to add the two motions as a poll in the zoom meeting
  • We have the returning officer organised, Julien Goodwin.

4. Items for noting

  • Election results are up.

5. Other business

  • ACTION ITEM Joel to follow up with the admin team about the SPF question from Russell Coker.

The post Council Meeting 18th January 2023 – Minutes appeared first on Linux Australia.

,

Lev Lafayette: Installing VASP 6.x on x86_64 RHEL 7.9 Linux

In the past I have posted two sets of instructions for installing VASP (Vienna Ab-initio Simulation Package for quantum-mechanical molecular dynamics (MD) using pseudopotentials and a plane wave basis set), each for VASP 5.X on an Opteron system. Now, many years later, I find myself in the position of having to install VASP once again.

The installation approach is still pretty horrible but it has improved a great deal. Previously there was a small mountain of makefiles for different architectures and one had to find a file that was "close enough" and modify as required. This process is still required, but the quantity of makefiles is dramatically reduced with improved abstraction, directory management, and a test suite.

The structure (once extracted) is as follows:

                   vasp.X.X.X (root directory)
                                |
          ------------------------------------------------
         |        |        |         |          |         |
        arch     bin     build      src     testsuite   tools

* `root/`. Holds the high-level makefile and several subdirectories.
* `root/src`. Holds the source files of VASP and a low-level makefile.
* `root/arch`. Holds a collection of `makefile.include.*` files.
* `root/build`. The different versions of VASP, i.e., the standard, gamma-only, non-collinear, and CUDA-GPU versions will be built in separate subdirectories of this directory.
* `root/bin`. Here make will store the binaries.
* `root/testsuite`. Holds a suite of correctness tests to check your build.
* `root/tools`. Holds several python scripts related to the (optional) use of HDF5 input/output files.

Installation involves copying one of the makefile.include.xxx files to the root directory as makefile.include, modifying it, and running make. The most straightforward, used in this example, is makefile.include.linux_gnu.

Within the makefile one will have to enter the values of the Fortran library directory, the LIBDIR, for BLAS, LAPACK, SCALAPACK, and FFTW. c.f.,

# LIBDIR     = /opt/gfortran/libs/
LIBDIR     = /usr/local/easybuild-2019/easybuild/software/core/gcccore/10.2.0/lib64/lib
BLAS       = -L$(LIBDIR) -lrefblas
LAPACK     = -L$(LIBDIR) -ltmglib -llapack
BLACS      = 
SCALAPACK  = -L$(LIBDIR) -lscalapack $(BLACS)

LLIBS      = $(SCALAPACK) $(LAPACK) $(BLAS)

This particular installation uses the EasyBuild foss/2020b toolchain, which consists of GCC/10.2.0 and OpenMPI 4.0.5. Once that toolchain is loaded one can also load FFTW/3.3.8, scalapack/2.1.0, and openblas/0.3.12. Note that the loaded modules will not be read by the VASP makefile; the paths still have to be hard-coded. Loading them is convenient, however, when checking the PATH to the libraries.

The above code snippet from the makefile is a little deceptive. Something like the following is recommended instead:

# LIBDIR     = /opt/gfortran/libs/
LIBDIR     = /usr/local/easybuild-2019/easybuild/software/core/gcccore/10.2.0/lib64/lib
# BLAS       = -L$(LIBDIR) -lrefblas
# LAPACK     = -L$(LIBDIR) -ltmglib -llapack
BLACS      = 
# SCALAPACK  = -L$(LIBDIR) -lscalapack $(BLACS)

OPENBLAS_ROOT ?= /usr/local/easybuild-2019/easybuild/software/compiler/gcc/10.2.0/openblas/0.3.12/
BLASPACK    = -L$(OPENBLAS_ROOT)/lib -lopenblas

SCALAPACK_ROOT ?= /usr/local/easybuild-2019/easybuild/software/mpi/gcc/10.2.0/openmpi/4.0.5/scalapack/2.1.0
SCALAPACK   = -L$(SCALAPACK_ROOT)/lib -lscalapack

LLIBS      += $(SCALAPACK) $(BLASPACK)

# FFTW       ?= /opt/gfortran/fftw-3.3.6-GCC-5.4.1
FFTW       ?= /usr/local/easybuild-2019/easybuild/software/mpi/gcc/10.2.0/openmpi/4.0.5/fftw/3.3.8
LLIBS      += -L$(FFTW)/lib -lfftw3 -lfftw3_omp
INCS       = -I$(FFTW)/include

There is a further issue. If one is using GCC 10.x or greater there will be an argument mismatch. An error will occur like the following:

Error: Rank mismatch between actual argument at (1) and actual argument at (2) (rank-1 and scalar)

To get around this an additional Fortran flag must be added, resulting in:

#  For gcc-10 and higher require -fallow
FFLAGS     = -w -march=native -fallow-argument-mismatch

Another interesting error is that VASP has only been built for compilers up to GCC 7. The use of GCC 10 and MPI will result in an error, and the reader_base.F file needs to be patched. This has been discussed on the VASP forums, which also has a copy of the patchfile. Modify the headers if necessary and apply the patch. e.g.,

patch < reader.patch 
patching file reader_base.F

Following this, an incremental build of the three core VASP binaries should work.

make std
make gam
make ncl

,

Simon Lyall: Audiobooks – December 2022

The Years of Lyndon Johnson. Book Four: The Passage of Power by Robert Caro

Covers 1958-1964. Especially the 1960 Democratic primary and election, Johnson’s unhappy Vice Presidency and the first months of his Presidency. As good as the others in the series. 4/5

England’s Villages: An Extraordinary Journey Through Time by Dr Ben Robinson

An archaeologist writes about the evolution of English Villages, their people, buildings, names and forms. Okay but not exceptional. 3/5

Jefferson: Architect of American Liberty by John B. Boles

A good single volume biography. Works hard to explain Jefferson’s attitudes especially on slavery. Good coverage and easy to follow. 4/5

Shutdown: How Covid Shook the World’s Economy by Adam Tooze

Covering roughly 2020 plus a few months on each side it mostly concentrates on the government and central bank measures to stabilise economies. 3/5

Doing Good Better: Effective Altruism and How You Can Make a Difference by William MacAskill

An introduction to Effective Altruism and how you can do the most good in the world via carefully picking charities to give to, and other alternatives. 4/5

Leave the Gun, Take the Cannoli: The Epic Story of the Making of The Godfather by Mark Seal

Covers the writing of the book by Puzo, adapting and then filming it. Lots of Behind the scenes stories. A fun read 4/5

The 2020 Commission Report on the North Korean Nuclear Attacks Against the United States by Jeffrey Lewis

A future/alternative history where Trump’s America fights North Korea. Well done and relatively plausible. 4/5

My Audiobook Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average; in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


,

Linux Australia: Council Meeting 4th January 2023 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (President)
  • Wil Brown (Vice-President)
  • Clinton Roy (Secretary)
  • Russell Stuart (Treasurer)
  • Jonathan Woithe (Council)
  • Neil Cox (Council)

Apologies 

  • None

 

Not Present

  • Lilac Kapul (Council)

 

Meeting opened at 20:08 AEDT by Joel  and quorum was achieved.

Minutes taken by Clinton Roy

 

2. Log of correspondence

  • Russell Coker – SPF issues with email coming via linux.org.au

3. Items for discussion

  • 2023 Budget
    • Discuss and modify as appropriate. Discussions covered employee assistance programs, making the SSC budget smaller (as it is mostly held online these days) and the budget of the bid reviews.
    • MOTION: The 2023 budget be accepted.
      MOVER: Russell Stuart
      SECONDER: Neill
      Outcome: Unanimous
  • Pycon Adelaide Convention Centre Deposit: is it still available, will it be used this year, if it won’t be used by Pycon what options do we have?
    • It is believed that there are plans to hold PyConAU in Adelaide in 2023, in which case the deposit will be used for that.
  • Who is returning to council next year?  If we need more nominations, put out a call for nominees. We currently do have enough nominations to fill all roles.

 

4. Items for noting

  • Progress on call for bids for EO / LCA 2024.
  • Likely final P&L by conference for 2022
  • Annual report Progress so far: Incomplete, draft, Annual Report.  Todo:
    • Lilac: current bio. 100..200 words + picture required.
    • Joel: President’s report.
    • Clinton: Secretary’s report.
    • Clinton: PyConAu steering report
    • Jonathan: Optional subcommittee reports for WordPress Steering (Wil provided in Matrix, is also in the “2022 Annual Report submission” directory on g-drive), LUV (coming Monday 9 Jan), and Admin Team (no response received to request, will ping again).  (We should consider shutting subcommittees we don’t receive reports from.   We won’t even ask IWS, so that should be shut down next year.)
    • Jonathan: Grant reports. “Big Sand” grant report is in (see “2022 Annual Report submission” directory on g-drive). A contact has been established for the E-textiles grant (the original applicant has been incapacitated) but no report has yet been received.  Have pinged again earlier today.
    • Neill:  Event report for LCA 2022,
    • About Linux Australia: The 2022 version will be used as is unless someone wants to update it.
    • Jonathan: Event report for Drupal.
    • Exec pics and bio.
    • Reports from other exec members: optional.
    • Everybody: Russell needs pictures and conference graphics.
    • Russell: Treasurer’s report.
    • Treasurer: Figures for graphic, budget for 2022/2023.

5. Other business

  • None

The post Council Meeting 4th January 2023 – Minutes appeared first on Linux Australia.

,

Linux Australia: Council Meeting 21st December 2022 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (President)
  • Wil Brown (Vice-President)
  • Clinton Roy (Secretary)
  • Jonathan Woithe (Council)
  • Neil Cox (Council)
  • Lilac Kapul (Council)
  • Russell Stuart (Treasurer) (arrived at 8:25)

Apologies 

None

Not Present

None

Meeting opened at 20:05 AEDT by Clinton Roy and quorum was achieved.

Minutes taken by Clinton Roy

 

2. Log of correspondence

  • Several emails from members of the community around mailing list conduct on Monday. Given the nature of the events, there’s privacy concerns around naming the individuals. 

 

3. Items for discussion

  • Lilac asks about data trust, mails have been sent to them, that is in their court atm.
  • Have we put out a call for bids for EO / LCA 2024?
    • Plan to include it in an LA community update
    • Discussion around whether to include in the LA community update, or separate email – decided to have a separate email to be clear about it.
    • Suggestion that we should make it clear that we are here to auspice events, Council will review any proposals that come through.
    • Lilac suggests running smaller, less formal feedback sessions with the community. ACTION ITEM: secretary adding the suggestion of smaller, less formal feedback sessions to the council hand over document.
  • Mailing list conduct – especially Daniel Pocock
    • We should put out a statement that makes clear we do not agree with the message sent by Daniel Pocock on Monday, and that this runs foul of our Code of Conduct.
    • We should make clear that someone exhibiting such behavior cannot be accepted as part of our community. He is not a LA Member, and would not be accepted as one.
    • Important to emphasise that the Linux Australia values are here to foster collaboration, and note that this behaviour does not align.

 

4. Items for noting

  • Russell has begun working on the Annual Report.  This “Russell” does not create content, he just makes it look pretty.   Content comes from others.  To get a feel for the content you are asked to provide, look at last year’s report.
    • Treasurer: Figures for graphic, budget for 2022/2023.
    • About Linux Australia: The 2022 version will be used as is unless someone wants to update it.
    • All exec members: current bio. Lilac, Wil: 100..200 words required.  Others: If you don’t provide 100..200 words, the 2022 version will be used.
    • All exec members: bio pic.  Lilac, Wil: picture required.  Others: If you don’t provide a picture, the 2022 version will be used.
    • Joel: President’s report.
    • Clinton: Secretary’s report.
    • Russell: Treasurer’s report.
    • Reports from other exec members: optional.
    • Jonathan: Event report for Drupal.
    • Neill:  Event report for LCA 2022,
    • Clinton: PyConAu steering report
    • Jonathan: Optional subcommittee reports for Drupal Steering, WordPress Steering, LUV, and Admin Team.  (We should consider shutting subcommittees we don’t receive reports from.   We won’t even ask IWS, so that should be shut down next year.)
    • Jonathan: Grant reports.
    • Everybody: Russell needs pictures and conference graphics.
  • Everything Open 2023
    • Sae Ra Germaine will lead the Session Selection Committee
    • Call for Sessions, Call for Volunteers and Financial Assistance applications are all now open on the website
  • Zoom account needs to have its credit card updated. ACTION ITEM:: Russell to update zoom  account credit card.
    • Note from Russell: It’s my CC.  It does expire 2023/02.  I presume it will be renewed the month before.

5. Other business

  • Election Prep. Joel will work with Sae Ra on the website; Clinton will mail out the announcement.

The post Council Meeting 21st December 2022 – Minutes appeared first on Linux Australia.

,

Simon Lyall: Donations 2022

Each year I do the majority of my Charity donations in early December (just after my birthday) spread over a few days (so as not to get my credit card suspended).

I do a blog post about it to hopefully inspire others. See previous years: 2021, 2020, 2019, 2018, 2017, 2016, 2015

All amounts are in $US unless otherwise stated

General Charities

$750 to Givewell Top Charities fund . This was previously called their “Maximum impact fund”.

Software and Internet Infrastructure Projects

Last year I donated $100 each to SPI and SFC but this year I dropped it to $50 each and did direct donations to Python and Syncthing. I’m not sure which is the best strategy.

Others including content creators

Payments via Patreon

Current as of mid-December 2022

  • $2/month to Daniel King to make Chess videos
  • $1/month to Chris Stuckmann who does movie reviews
  • $2/month to The Prancing Pony Podcast who make a podcasts show about J R R Tolkien
  • $1/month to Joe Snodow who runs funny twitter accounts.
  • $1/month to Zach Weinersmith who creates SMBC Comic and other stuff
  • $1/video to The Nerdwriter who does Youtube videos
  • $1/month to CGP Grey who does Youtube Videos
  • $1/month to City Beautiful who is creating videos about cities and city planning.
  • $1/month to Alt Shift X who creates youtube videos
  • $2/month to Rose Eveleth who creates the Flash Forward podcast.
  • $1/month to RMTransit who does a Youtube channel on Transit.
  • $1/month to Quinn’s Ideas which is a Youtube Channel about Science Fiction (especially Dune)
  • $1/ month to Asianometry who creates youtube videos, mainly on Economics and the semiconductor industry.
  • $1/month to CityNerd who makes videos on Cities and Transportation


,

Linux Australia: Council Meeting 7th December 2022 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (President)
  • Wil Brown (Vice-President)
  • Russell Stuart (Treasurer)
  • Jonathan Woithe (Council)
  • Neil Cox (Council)
  • Lilac Kapul (Council)

Apologies 

  • Clinton Roy (Secretary)

Not Present

  • None

 

Meeting opened at 20:10 AEDT by Joel and quorum was achieved.

Minutes taken by Jonathan.

 

2. Log of correspondence

  • 2022-11-26 Email: PyCon AU server exports email, Katie McLaughlin
  • 2022-11-28 Email: Building a data trust for speech data from Linux Australia conferences, Kathy Reid
  • 2022-12-05 Email:  LA List Censorship, Noel Butler (sent via Jonathan Woithe)

 

3. Items for discussion

  • LA List Censorship, Noel Butler.
    • In addition to the email sent to Jonathan, Joel also received a message privately the evening before. He requested information about the complaint against him: who complained, what the complaint was about, the evidence of the offence, why he was placed in moderation, and so on. Joel will forward this to the council list for archival purposes.
    • Per Wil, Kathy has previously mentioned on-list that the LA policy does not provide a right to most of this information.
    • Joel will respond to Noel later tonight.

 

4. Items for noting

  • Jonathan is contacting 2023 grant recipients to obtain their reports for the Annual Report.
  • Joel reports that Everything Open 2023 Call for Papers and Call for Volunteers will open tonight.

5. Other business

  • Status of Open Hardware kits from LCA2022: Joel met with Open Hardware team on Monday. They have provided text to Sae Ra and Joel to go out to those who purchased kits. Emails will be sent to people who ordered kits this week, with a broader announcement afterwards.

The post Council Meeting 7th December 2022 – Minutes appeared first on Linux Australia.

Linux Australia: Council Meeting 23rd November 2022 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (President)
  • Wil Brown (Vice-President)
  • Clinton Roy (Secretary)
  • Russell Stuart (Treasurer)
  • Jonathan Woithe (Council)
  • Lilac Kapul (Council)

 

Apologies 

  • None

Not Present

  • Neil Cox (Council)

 

Meeting opened at 20:15 AEDT by Joel and quorum was achieved.

Minutes taken by Clinton Roy

2. Log of correspondence

  • 2022-11-14 Kathy Reid Email – Grant application, draft letter of support for council consideration
  • 2022-11-22  Katie McLaughlin Email – Ping admin team and council about PyCon AU server exports
  • ASIC mail, 10 October, via Terry

3. Items for discussion

  • Kathy Reid – Grant application, draft letter of support. Jonathan is concerned that the timing is too close to the election, and that it might be improper to spend money this close to the election. Lilac points out that the application was done very early in the year. Jonathan points out the delay was mostly up to the applicants. Overall there’s no real problem with providing funding at this stage. ACTION ITEM Clinton to get the data trust team to put together a grant application, send it to council for consideration; council to assist getting a grant proposal ready; make it abundantly clear that the council is here to help and support grant proposals.
  • Katie McLaughlin email, ACTION ITEM Joel to follow up with the admin team re Katie Mclaughlin’s PyconAu Archive project.
  • ASIC mail, 10 October. A long discussion trying to work out if the paper mail is the current state of play or not. It does appear that the mail is addressing the current state of play. The issue relates to a detail on Form 490 which is being addressed by Lilac in connection with their Form 379 submission. Lilac is following this up with ASIC. 
  • 2023 Election date. Joel suggests AGM Saturday the 21st January at 1500 AEST (UT+1100), Suggest election nominations open 17th December, close 31st December, open voting 31st Dec, close voting on 14th Jan. Jonathan asks about hand over periods. Roughly works out to two weeks nominations, two weeks voting with voting closing a week before the AGM. Some back and forth twiddling the dates to confuse the secretary. ACTION ITEM Joel to communicate the dates with the admin team.
  • Clinton wants to get rid of paper.li. At this point we have a subscription for the next year, so we will review this when the next invoice comes in during 2023.

4. Items for noting

  • None

5. Other business

  • None

The post Council Meeting 23rd November 2022 – Minutes appeared first on Linux Australia.

,

Simon Lyall: Audiobooks – November 2022

Calculating God by Robert J. Sawyer

Aliens arrive on present-day Earth and one befriends a Canadian paleontologist. Themes of religion & alien civilizations are covered. Good read. 3/5

Working: Researching, Interviewing, Writing by Robert A Caro

A series of articles on the author’s process & experiences researching and writing his biographies. Short but interesting. 4/5

Consider Phlebas by Iain M. Banks

A “spy story” within an interstellar conflict, it introduces “The Culture” civilization. Reasonable main character and lots of stuff for Hard Core SF fans. 3/5

The Hunt for Vulcan: . . . And How Albert Einstein Destroyed a Planet, Discovered Relativity, and Deciphered the Universe by Thomas Levenson

Fun story following a few main characters (Thomas Edison has a cameo). 4/5

My Audiobook Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


,

Michael Still: Ansible 7.0 onwards requires blocking IO from stdin, stdout, and stderr

Shaken Fist CI started failing this afternoon with this message logged:

ERROR: Ansible requires blocking IO on stdin/stdout/stderr.
Non-blocking file handles detected: <stdout>

Specifically this was happening when using ansible-galaxy to install some requirements, but the check is more generic than that. It was implemented by this ansible pull request, which appears to have been released with ansible-core 2.14 on November 8. That sat around until today, when ansible 7.0.0 was released and broke CI for me.

To be completely honest I’m not sure what’s happening here — somewhere in GitHub Actions, calling a shell script that calls ansible-galaxy, the stdout file descriptor gets set to non-blocking and everything breaks. I’m unsure exactly where, because it’s a pain to track down.
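
One quick way to see where the file descriptor flips (just a sketch, and it assumes python3 is available on the runner) is to probe the flags on stdout at various points in the script:

# Prints "non-blocking" if O_NONBLOCK is set on stdout, "blocking" otherwise
python3 -c 'import fcntl, os, sys; flags = fcntl.fcntl(sys.stdout.fileno(), fcntl.F_GETFL); print("non-blocking" if flags & os.O_NONBLOCK else "blocking")'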

That said, Jack came to the rescue with this gem:

ansible-galaxy install andrewrothstein.etcd-cluster | cat -

Which unblocks me. It will be interesting to see if other people encounter problems with this change.

Linux Australia: Council Meeting 9th November 2022 – Minutes

1. Meeting overview and key information

Present

  • Wil Brown (Vice-President)
  • Clinton Roy (Secretary)
  • Russell Stuart (Treasurer)
  • Jonathan Woithe (Council)
  • Neil Cox (Council)

Apologies 

Joel Addison (President)

Not Present

Lilac Kapul (Council)

 

Meeting opened at 20:02 AEDT by Wil Brown  and quorum was achieved.

Minutes taken by Clinton Roy

 

2. Log of correspondence

  • 2022-11-02 Email from Bret Busby regarding uceprotect
  • 2022-11-02 Email from Bret Busby regarding Noel Butler, SpamAssassin, and uceprotect

 

3. Items for discussion

  • Clarify uceprotect does not send out any .eml attachments causing further email list blocks. Uceprotect is an RBL. It does not send any email to anyone.
  • Dave Sparks speaking about the Drupal conference which was a week and a half ago. Conference was overall very successful, 239 attendees, about 20 more than their optimistic estimate, about 12 walk-up sign-ups. Their venue was struggling with the number of attendees a bit, staff were a bit rusty. Good vibe/energy during the conference, positive feedback about the conference. About fifty responses to a post-event survey. Talks were rated 66% excellent, 30% OK, 2% poor. Do you see yourself coming to Wellington next year? 55% yes. Mostly an Australian audience. $135k income. Half of ticket sales were not early bird. $93k from sponsorship. Sponsors were generally happy. $110k of expenses, still some invoices to come in. Profit, at this early stage, is looking at around $40k. There were some AV staffing issues, and there will likely be a refund from them. Some recordings may have some sound issues. Still need to reconcile everything to get final numbers. Now aiming for Wellington, May 2023. Would like to continue using their event manager. Looking at holding a small event for sponsors. Would like to have a presence at Everything Open. Had an election for committee members in the lead up to the conference, three new committee members, all have experience at running events, and supporting local meetups; and have an agenda of running the conference and supporting the meetups. People enjoyed the hallway track as well as the talks. Had a good volunteer organiser, and a track chair organising talks. Despite the short turn around, organisers aren’t burnt out, and are motivated and energised to run the next event. Treasurer explains the process for the organising committee to request money for the next event. Treasurer notes one Drupal transaction that the council needs to reconcile for the auditing period, discussions with Dave about how to resolve this.
  • Stephen and Mike from the admin team. Not a lot to report. Mail issues with one member, they’re getting blocked because other people on their mail service are being labelled as spammers. Technical discussions about this issue. Steve asks about a date for the election; the council has not sorted a date yet. ACTION ITEM: Council to choose a date for the election, and tell the admin team. Steve warns there is a physical move of the servers coming up, not a big deal, but it would be bad around the election. Some discussion of the dates chosen in previous years. Have added a professional tier of Slack, US$8.75/user/month, with the admin team as users and everyone else as guests, to keep historical discussions available. Mike apologises for not making it to every meeting, but the hour is late. Neil will look at the captcha soon, as requested by Steve.

4. Items for noting

  • Dave Sparks emailed Wil Brown to say the Drupal South committee had a post-conference debrief scheduled for last week, and from that they’ll compile a report.

5. Other business

  • Russell will be asking for one paragraph bios, final reports, pictures for the AGM report.


Francois Marier: Name resolution errors in Ubuntu repositories while building a docker container

I ran into what seemed to be a DNS problem when building a Docker container:

Err http://archive.ubuntu.com jammy Release.gpg
  Could not resolve 'archive.ubuntu.com'

W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/jammy/Release.gpg  Could not resolve 'archive.ubuntu.com'

W: Some index files failed to download. They have been ignored, or old ones used instead.

I found that many solutions talked about setting the default DNS server explicitly in /etc/docker/daemon.json:

{
    "dns": ["1.1.1.1"]
}

but that didn't work for me.

I noticed however that I was having these problems whenever I connected to my VPN. So what did work for me was restarting the docker daemon whenever there is a change in networking (e.g. enabling/disabling VPN) by putting the following in /etc/NetworkManager/dispatcher.d/docker-local:

#!/bin/sh

LOGFILE=/var/log/docker-restarts.log

if [ -z "$1" ]; then
    echo "$0: called with no interface" >> $LOGFILE
    exit 1;
fi

if [ "$1" = lo ]; then
    # Ignoring the loopback interface
    exit 0;
fi

case "$2" in
    up|vpn-up|down|vpn-down)
        echo "$0: restarting docker due to action \`$2' on interface \`$1'" >> $LOGFILE
        /bin/systemctl restart docker.service
        ;;
    *)
        echo "$0: ignoring action \`$2' on interface \`$1'" >> $LOGFILE
        ;;
esac

and then making that new file executable:

chmod +x /etc/NetworkManager/dispatcher.d/docker-local

You can confirm that it's working as intended by watching the logs:

tail -f /var/log/docker-restarts.log

while enabling/disabling your VPN or your network connection. If you don't see any output, then something is wrong and the Docker restart isn't happening.
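
If you want a quick end-to-end check that name resolution inside containers works again (a minimal test, assuming you're happy to pull the small busybox image), something like this should do:

docker run --rm busybox nslookup archive.ubuntu.com

If that resolves, apt inside your image builds should be able to reach the Ubuntu mirrors too.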

,

Michael Still: Unix: a history and a memoir

It was a bit surprising to me that Brian Kernighan self-published a book about Unix history with Kindle Direct publishing, but given how many other books he’s published he must have his reasons for not using traditional channels for this one. The book is an engaging read, with quotes which still seem timely today popping up every so often. Certainly the decision to self-publish does not appear to have been because of a lack of effort put into the book. An example of a quote I think is still relevant today:

“Stable funding was a crucial factor for research. It meant that AT&T could take a long-term view and Bell Labs researchers had the freedom to explore areas that might not have a near-term payoff and perhaps never would. That’s a contrast with today’s world, in which planning often seems to look ahead only a few months, and much effort is spent on speculating on financial results for the next quarter.” (page 7).

Kernighan covers his own early career and the general functioning of Bell Labs, before starting to delve into the history of Unix. Describing at a high level early batch processing systems and then Multics, Kernighan describes how Multics suffered from the second systems effect (see The Mythical Man Month for a good description of that).

Overall, Kernighan describes what must have been an amazing work environment — a grouping of absolute leaders in their field at a time where there was so much fundamental technology to be invented and described. I can’t help but be a little jealous — I’m not sure there is a modern equivalent. Google for example would have had a similar set of circumstances at times in its history, but wasn’t as inclined to share with the greater world like Bell Labs did. Interestingly Eric Schmidt was a summer student at Bell Labs — he re-wrote Lex while there.

Another interesting contribution from Kernighan is the following rule, which I think we’ve lost sight of in a world of large monolithic code bases:

“…a good example of a general rule: if a program writes your code for you, the code will be more correct and reliable than if you write it yourself by hand. If the generator is improved… everyone benefits.” (page 96).

I think this applies to libraries as well — if you’re going to write something which might be useful to others, it’s much better off in a library where others can get to it than in your own codebase. That said, I think avoiding leftpad seems like a noble goal.

Overall this was an enjoyable book and I recommend it.

Unix: A History and a Memoir, by Brian W. Kernighan. Operating systems (Computers). Published October 18, 2019; 198 pages.

"The fascinating story of how Unix began and how it took over the world. Brian Kernighan was a member of the original group of Unix developers, the creator of several fundamental Unix programs, and the co-author of classic books like "The C Programming Language" and "The Unix Programming Environment".

,

sthbrx - a POWER technical blog: What distro options are there for POWER8 in 2022?

If you have POWER8 systems that you want to keep alive, what are your options in 2022? You can keep using the legacy distribution you're still using as long as it's still supported, but if you want some modernisation, that might not be the best option for you. Here's the current landscape of POWER8 support in major distributions, and hopefully it helps you out!

Please note that I am entirely focused on what runs and keeps getting new packages, not what companies will officially support. IBM provides documentation for that. I'm also mostly focused on OpenPOWER and not what's supported under IBM PowerVM.

RHEL-compatible

Things aren't too great on the RHEL-compatible side. RHEL 9 is compiled with P9 instructions, removing support for P8. This includes compatible distributions, like CentOS Stream and Rocky Linux.

You can continue to use RHEL 8 for a long time. Unfortunately, Rocky Linux only has a Power release for EL9 and not EL8, and CentOS Stream 8 hits EOL May 31st, 2024 - a bit too soon for my liking. If you're a RHEL customer though, you're set.

Fedora

Fedora seems like a great option - the latest versions still support P8 and there are no immediate signs of that changing. The issue is that Fedora could change this with relatively little warning (and their big brother RHEL already has), that Fedora doesn't provide LTS versions that will stay supported if this happens, and that any options you could migrate to would be very different from what you're using.

For that reason, I don't recommend using Fedora on POWER8 if you intend to keep it around for a while. If you want something modern for a short-term project, go right ahead! Otherwise, I'd avoid it. If you're still keeping POWER8 systems alive, you probably want something more set-and-forget than Fedora anyway.

Ubuntu

Ubuntu is a mixed bag. The good news is that Ubuntu 20.04 LTS is supported until mid-2025, and if you give Canonical money, that support can extend through 2030. Ubuntu 20.04 LTS is my personal pick for the best distro to install on POWER8 systems that you want to have somewhat modern software but without the risks of future issues.

The bad news is that POWER8 support went away in Ubuntu 22.04, which is extremely unfortunate. Missing an LTS cycle is one thing, but not having a pathway from 21.10 is another. If you were on 20.10/21.04/21.10, you are completely boned, because they're all out of support and 22.04 and later don't support POWER8. You're going to have to reinstall 20.04.

If I sound salty, it's because I had to do this for a few machines. Hopefully you're not in that situation. 20.04 is going to be around for a good while longer, with a lot of modern creature comforts you'd miss on an EL8-compatible distro, so it's my pick for now.

OpenSUSE

I'm pretty ignorant when it comes to chameleon-flavoured distros, so take this with a grain of salt as most of it is from some quick searching. OpenSUSE Leap follows SLES, but without extended support lifetimes for older major versions. From what I can tell, the latest release (15.4) still includes POWER8 support (and adds Power10 support!), but similar to Fedora, that looks rather prone to a new version dropping P8 support to me.

If the 15.x series stayed alive after 16 came out, you might be good, but it doesn't seem like there's a history of that happening.

Debian

Debian 11 "bullseye" came out in 2021, supports POWER8, and is likely to be supported until around 2026. I can't really chime in on more than that because I am a certified Debian hater (even newer releases feel outdated to me), but that looks like a pretty good deal.

Other options

Those are just some major distros, there's plenty of others, including some Power-specific ones from the community.

Conclusion

POWER8's getting old, but is still plenty capable. Make sure your distro still remembers to send your POWER8 a birthday card each year and you'll have plenty more good times to come.

,

Simon Lyall: Audiobooks – October 2022

The Man from the Future: The Visionary Life of John von Neumann by Ananyo Bhattacharya

A good overview of von Neumann’s life and introduction to his most important work. An accessible read that keeps things interesting. 3/5

Sam Walton: Made in America by Sam Walton

Covers the author’s life and especially the creation and growth of Walmart. Lots of details about running the business and the industry. 4/5

The Other Side of History: Daily Life in the Ancient World by Robert Garland

48 lectures covering daily life in Egypt, Greece, Rome and Medieval Britain. Plus a few other times & places. Quite interesting. 3/5

We Don’t Need Roads: The Making of the Back to the Future Trilogy by Caseen Gaines

An overview of the making of the movies. Some good stories, and I’m sorry it wasn’t longer. 3/5

The Hidden Habits of Genius: Beyond Talent, IQ, and Grit—Unlocking the Secrets of Greatness by Craig M. Wright

The “14 key traits of genius, from curiosity to creative maladjustment to obsession”. Some interesting stories but not much really actionable. 2/5

My Audiobook Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


Dave Hall: Upgrading to AWS Lambda Powertools for Python v2

Learn how easy it is to upgrade AWS Lambda Powertools to version 2.

,

Lev Lafayette: The End of Duolingo?

In late 2015 I started using Duolingo, completing sixteen skill trees across ten languages since then. These were mainly, but not exclusively, European languages (including that pan-European auxiliary language, Esperanto), all of which proved of some practical use on what were annual trips. In addition to the skill trees I made reasonable progress in Dutch, some progress in Catalan (from Spanish), Czech, and even the trio of Norwegian, Swedish, and Danish (on account of someone saying they were so similar). To say that I am a consistently active user of the application is fair; last year I was rated in the top 0.1% of users in the world.

For several years I have been a paid subscriber to the service at the princely sum of c$100 per year. However, a few days ago, I cancelled my subscription. The reason for this is quite simple; utterly horrendous changes to the user interface. Ignoring the principle of "if it ain't broke, don't fix it", the powers that be at Duolingo have foisted these changes upon a user community that is less than pleased. The YouTube video explaining the changes, at the time of writing, has 116K views; a mere 536 have liked the video and 1.8K have voted it down.

The problems are well explained by many of the comments on the video: the loss of the self-paced and self-directed learning path in favour of a one-path-only approach reminiscent of the boardgame "Candyland"; the substantial loss of screen real-estate and the frustration of trying to reach a lesson of choice; the enforced combination of "lessons" and "stories" (I don't mind this); and the unnecessary animations (I always turned these off in the past). Of the close to 700 comments, almost every single one is negative. It would seem that others too will be ending their subscription, not because of the content of the application, but because of the changes made to how people use it.

Certainly, Duolingo has lost people in the past - closing down their forums, ending the language incubators, etc. Those policy changes were annoying, but the comments suggest this is different; this is a visceral hatred, "nerd rage quit" level of disappointment.

This is, of course, not the first time that one has witnessed a mass exodus from an application following radical changes to a user interface. Three years ago Niantic did the same to the game Ingress. When Ubuntu introduced the Unity Desktop, there was a significant switch by users to alternatives. Even when the Luna style was introduced with Windows XP, many users continued to use Windows Classic. One could even cite the controversy when Dungeons & Dragons 4th edition was released compared to the 3/3.5 editions - alternative companies, still operating today, made a small fortune by continuing the old system.

The lesson to learn here is that even when usability experts point out numerous benefits to a new interface (e.g., Unity Desktop), or there is an improvement to content, or the marketing people think it wise to make a system similar to other popular products (e.g., D&D 4th edition), a radical change to a user interface system is a hostile attack on the existing user community. The reason for this is deeply part of educational psychology; people engage primarily with content. The user interface is a system to get them to the content. When a user learns a system the process becomes unconscious. Only when the system actively gets in the way of the user accessing the content is there a problem, and when that is the case small and incremental changes generate popularity, not rejection.

This is why radical changes in the interface invariably frustrate existing users. What was an unconscious process has to be relearned. If the design actually is more accessible, the learning curve is short, and the access to content is easier, then the period of frustration will be less if users remain with the product. Niantic managed not to bleed to zero Ingress users by allowing users the opportunity to tone down the resource-intensive and overwhelming graphics, for example. The Unity Desktop environment eventually became acceptable to Ubuntu users, as did the Luna style for MS Windows. Duolingo has made a new interface where the design is less accessible, and whilst with a short learning curve (you can only follow one path), access to content is significantly harder.

Duolingo's CEO, Luis von Ahn, has made it clear that the "simpler" interface will not be changed and new users must use it now and (almost) all users by the end of October. He is betting that Duolingo's new interface can grow the number of users and operating income of the company. This is probably not going to happen; whilst Duolingo's revenues increased in 2022, the operating income and profits are in the territory of a $60 million USD loss. Massive investment has been made in the changes, with the hope to reverse a decline in operating income and monthly users.

This is a crash-through or crash approach when a principle of sunk costs should be applied. Perhaps if the backlash from the user community is strong enough and they vote with their feet (and their wallets), the company will revert back to the more popular environment. In the meantime, and as a little hack, there is one way users can keep the old interface; set up a "school" (left-hand column), give it a name, go to "settings", select "older version" and whatever other options you desire (e.g., "multiple languages" taught), and then go back to Duolingo via the left-hand column. Apparently, this will remain in place until the end of the year. After that, there's a golden opportunity for different language applications.

,

Francois Marier: Making the mounting of an encrypted /home optional on a home server

I have a computer that serves as a home server as well as a desktop machine. It has an encrypted home directory to protect user files and, in the default configuration, that unfortunately interferes with unattended reboots since someone needs to be present to enter the encryption password.

Here's how I added a timeout and made /home optional on that machine.

I started by adding a one-minute timeout on the password prompt by adding timeout=60 in my /etc/crypttab:

crypt  UUID=7e12c123-abcd-5555-8c40-900d1f8cc281  none  luks,timeout=60

then I made /home optional by adding nofail to the appropriate mount point in /etc/fstab:

/dev/mapper/crypt  /home  ext4  nodev,noatime,nosuid,nofail  0  2

Before that, the password prompt would timeout but the system would be unable to boot since one of the required partitions had failed to mount.

Now, to ensure that I don't accidentally re-create home directories for users when the system is mounted without a /home, I made the /home directory on the non-encrypted drive read-only:

umount /home
cd /home
chmod a-w .

Finally, with all of this in place, I was now happy to configure the machine to automatically reboot after a kernel panic by putting the following in /etc/sysctl.d/local.conf:

# Automatic reboot 10 seconds after a kernel panic
kernel.panic = 10

since I know that the machine will come back up just fine and that all services will be running. I simply won't be able to log into that machine as any other user than root until I manually unlock and mount /home.
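
If you want to sanity-check these changes before the next reboot (a rough sketch; findmnt --verify needs util-linux 2.29 or later), something like this should do:

# Check /etc/fstab for typos, including the new nofail option
findmnt --verify

# Apply the new sysctl setting without rebooting
sysctl -p /etc/sysctl.d/local.conf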

,

sthbrx - a POWER technical blog: Power kernel hardening features in Linux 6.1

Linux 6.1-rc1 was tagged on October 16th, 2022 and includes a bunch of nice things from my team that I want to highlight. Our goal is to make the Linux kernel running on IBM's Power CPUs more secure, and we landed a few goodies upstream in 6.1 to that end.

Specifically, Linux 6.1 on Power will include a complete system call infrastructure rework with security and performance benefits, support for KFENCE (a low-overhead memory safety error detector), and execute-only memory (XOM) support on the Radix MMU.

The syscall work from Rohan McLure and Andrew Donnellan replaces arch/powerpc's legacy infrastructure with the syscall wrapper shared between architectures. This was a significant overhaul of a lot of legacy code impacting all of powerpc's many platforms, including multiple different ABIs and 32/64bit compatibility infrastructure. Rohan's series started at v1 with 6 patches and ended at v6 with 25 patches, and he's done an incredible job at adopting community feedback and handling new problems.

Big thanks to Christophe Leroy, Arnd Bergmann, Nick Piggin, Michael Ellerman and others for their reviews, and of course Andrew for providing a lot of review and feedback (and prototyping the syscall wrapper in the first place). Our syscalls have entered the modern era, we can zeroise registers to improve security (but don't yet due to some ongoing discussion around compatibility and making it optional, look out for Linux 6.2), and gain a nice little performance boost by avoiding the allocation of a kernel stack frame. For more detail, see Rohan's cover letter.

Next, we have Nicholas Miehlbradt's implementation of Kernel Electric Fence (KFENCE) (and DEBUG_PAGEALLOC) for 64-bit Power, including the Hash and Radix MMUs. Christophe Leroy has already implemented KFENCE for 32-bit powerpc upstream and a series adding support for 64-bit was posted by Jordan Niethe last year, but couldn't proceed due to locking issues. Those issues have since been resolved, and after fixing a previously unknown and very obscure MM issue, Nick's KFENCE patches have been merged.

KFENCE is a low-overhead alternative to memory error detectors like KASAN (which we implemented for Radix earlier this year, thanks to Daniel Axtens and Paul Mackerras), which you probably wouldn't want to run in production. If you're chasing a memory corruption bug that doesn't like to present itself, KFENCE can help you catch out-of-bounds accesses, use-after-frees, double frees etc. without significantly impacting performance.
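
As an aside, if you want to check whether a kernel you're running was built with KFENCE (a rough sketch, with paths that assume a typical distro layout), you can look for the config option and, if present, the runtime sample interval:

# Was the kernel built with KFENCE?
grep CONFIG_KFENCE= /boot/config-$(uname -r)

# Is it active right now? (0 means disabled; this path only exists if KFENCE is built in)
cat /sys/module/kfence/parameters/sample_interval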

Finally, I wired up execute-only memory (XOM) for the Radix MMU. XOM is a niche feature that lets users map pages with PROT_EXEC only, creating a page that can't be read or written, but can still be executed. This is primarily useful for defending against code reuse attacks like ROP, but it has other uses such as JIT/sandbox environments. Power8 and later CPUs running the Hash MMU already had this capability through protection keys (pkeys); my implementation for Radix uses the native execute permission bit of the Radix MMU instead.

This basically took me an afternoon to wire up after I had the idea and I roped in Nicholas Miehlbradt to contribute a selftest, which ended up being a more significant engineering effort than the feature implementation itself. We now have a comprehensive test for XOM that runs on both Hash and Radix for all possible combinations of R/W/X upstream.

Anyway, that's all I have - this is my first time writing a post like this, so let me know what you think! A lot of our work doesn't result in upstream patches so we're not always going to have kernel releases as eventful as this, but we can post summaries every once in a while if there's interest. Thanks for reading!

,

Andrew Ruthven: Let's Encrypt with Octavia in OpenStack

I like using Catalyst Cloud to host some of my personal sites. In the past I used to use CAcert for my TLS certificates, but more recently I've been using Let's Encrypt for my TLS certificates as they're trusted in all browsers. Currently the LoadBalancer as a Service (LBaaS) in Catalyst Cloud doesn't have built in support for Let's Encrypt. I could use an apache2/nginx proxy and handle the TLS termination there and have that manage the Let's Encrypt lifecycle, but really, I'd rather use LBaaS.

So I thought I'd set about working out how to get Dehydrated (the Let's Encrypt client I've been using) to drive LBaaS (known as Octavia). I figured this would be of interest to other people using Octavia with OpenStack in general, not just Catalyst Cloud.

There's a few things you need to do. These instructions are specific to Debian:

  1. Install and configure Dehydrated to create the certificates for the domain(s) you want.
    • apt install dehydrated
  2. Create the LoadBalancer (use the API, ClickOps, whatever), just forward port 80 for now (see sample Apache configs below).
  3. Save the sample hook.sh below to /etc/dehydrated/hook.sh, you'll probably need to customise it, mine is a bit more complicated!
  4. Insert the UUID of your LoadBalancer in hook.sh where LB_LISTENER is set.
  5. Create /etc/dehydrated/catalystcloud/password as described in hook.sh
  6. Save OpenRC file from the Catalyst Cloud dashboard as /etc/dehydrated/catalystcloud/openrc.sh
  7. Install jq, openssl and the openstack tools, on Debian this is:
    • apt install jq openssl python3-openstackclient python3-barbicanclient python3-octaviaclient
  8. Add TLS termination to your LoadBalancer
  9. You should be able to rename the latest certs directory /var/lib/dehydrated/certs/$DOMAIN and then run dehydrated -c to have it reissue and then deploy a cert.

As we're using HTTP-01 Challenge Type here, you need to have the LoadBalancer forwarding port 80 to your website to allow for the challenge response. It is good practice to have a redirect to HTTPS, here's an example virtual host for Apache:

<VirtualHost *:80>
    ServerName www.example.com
    ServerAlias example.com

    RewriteEngine On
    RewriteRule ^/.well-known/ - [L]
    RewriteRule ^/(.*)$ https://www.example.com/$1 [R=301,L]

    <Location />
        Require all granted
    </Location>
</VirtualHost>
You will also need this in /etc/apache2/conf-enabled/letsencrypt.conf:
Alias /.well-known/acme-challenge /var/lib/dehydrated/acme-challenges

<Directory /var/lib/dehydrated/acme-challenges>
        Options None
        AllowOverride None

        # Apache 2.x
        <IfModule !mod_authz_core.c>
                Order allow,deny
                Allow from all
        </IfModule>

        # Apache 2.4
        <IfModule mod_authz_core.c>
                Require all granted
        </IfModule>
</Directory>

And that should be all that you need to do. Now, when Dehydrated updates your certificate, it should update your LoadBalancer as well!

Sample hook.sh:
deploy_cert() {
    local DOMAIN="${1}" KEYFILE="${2}" CERTFILE="${3}" FULLCHAINFILE="${4}" \
          CHAINFILE="${5}" TIMESTAMP="${6}"
    shift 6

    # File contents should be:
    #   export OS_PASSWORD='your password in here'
    . /etc/dehydrated/catalystcloud/password

    # OpenRC file from the Catalyst Cloud dashboard
    . /etc/dehydrated/catalystcloud/openrc.sh --no-token

    # UUID of the LoadBalancer to be managed
    LB_LISTENER='xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'

    # Barbican uses P12 files, we need to make one.
    P12=$(readlink -f $KEYFILE \
        | sed -E 's/privkey-([0-9]+)\.pem/barbican-\1.p12/')
    openssl pkcs12 -export -inkey $KEYFILE -in $CERTFILE -certfile \
        $FULLCHAINFILE -passout pass: -out $P12

    # Keep track of existing certs for this domain (hopefully no more than 100)
    EXISTING_URIS=$(openstack secret list --limit 100 \
        -c Name -c 'Secret href' -f json \
        | jq -r ".[]|select(.Name | startswith(\"$DOMAIN\"))|.\"Secret href\"")

    # Upload the new cert
    NOW=$(date +"%s")
    openstack secret store --name $DOMAIN-$TIMESTAMP-$NOW -e base64 \
        -t "application/octet-stream" --payload="$(base64 < $P12)"

    NEW_URI=$(openstack secret list --name $DOMAIN-$TIMESTAMP-$NOW \
        -c 'Secret href' -f value) \
        || unset NEW_URI

    # Change LoadBalancer to use new cert - if the old one was the default,
    # change the default. If the old one was in the SNI list, update the
    # SNI list.
    if [ -n "$EXISTING_URIS" ]; then
        DEFAULT_CONTAINER=$(openstack loadbalancer listener show $LB_LISTENER \
            -c default_tls_container_ref -f value)

        for URI in $EXISTING_URIS; do
            if [ "x$URI" = "x$DEFAULT_CONTAINER" ]; then
                openstack loadbalancer listener set $LB_LISTENER \
                    --default-tls-container-ref $NEW_URI
            fi
        done

        SNI_CONTAINERS=$(openstack loadbalancer listener show $LB_LISTENER \
            -c sni_container_refs -f value | sed "s/'//g" | sed 's/^\[//' \
            | sed 's/\]$//' | sed "s/,//g")

        for URI in $EXISTING_URIS; do
            if echo $SNI_CONTAINERS | grep -q $URI; then
                SNI_CONTAINERS=$(echo $SNI_CONTAINERS | sed "s,$URI,$NEW_URI,")
                openstack loadbalancer listener set $LB_LISTENER \
                    --sni-container-refs $SNI_CONTAINERS
            fi
        done

        # Remove old certs
        for URI in $EXISTING_URIS; do
            openstack secret delete $URI
        done
    fi
}

HANDLER="$1"; shift
#if [[ "${HANDLER}" =~ ^(deploy_challenge|clean_challenge|sync_cert|deploy_cert|deploy_ocsp|unchanged_cert|invalid_challenge|request_failure|generate_csr|startup_hook|exit_hook)$ ]]; then
if [[ "${HANDLER}" =~ ^(deploy_cert)$ ]]; then
    "$HANDLER" "$@"
fi
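
For completeness, dehydrated needs to be told to use this hook script and the challenge directory configured above. A minimal sketch of the relevant /etc/dehydrated/config lines (values assumed to match the paths used in this post) would be:

HOOK="/etc/dehydrated/hook.sh"
CHALLENGETYPE="http-01"
WELLKNOWN="/var/lib/dehydrated/acme-challenges"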

,

Dave Hall: Tracking Infrastructure with SSM and Terraform

Use AWS SSM Parameter Store to share resource references with other teams.

,

Tim Riley: Open source status update, September 2022

Hello there, friends! This is going to be a short update from me because I’m deep in the throes of Hanami 2.0 release preparation right now. Even still, I didn’t want to let September pass without an update, so let’s take a look.

A story about Hanami::Action memory usage

September started and ended with me looking at the r10k memory usage charts for hanami-controller versus Rails. The results were surprising!

[Chart: Initial memory usage for Hanami::Action vs Rails]

We’d been running some of these checks as part of our 2.0 release prep, the idea being that it’d help us shake out any obvious performance improvements we’d need to make. And it certainly did in this case! Hanami (just like its dry-rb underpinnings) is meant to be the smaller and lighter framework; why were we being outperformed by Rails?

To address this I wrote a simple memory profile script for Hanami::Action inheritance (now checked in here) and started digging.

Here were the initial results:

Total allocated: 184912288 bytes (1360036 objects)
Total retained:  104910880 bytes (780031 objects)

allocated memory by gem
-----------------------------------
  56242240  concurrent-ruby-1.1.10
  53282480  dry-configurable-0.15.0
  34120000  utils-8585be837309
  30547488  other
  10720080  controller/lib

That’s 185MB allocated for 10k subclasses, with concurrent-ruby, dry-configurable and hanami-utils being the top three gems allocating memory.

This led me straight to dry-configurable, and after a couple of weeks of work, I arrived at this PR, separating our storage of setting definitions from their configured values, among other things. This change allows us to copy less data at the moment of class inheritance, and in the case of a dry-configurable-focused memory profile, cut the allocated memory by more than half.

From there, I moved back into hanami-controller and updated it to use dry-configurable for all of its inheritable attributes (some were handled separately), also taking advantage of the support for custom config classes that Piotr added so we could preserve Hanami::Action’s existing configuration API.

This considerably improved our benchmark! Behold:

Total allocated: 32766232 bytes (90004 objects)
Total retained:  32766232 bytes (90004 objects)

allocated memory by gem
-----------------------------------
  21486072  other
  10880120  dry-configurable-0.16.1
    400040  3.1.2/lib

Yes, we brought 185MB allocated memory down to 33MB! This also brought us on par with Rails in the extreme end of the r10k memory usage benchmark:

[Chart: Updated memory usage for Hanami::Action vs Rails]

Here’s a thing though: the way r10k generates actions for its Rails benchmark is to create a single controller class with a method per action. So for the point on the far right of that chart, that’s a single class with 10k methods. Hardly realistic.

So I made a quick tweak to see how things would look if the r10k Rails benchmark generated a class per endpoint like we do with Hanami::Action:

[Chart: Hanami::Action vs Rails with a separate controller class per action]

That’s more like it. This is another extreme, however: more realistically, we’d see Rails apps with somewhere between 5-10 actions per controller class, which would lower its dot a little in that graph. In my opinion this would be a useful thing to upstream into r10k. It’s already a contrived benchmark, yes, but it’d be more useful if it at least mimicked realistic application structures.

Either way, we finished the month much more confident that we’ll be delivering on our promise of Hanami as the lighter, faster framework alternative. A good outcome!

Along the way, however, things did feel bleak at times. I wasn’t confident that I’d be able to make things right, and it didn’t feel great to think we might’ve spent years putting something together that wasn’t going to be able to deliver on some of those core promises. Luckily, I found all the wins we needed, and learnt a few things along the way.

Hanami 2.0, here we come

What else happened in September? Possibly the biggest thing is that we organised ourselves for the runway towards the final Hanami 2.0.0 release.

We want to do everything possible to make sure the release happens this year, so I spent some time organising the remaining tasks on our Trello board into date-based lists, aiming for a release towards the end of November. It looked achievable! The three of us in the core team re-committed ourselves to doing everything we could to complete these tasks in our estimated timeframes.

So far, things have gone very well!

[Chart: Hanami 2.0.0 release progress on Trello]

We’ve all been working tremendously hard, and so far, this has let us keep everything to the schedule. I’ll have a lot to share about our work across October, but that’s all for next month’s update. So in the meantime, I have to put my head back down and get back to shipping a framework. See you all again soon!

Lev Lafayette: Borderline Personality Disorder: A Summary

This is a summary of what I have learned over the past two years, after my first direct encounter with what is called Borderline Personality Disorder (BPD). Whilst I do not have BPD (although everyone is a little bit on every mental health continuum), I do endeavour to be a loyal and committed ally of people with BPD (pwBPD). In a very real sense, I wish I knew then what I do now; but at least I have made the effort to learn. I hope that these notes are useful to others. For anyone who wishes to be a sincere ally (a catch-all term that should include partners, family, and friends) of a person with BPD it is absolutely necessary to make the effort to listen to the pwBPD and to educate yourself using scholarly sources. Not making the effort means that you're not an ally, regardless of how close you think you are to the person, and not using scholarly sources will cause more harm and prejudice than good.

This document was initially written at the end of BPD Awareness Week 2022 in Australia and for World Mental Health Day, and will be updated as new information comes to hand. Throughout all the content here it is emphasised that (a) quantifiers are always required (many, most, some, etc) and every BPD person is unique and will not show all characteristics and (b) always see the person. Please note that I am not a psychologist, although I am a student of the subject and have completed most of a Graduate Diploma of Applied Psychology at the University of Auckland. I encourage people to donate to the Australian BPD Foundation.

Warning: This article mentions suicide, self-harm, and abuse.

Last update: March 14, 2023

Definition and Prevalence

"Borderline Personality Disorder" is a mental health condition marked by a long-term pattern of intense emotional reactions, divergent moods, unstable interpersonal relations, impulsivity, and issues in self-identity and self-direction. The term itself was coined when the condition of behaviours was deemed to be on the borderline of psychosis (difficulty in determining what is real) and neuroticism (disorders that cause constant distress), where a neurotic person in a time of stress would show signs of psychosis. Whilst neither 'psychosis' nor 'neuroticism' are used as formal mental health descriptors, the term "borderline" has stuck. The term was included in DSM-III (1980) where it remains to the current edition. An alternative, and more intuitive term, is "Emotionally unstable personality disorder" (EUPD).

The median prevalence of BPD is c1% (Ellison et al, 2018). In clinical settings, BPD prevalence is around 10-12% in outpatient psychiatric clinics and 20-22% among inpatient clinics. Prevalence is notably higher among incarcerated individuals and notably lower among the elderly. There is a pronounced gender distinction with women diagnosed over men at a 3:1 ratio (Skodol, Bender, 2003). Underdiagnosis and misdiagnosis are unfortunately common, with over 40% of pwBPD having been previously misdiagnosed with other disorders like bipolar disorder or major depressive disorder (Ruggero et al, 2010).

Diagnosis and Symptoms

The DSM-5 (p663, 2013) gives the following as diagnostic criteria. Formal diagnosis requires satisfying five or more of the criteria.

1. Frantic efforts to avoid real or imagined abandonment (Note: Do not include suicidal or self-mutilating behaviour covered in Criterion 5)
2. A pattern of unstable and intense interpersonal relationships characterised by alternating between extremes of idealisation and devaluation
3. Identity disturbance: markedly and persistently unstable self-image or sense of self
4. Impulsivity in at least two areas that are potentially self-damaging (e.g. spending, sex, substance abuse, reckless driving, binge eating) (Note: Do not include suicidal or self-mutilating behaviour covered in Criterion 5)
5. Recurrent suicidal behaviour, gestures, or threats, or self-mutilating behaviour
6. Affective instability due to a marked reactivity of mood (e.g. intense episodic dysphoria, irritability or anxiety usually lasting a few hours and only rarely more than a few days)
7. Chronic feelings of emptiness
8. Inappropriate, intense anger or difficulty controlling anger (e.g. frequent displays of temper, constant anger, recurrent physical fights)
9. Transient, stress-related paranoid ideation or severe dissociative symptoms

There are similar criteria for the International Classification of Diseases (11th Revision) which describes "the borderline pattern descriptor" as follows:

A pervasive pattern of instability of interpersonal relationships, self-image, and affects, and marked impulsivity, as indicated by many of the following:

1. Frantic efforts to avoid real or imagined abandonment
2. A pattern of unstable and intense interpersonal relationships
3. Identity disturbance, manifested in markedly and persistently unstable self-image or sense of self
4. A tendency to act rashly in states of high negative affect, leading to potentially self-damaging behaviours
5. Recurrent episodes of self-harm
6. Emotional instability due to marked reactivity of mood
7. Chronic feelings of emptiness
8. Inappropriate intense anger or difficulty controlling anger
9. Transient dissociative symptoms or psychotic-like features in situations of high affective arousal

If one thinks that they fit the criteria for BPD it is essential to seek a professional diagnosis. Without professional treatment, one is taking an enormous risk of harm to themselves and others. Likewise, if one thinks that another person fits the criteria, raise the matter very gently and delicately with a motivation of care and with recognition and self-awareness that you are not a professional.

Causes and Neurology

Borderline personality disorder often begins in adolescence or early adulthood. It is characterized by problems with interpersonal relationships (they are intense, alternating between idealization and devaluation), mood (depression and especially inappropriate, intense anger), and unstable self-image. Current estimates of the general population prevalence of borderline personality disorder range up to 5.9 percent, and recent studies of college students suggest that up to 17 percent struggle with significant borderline traits. Borderline personality disorder is associated with psychiatric disability, substance abuse, eating disorders, and medical problems. BPD patients showed significantly higher scores on both primary and secondary global rates of psychopathic behaviour associated with patterns of executive dysfunction (López-Villatoro et al, 2020)

The heritability of BPD is between 37% and 69%, a rather wide range (Gunderson et al, 2011), with indications that it is one of the most heritable disorders (Torgersen et al, 2000). However, even when researchers do note specific genetic linkages, the variation attributable to genetic and environmental factors is balanced at 42%/58% (Distel et al, 2008). These environmental factors are commonly associated with childhood trauma such as neglect and abuse; there is little doubt that a person who has experienced childhood trauma is at an increased risk of developing BPD and PTSD (Cattane et al 2017).

Real-time brain imaging scans have established that pwBPD are physically unable to regulate emotions (Nauret, 2017). Neuroimaging shows that pwBPD typically have a reduction in the brain regions that regulate stress responses, emotions, and decision-making, including the amygdala, the hippocampus, and the orbitofrontal cortex (O'Neill, Frodl, 2012). There is dysregulation of the hypothalamic-pituitary-adrenal axis, responsible for the production of cortisol, released during times of stress; pwBPD have abnormal levels of cortisol production (Cattane, et al 2017), reflected in erosion of the very areas of the brain responsible for stress regulation and decision-making. Amygdala damage is associated with impulsive behaviour, a lessened aversion to risk and loss (Gupta et al 2011), and also with hypervigilance (Terburg et al, 2012). Damage to the amygdala (emotional processing) and the hippocampus (declarative and episodic recollection) also reduces the capacity for memory (Yang, Wang, 2017). These all contribute to BPD being described as the mental illness with the highest level of psychological pain.

Comorbidities

There are a number of comorbidities with BPD. The following are a few words on the most common, including Eating Disorders, Attention Deficit Hyperactivity Disorder, (complex and chronic) Post-Traumatic Stress Disorder, Narcissistic Personality Disorder, and Bipolar Disorders.

Eating disorders and BPD are co-morbid, with some 53.8% co-occurrence in one extensive study (Zanarini, et al 2010), compared to 24.6% of patients with other personality disorders; more specifically, 21.7% of patients with BPD met criteria for anorexia nervosa and 24.1% for bulimia nervosa. Like other co-morbidities, an association has been drawn between eating disorders, BPD, and the environmental factor of childhood trauma, whether in the form of neglect or abuse (Sansone, Sansone, 2017).

Attention deficit hyperactivity disorder (ADHD) is another frequent comorbidity, ranging in clinical settings from 16.1% to 38% of BPD patients (Weiner et al, 2019). This comorbidity raises the question of whether it is appropriate to view either as an entirely early-onset neurological disorder (ADHD) or a later-onset environmental disorder (BPD). As with many other comorbidities, the expression of characteristics is more severe: people with ADHD and BPD are even more impulsive than those with BPD alone, and have a higher level of emotional dysregulation than those with ADHD alone.

Post-traumatic stress disorder (PTSD) including complex PTSD, and borderline personality disorder commonly co-occur, approximately 25-30% (Pagura et al 2010, Frías and Palma, 2015) of the time. Whilst PTSD is characterised by (a) a sense of threat, (b) avoidance, and (c) re-experiencing, complex PTSD has, in addition, (d) interpersonal avoidance and difficult interpersonal relationships, (e) negative self-concept, and (f) affective instability. BPD does not have (a), (b), and (c), but does have (d), (e), (f) and, in addition, (g) anger, (h) chronic emptiness, (i) self-injury behaviours, (j) transient psychosis and dissociation, and (k) fear of abandonment. Individuals with comorbid PTSD-BPD have a poorer quality of life on average, with higher levels of self-harm.

Narcissistic Personality Disorder (NPD) and Borderline Personality Disorder (BPD) are both "cluster B" disorders, characterised by dramatic and intense behaviour (at least to observers), and impulsive behaviour. This cluster includes NPD, BPD, anti-social personality disorder, histrionic personality disorder, etc. In addition to this general overlap, the co-occurrence of BPD and NPD has been assessed from a range of 13% (Hörz-Sagstetter et al 2018) to 39% (Grant et al, 2008). There is a possibility that it is particularly associated with "vulnerable narcissism", whose traits include hypersensitivity, defensiveness, and low self-esteem. People with NPD and BPD are less likely to see a remission of BPD, as NPD people have a lower motivation to seek therapy, and NPD is very difficult to treat (Caligor et al, 2015).

Bipolar Disorders and BPD also co-occur, in approximately 20% of cases (Zimmerman, 2019), and there is an ongoing discussion on whether BPD should be part of the bipolar spectrum, although most recent literature suggests that they are distinct and that the debate has actually sidetracked from the substantive issue. Like other comorbid states, people with "borderpolar" have higher levels of impaired functioning, substance abuse disorders, and self-harm (Patel et al, 2019). Further, people with BPD and a Bipolar Disorder are more likely to have PTSD as well, generating an especially challenging combination that has been insufficiently researched and is likely underdiagnosed.

Prognosis

BPD conditions remain throughout the lifespan, although with variations in symptoms (Biskin, 2015). In some cases BPD symptoms can be observed in childhood; however, there is an absence of evidence regarding the course of development of those who do not meet the full criteria. Adolescence is usually when BPD is recognised, with evidence of remission in follow-up studies ranging from 40% to 65%, although residual symptoms are not always predictable. Adult BPD longitudinal studies also suggest a gradual decline in symptoms, with periods of remission and recurrence. The decline of symptoms was mainly in the behavioural aspects of impulsivity; self-harm and suicide remained a factor, with one large study indicating a 10% suicide rate after 27 years of follow-up, mainly among patients in their 30s with multiple failed treatments. Even with a decline in symptoms over time, functional recovery - defined as remission along with full-time vocational or educational activity and at least one stable and supportive relationship with a close friend or partner - occurred in only just more than 50% of patients (Zanarini et al, 2012).

People with BPD have a reduced life expectancy, with the reduction ranging from 14 to 27.5 years and a median value of 20 (Castle, 2019). Most of the early mortality is largely due to cardiovascular deaths, with major risk factors (e.g., obesity, smoking, poor diet, and lack of exercise) significantly greater among people with BPD. Other notable risk factors include arteriosclerosis, hypertension, hepatic disease, arthritis, gastrointestinal disease, cardiovascular disease, and sexually transmitted diseases. These can be attributed to maladaptive lifestyle choices (smoking, drugs, alcohol, diet) as well as iatrogenic factors (prescription medicines). This is hardly helped by chronic sleep issues (Selby, 2013). The problems are often compounded when a person with BPD has a comorbidity, and also by the stigma attached to BPD even in the responses of health professionals. Suicide rates vary from up to 10% of cases in follow-back research to 3-6% in prospectively followed cohorts, and most occur later in life (mean age of 37, standard deviation of 10) (Paris, 2019).

Treatment

There is no cure for BPD, but recovery and management are possible. There is only very modest evidence of neurogenesis in the amygdala (Jhaveri, 2018), and mood disorders are known to weaken the prospect of neurogenesis in the hippocampus. In other words, the very experience of having BPD reduces the possibility of recovery from BPD (Toda et al, 2019). There is evidence that deep brain stimulation can help relieve some psychological and behavioural symptoms, such as hypervigilance (Langevin, 2012).

There are some regularly prescribed medications for pwBPD, typically antipsychotics (Grootens, Verkes, 2005) and mood stabilisers (Lieb et al, 2010). More generally, psychotherapy has been shown to be particularly beneficial, with Dialectical Behaviour Therapy (DBT) offering the greatest rates of success (Choi-Kain, 2017). It is, of course, not something that necessarily works for everybody with BPD, and other therapies may be more appropriate depending on the individual (e.g., schema therapy, mentalisation-based treatment, transference-focussed psychotherapy). A particular warning is raised for matters of misdiagnosis, especially with common co-morbidities such as PTSD. In many cases, a treatment that is very effective for PTSD can aggravate BPD and vice-versa, e.g., around trauma history, mood swings, and alienation from others (Hammond, 2020).

Community

The BPD Community consists of people with lived experience of Borderline Personality Disorder. This can include people with the illness (diagnosed, undiagnosed, treated, and in remission), their close friends, family, partners, and allies. The following are a few short comments on the lived experiences of both pwBPD and those in their life. This section of the summary is somewhat more informal than what has preceded.

Availability, Understanding, Solutions

A common error allies make when a pwBPD is having an episode of extreme emotions (anger, sadness, anxiety, etc) is that they seek to provide rational solutions to what is a perceived problem. This may be a genuine response motivated by care and love, but it is not the appropriate approach. A person in such a situation is experiencing an emotional disturbance and the experience must also be dealt with emotionally. People with BPD have at least equal and often heightened levels of emotional empathy, but their emotional cognition and performance are quite poor (Niedtfeld, 2017). This is often referred to as "the Borderline Empathy Paradox", where it is common for pwBPD to detect even subtle emotional states in others while also typically having serious deficits of cognitive and behavioural empathy (Salgado et al 2020).

The following steps - availability, understanding, and solutions - must be carried out in order with each step depending on the preceding. An alternative name for these is "SET theory" (Kreisman, 2018), standing for "Support, Empathy, Truth", although that will get confusing for people with an interest in discrete mathematics (such as the author).

Availability: One should recall that a pwBPD suffers a chronic fear of abandonment; thus availability must be the first priority. Simply being present can help alleviate the fear. Statements of support and engagement are also valuable: "I am here for you", "I care about you", "I want to help", etc.

Understanding: Once availability is established, the pwBPD is likely to express their feelings. Their ally must display empathy and understanding at this point. The pwBPD may seek to ground their feelings in events or interpretations that might be completely erroneous, conflated, etc. The ally should not seek to correct them or downplay the real or imagined causes, but rather validate the emotions. This requires some attention on the part of the ally to listen to the feelings as well as the words being used. Feelings are ALWAYS valid, even if the reasons are not, and the pwBPD feels their feelings more viscerally than anyone else. The ally should give statements that validate the feelings: "This must be very frustrating for you".

Solutions: Only once the pwBPD has an assurance of an ally's availability, and the empathic rapport of understanding and validating their emotional state is established, should potential solutions be offered. These need to be factual or based on the ally's commitments (and the ally had better follow through): "This is what I can do to help", "If you do x, then y will happen. Perhaps consider z", etc.

Shame, Guilt, and Remorse

It is virtually a given that pwBPD will engage in words or actions that are very hurtful and damaging to those close to them. Unlike people who have limited emotional range or capacity, a pwBPD feels emotions, including shame and guilt, intensely. The coping mechanisms and responses of pwBPD, however, are typically very poor, and they will often hold on to shame and guilt in a manner that is damaging to their self-esteem (through self-loathing), leads to despair (avoiding establishing commitment through fear of hurting people in the future), or even results in various forms of self-harm. For pwBPD it is essential that they learn to turn shame into guilt and guilt into remorse, otherwise the pain will be ongoing.

Shame is more prevalent among pwBPD than guilt (Peters & Geiger, 2016). Shame reflects the individual's negative self-concept and self-loathing, and represents the accumulated negative beliefs that the individual holds toward themselves. With pwBPD it is an important contributor to anger, unstable mood, instability of interpersonal relationships, externalisation of blame, and self-harm. A pwBPD is easily triggered by events that generate shame, including the results of their own impulsivity and other behaviours. However, it is necessary for a pwBPD to develop guilt about actions rather than accumulating further shame about them. Guilt at least focuses on the event and identifies the need to change behaviour, rather than adding to the negative self-image of shame, which is an internalised and private pain.

The difference between guilt and remorse involves taking ownership of what a person has done that has hurt another. Contacting those who have been impacted, even indirectly if necessary, is suggested. It requires informing the wronged party that one recognises having wronged them and that one is changing oneself so it doesn't happen again. Further, asking whether there is anything that can be done to make amends is a full acceptance of responsibility. There is no onus on the wronged party to give forgiveness or to accept any offer of amends. However, in most cases, people are forgiving when they see a genuine attempt in a person to change.

Mirroring, Splitting, Discarding, Reconnection

Borderline
Feels like I'm going to lose my mind
You just keep on pushing my love
Over the borderline
-- Madonna, Borderline (1983)

The famous Madonna song, for what it's worth, is not actually about BPD, but for those in a close relationship with a pwBPD, the experience of having one's love "pushed" and the sense that the close ally may lose their own mind are very common. A common descriptor used by both pwBPD and their loved ones is that the experience is like being on an emotional roller-coaster. Many of those who have experienced a relationship with a pwBPD describe a cycle of behaviour that constitutes manipulative abuse (Brüne, 2016). The actions carried out by a pwBPD are unconsciously driven: they desperately fear abandonment whilst at the same time harbouring high levels of chronic mistrust, a belief that they are unlovable, and a lack of object constancy with their loved ones (Matejko, 2022).

A typical cycle will consist of an initial and often incredible connection between the pwBPD and their loved one, their "favourite person" as described in the culture. The pwBPD will engage in "mirroring", elevating their loved one, affirming their beliefs, dreams, and activities, and will present themselves as exciting and adventurous in the process. The loved one will often describe the experience in highly romantic terms, such as finally meeting their soulmate. This experience, however, does not last; the inevitable flaws of the loved one and the affective instability of the pwBPD will usually mean that the pwBPD engages in "splitting" against their loved one. Where once they were exalted, they are now treated with equivalent disdain (often with rage and vitriol), and will soon be discarded. During the negative side of the split, the pwBPD will, with far greater frequency than others, establish a new love interest (Michael, 2021). "Splitting" itself is a malformed defence mechanism on the part of the person with BPD (Fertuck et al., 2018) whereby they convince themselves of the validity of the impending discard.

With the new love interest, the same process is very likely to repeat itself. Often enough, the pwBPD will then re-establish connection with their original partner with a similar level of elevation, and the cycle will repeat, or they will find an entirely new relationship; perhaps unsurprisingly, pwBPD tend to have a larger number of romantic relationships over their lifetime (Navarro-Gómez et al., 2017). Assuming a return to the original interest, it is not uncommon for loved ones of a pwBPD to describe how, over a number of years, their partners have discarded them several times or more. Whilst patience and commitment are admirable in any relationship, they will be insufficient in this situation. Therapy for the pwBPD, and couples therapy for the pwBPD and their partner, is also required for success. Establishing clear boundaries and agreed consequences for particular actions should also assist.

Lying, Gaslighting, Lovebombing

The perception of reality for a pwBPD is driven by their current emotional state, which is subject to heightened levels of intensity and instability. As a direct consequence, a pwBPD engages in activities which, to an outsider, may seem like lying or gaslighting but are driven by these states rather than being acts of conscious deception. For example, pwBPD often have a weakened level of promissory commitment to the expressions that they provide. Their reality is very much in the "here and now", rather than in the longer term, even when expressed in those terms. In the moment a pwBPD will quite sincerely and wholeheartedly believe what they are saying, but will either forget the content entirely or have a radical change in affective orientation. Reminding the pwBPD of their prior commitments is important, but even more can be gained with a reminder of the emotional content of the commitment.

Another result of this emotional, rather than factual, perception is that pwBPD present statements that seem like gaslighting. Emotionally healthy people will develop feelings based on facts. However, pwBPD may unconsciously revise the facts to fit their current feelings, or invent facts to fill in memory gaps. Tragically, this behaviour also weakens the ability of the pwBPD to develop a coherent autobiographical sense of self or firm memories. This can also be very confronting to an ally, whose immediate reaction will be to correct the factual error. This is a mistake; instead, the same SET principles described previously should be applied. The facts are secondary; empathy for and understanding of the feeling must have priority.

Another experience common to loved ones of a pwBPD is "lovebombing": overwhelming displays of affection and attachment. As the Oxford English Dictionary states: "the action or practice of lavishing someone with attention or affection, especially in order to influence or manipulate them". For a pwBPD the lavishing is real, in the moment. They are not consciously trying to manipulate their loved one. They are, in fact, both terrified of losing their loved one (thus the overwhelming display of affection) and, at the same time, ready to engage in a "protective discard" on the assumption that their loved one will leave them. Love-bombing can be seen as a symptom of an insecure attachment style, which matches 90%+ of pwBPD to the point that it is considered almost tautological (Kaurin et al., 2020), and of the disorganised insecure attachment style in particular (Agrawal et al., 2004).

Concluding Remarks

This summary is a compilation of my own notes and research over the past two years or so. It really is a personal essay, albeit written with my own tendency toward an academic style, to make sense of what is a common and often debilitating mental illness. Despite the various difficulties, emphasis is again placed on the importance of individual variation among pwBPD, the legitimacy of their voice in explaining the lived experience of the condition, and the fact that the person who has BPD is also so much more than the illness that they carry. There is a terrible stigma (Aviram et al., 2006) attached to pwBPD in popular culture and the media, and prejudices abound, most surprisingly in the professions that should be the most helpful. Of course, pwBPD are just as prone to engaging in consciously hurtful acts toward others as anyone else, but in the main they are incredibly empathic and caring, although often unable to fully control their impulses. Genuine sympathy, understanding, and treatment will all help make life much better for them and for us.

References

Agrawal, H. R., Gunderson, J., Holmes, B. M., & Lyons-Ruth, K. (2004). Attachment studies with borderline patients: a review. Harvard review of psychiatry, 12(2), 94–104. https://doi.org/10.1080/10673220490447218

Aviram RB, Brodsky BS, Stanley B (2006). "Borderline personality disorder, stigma, and treatment implications". Harvard Review of Psychiatry. 14 (5): 249–256. doi:10.1080/10673220600975121

Biskin RS. (2015). The Lifetime Course of Borderline Personality Disorder. Can J Psychiatry. 2015 Jul;60(7):303-8. doi: 10.1177/070674371506000702. PMID: 26175388; PMCID: PMC4500179.

Brüne M. Borderline Personality Disorder: Why 'fast and furious'?. (2016) Evol Med Public Health;2016(1):52–66. doi:10.1093/emph/eow002

Caligor E, Levy KN, Yeomans FE. (2015). Narcissistic personality disorder: Diagnostic and clinical challenges. AJP. 2015;172(5):415-422. doi:10.1176/appi.ajp.2014.14060723

Castle, D. J. (2019). The complexities of the borderline patient: how much more complex when considering physical health?. Australasian Psychiatry, 27(6), 552-555.

Cattane N, Rossi R, Lanfredi M, Cattaneo A. (2017). Borderline personality disorder and childhood trauma: exploring the affected biological systems and mechanisms. BMC Psychiatry. 2017;17(1):221. doi:10.1186/s12888-017-1383-2

Choi-Kain LW, Finch EF, Masland SR, Jenkins JA, Unruh BT. (2017). What works in the treatment of borderline personality disorder. Curr Behav Neurosci Rep. 2017;4(1):21-30. doi:10.1007/s40473-017-0103-z

Distel et al. Heritability of borderline personality disorder features is similar across three countries. Psychological Medicine, 2008; 38 (9): DOI: 10.1017/S0033291707002024

Ellison WD, Rosenstein LK, Morgan TA, Zimmerman M. (2018). Community and Clinical Epidemiology of Borderline Personality Disorder. Psychiatr Clin North Am. 2018 Dec;41(4):561-573. doi: 10.1016/j.psc.2018.07.008. Epub 2018 Oct 16. PMID: 30447724.

Fertuck EA, Fischer S, Beeney J. (2018). Social cognition and borderline personality disorder: Splitting and trust impairment findings. Psychiatr Clin North Am. 2018;41(4):613-632 doi:10.1016/j.psc.2018.07.003

Frías Á, Palma C. Comorbidity between post-traumatic stress disorder and borderline personality disorder: A review. Psychopathology. 2015;48(1):1-10. doi:10.1159/000363145
https://pubmed.ncbi.nlm.nih.gov/25227722/

Grant BF, Chou SP, Goldstein RB, et al. Prevalence, correlates, disability, and comorbidity of DSM-IV borderline personality disorder: Results from the Wave 2 National Epidemiologic Survey on Alcohol and Related Conditions. J Clin Psychiatry. 2008;69(4):533-545. doi:10.4088/jcp.v69n0404

Grootens KP, Verkes RJ. (2005). "Emerging evidence for the use of atypical antipsychotics in borderline personality disorder". Pharmacopsychiatry. 38 (1): 20–3. doi:10.1055/s-2005-837767.

Gunderson JG, Zanarini MC, Choi-Kain LW, Mitchell KS, Jang KL, Hudson JI (August 2011). "Family Study of Borderline Personality Disorder and Its Sectors of Psychopathology". JAMA: The Journal of the American Medical Association. 68 (7): 753–762. doi:10.1001/archgenpsychiatry.2011.65.

Gupta R, Koscik TR, Bechara A, Tranel D. The amygdala and decision-making. (2011). Neuropsychologia. 2011 Mar;49(4):760-6. doi: 10.1016/j.neuropsychologia.2010.09.029. Epub 2010 Oct 8. PMID: 20920513; PMCID: PMC3032808.

Hammond, C. (rev) (2020). How PTSD can look like Borderline Personality Disorder. Psych Central.

Hörz-Sagstetter S, Diamond D, Clarkin JF, et al. (2018) Clinical characteristics of comorbid narcissistic personality disorder in patients with borderline personality disorder. J Pers Disord. 2018;32(4):562-575. doi:10.1521/pedi_2017_31_306

Jhaveri, D. (2018). Neurogenesis in the emotion-processing centre of the brain. Australasian Science, 39(1), 24-26.

Kaurin, A., Beeney, J. E., Stepp, S. D., Scott, L. N., Woods, W. C., Pilkonis, P. A., & Wright, A. G. C. (2020). Attachment and Borderline Personality Disorder: Differential Effects on Situational Socio-Affective Processes. Affective science, 1(3), 117–127. https://doi.org/10.1007/s42761-020-00017-7

Kreisman JJ. (2018). Talking to a Loved One with Borderline Personality Disorder, Communication Skills to Manage Intense Emotions, Set Boundaries, and Reduce Conflict. New Harbinger Publications.

Langevin JP. (2012). The amygdala as a target for behavior surgery. Surg Neurol Int. 2012;3(Suppl 1):S40-6. doi: 10.4103/2152-7806.91609. Epub 2012 Jan 14. PMID: 22826810; PMCID: PMC3400485.

Lieb, Klaus; Völlm, Birgit; Rücker, Gerta; Timmer, Antje; Stoffers, Jutta M. (2010). "Pharmacotherapy for borderline personality disorder: Cochrane systematic review of randomised trials". The British Journal of Psychiatry. 196 (1): 4–12. doi:10.1192/bjp.bp.108.062984

López-Villatoro, J. M., Diaz-Marsá, M., Mellor-Marsá, B., De la Vega, I., & Carrasco, J. L. (2020). Executive dysfunction associated with the primary psychopathic features of borderline personality disorder. Frontiers in Psychiatry, 11, 514905.

Matejko, S. (2022). Understanding Object Constancy in Borderline Personality Disorder and Narcissism, PsychCentral, 2022
https://psychcentral.com/disorders/borderline-personality-disorder/objec...

Michael J, Chennells M, Nolte T, et al (2021). Probing commitment in individuals with borderline personality disorder. J Psychiatric Res. 2021;137:335-341. doi:10.1016/j.jpsychires.2021.02.062

Nauert, R., (2017). Brain Scans Clarify Borderline Personality Disorder. PsychCentral
https://psychcentral.com/news/2017/09/04/brain-scans-clarify-borderline-...

Navarro-Gómez S, Frías Á, Palma C. Romantic relationships of people with borderline personality: A narrative review. (2017) PSP. 2017;50(3):175-187. doi:10.1159/000474950

Niedtfeld I. (2017) Experimental investigation of cognitive and affective empathy in borderline personality disorder: effects of ambiguity in multimodal social information processing. Psychiatry Res 253:58–63. doi: 10.1016/j.psychres.2017.03.037

O'Neill A, Frodl T (October 2012). "Brain structure and function in borderline personality disorder". Brain Structure & Function. 217 (4): 767–782. doi:10.1007/s00429-012-0379-4

Pagura, J., Stein, M. B., Bolton, J. M., Cox, B. J., Grant, B., & Sareen, J. (2010). Comorbidity of borderline personality disorder and posttraumatic stress disorder in the US population. Journal of psychiatric research, 44(16), 1190-1198.

Paris J. (2019). Suicidality in Borderline Personality Disorder. Medicina (Kaunas). 2019 May 28;55(6):223. doi: 10.3390/medicina55060223. PMID: 31142033; PMCID: PMC6632023.

Patel RS, Manikkara G, Chopra A. Bipolar Disorder and Comorbid Borderline Personality Disorder: Patient Characteristics and Outcomes in US Hospitals. (2019) Medicina (Kaunas). 2019 Jan 14;55(1):13. doi: 10.3390/medicina55010013. PMID: 30646620; PMCID: PMC6358827.

Peters JR, Geiger PJ. (2016). Borderline personality disorder and self-conscious affect: Too much shame but not enough guilt? Personal Disord. 2016 Jul;7(3):303-8. doi: 10.1037/per0000176. Epub 2016 Feb 11. PMID: 26866901; PMCID: PMC4929016.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4929016/

Ruggero CJ, Zimmerman M, Chelminski I, Young D. Borderline personality disorder and the misdiagnosis of bipolar disorder. (2010). J Psychiatr Res. 2010;44(6):405–408. doi:10.1016/j.jpsychires.2009.09.011

Salgado, R. M., Pedrosa, R., & Bastos-Leite, A. J. (2020). Dysfunction of Empathy and Related Processes in Borderline Personality Disorder: A Systematic Review. Harvard review of psychiatry, 28(4), 238–254. https://doi.org/10.1097/HRP.0000000000000260

Sansone RA, Sansone LA. Childhood trauma, borderline personality, and eating disorders: a developmental cascade. Eat Disord. 2007;15(4):333-46. doi:10.1080/10640260701454345

Selby E. A. (2013). Chronic sleep disturbances and borderline personality disorder symptoms. Journal of consulting and clinical psychology, 81(5), 941–947. https://doi.org/10.1037/a0033201

Skodol AE, Bender DS. (2003) Why are women diagnosed borderline more than men? Psychiatr Q. 2003 Winter;74(4):349-60. doi: 10.1023/a:1026087410516. PMID: 14686459.

Terburg D, Morgan BE, Montoya ER, Hooge IT, Thornton HB, Hariri AR, Panksepp J, Stein DJ, van Honk J. Hypervigilance for fear after basolateral amygdala damage in humans. (2012). Transl Psychiatry. 2012 May 15;2(5):e115. doi: 10.1038/tp.2012.46. PMID: 22832959; PMCID: PMC3365265.

Toda, T., Parylak, S.L., Linker, S.B. et al. (2019). The role of adult hippocampal neurogenesis in brain health and disease. Mol Psychiatry 24, 67–87. https://doi.org/10.1038/s41380-018-0036-2

Torgersen S, Lygren S, Oien PA, Skre I, Onstad S, Edvardsen J, Tambs K, Kringlen E (2000). "A twin study of personality disorders". Comprehensive Psychiatry. 41 (6): 416–425. doi:10.1053/comp.2000.16560

Weiner L, Perroud N, Weibel S. (2019). Attention Deficit Hyperactivity Disorder And Borderline Personality Disorder In Adults: A Review Of Their Links And Risks. Neuropsychiatr Dis Treat. 2019 Nov 8;15:3115-3129. doi: 10.2147/NDT.S192871. PMID: 31806978; PMCID: PMC6850677.

Yang Y, Wang JZ. From Structure to Behavior in Basolateral Amygdala-Hippocampus Circuits. (2017). Front Neural Circuits. 2017 Oct 31;11:86. doi: 10.3389/fncir.2017.00086. PMID: 29163066; PMCID: PMC5671506.

Zanarini MC, Reichman CA, Frankenburg FR, Reich DB, Fitzmaurice G. (2010) The course of eating disorders in patients with borderline personality disorder: a 10-year follow-up study. Int J Eat Disord. 2010;43(3):226-32. doi:10.1002/eat.20689

Zanarini MC, Frankenburg FR, Reich DB, et al. (2012) Attainment and stability of sustained symptomatic remission and recovery among patients with borderline personality disorder and Axis II comparison subjects: a 16-year prospective follow-up study. Am J Psychiatry. 2012;169(5):476–483.

Zimmerman, M., (2019). Borderpolar: Patients with Borderline Personality Disorder and Bipolar Disorder. Psychiatric Times, Psychiatric Times Vol 36, Issue 12, Volume 36, Issue 12

,

Simon Lyall: Audiobooks – September 2022

Washington: A life by Ron Chernow

Very well written single-volume biography of the president. Covers his whole life in detail without being boring. Strong recommend. 5/5

The WEIRDest People in the World: How the West Became Psychologically Peculiar and Particularly Prosperous by Joseph Henrich

How the “normal” psychology of western individuals differs from other societies and how it got that way. Interesting ideas and a good read. 3/5

Dress Codes: How the Laws of Fashion Made History by Richard Thompson Ford

A mix of fashion-orientated and enforced dress codes. I found the pre-1900 stuff more interesting than the later US-centric stories. 3/5

FDR by Jean Edward Smith

Biography of President Franklin D. Roosevelt. Extensive but not comprehensive, so some gaps where I wanted more. Would recommend though. 4/5

My Audiobook Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


,

Tim Serong: TANSTAAFL

It’s been a little over a year since our Redflow ZCell battery and Victron Energy inverter/charger kit were installed on our existing 5.94kW solar array. Now that we’re past the Southern Hemisphere spring equinox it seems like an opportune time to review the numbers and try to see exactly how the system has performed over its first full year. For background information on what all the pieces are and what they do, see my earlier post, Go With The Flow.

As we look at the figures for the year, it’s worth keeping in mind what we’re using the battery for, and how we’re doing it. Naturally we’re using it to store PV generated electricity for later use when the sun’s not shining. We are also charging the battery from the grid at certain times so it can be drawn down if necessary during peak times, for example I set up a small overnight charge to ensure there was power for the weekday morning peak, when the sun isn’t really happening yet, but grid power is more than twice as expensive. More recently in the winter months, I experimented with keeping the battery full with scheduled charges during most non-peak times. This involved quite a bit more grid charging, but got us through a couple of three hour grid outages without a hitch during some severe weather in August.

I spent some time going through data from the VRM portal for the last year, and correlating that with current bills from Aurora energy, and then I tried to compare our last year of usage with a battery, to the previous three years of usage without a battery. For reasons that will become apparent later, this turned out to be a massive pain in the ass, so I’m going to start by looking only at what we can see in the VRM portal for the past year.

The VRM portal has three summary views: System Overview, Consumption and Solar. System Overview tells us overall how much total power was pulled from the grid, how much was exported to the grid, how much was produced locally, and how much was consumed by our loads. The Consumption view (which I wish they’d named “Loads”, because I think that would be clearer) gives us the same consumption figure, but tells us how much of that came from the grid, vs. what came from the battery vs. what came from solar. The Solar view tells us how much PV generation went to the grid, how much went to the battery, and how much was used directly. There is some overlap in the figures from these three views, but there are also some interesting discrepancies, notably: the “From Grid” and “To Grid” figures shown under System Overview are higher than what’s shown in the Consumption and Solar views. But, let’s start by looking at the Consumption and Solar views, because those tell us what the system gives us, and what we’re using. I’ll come back after that to the System Overview, which is where things start to get weird and we discover what the system costs to run.

The VRM portal lets you choose any date range you like to get historical figures and bar charts. It also gives you pie charts of the last 24 hours, 7 days, 30 days and 365 days. To make the figures and bar charts match the pie charts, the year we're analysing starts at 4pm on September 25, 2021 and ends at 4pm on September 25, 2022, because that's exactly when I took the following screenshots. This means we get a partial September at each end of the bar chart. I'm sorry about that.

Here’s the Consumption view:

Consumption view from VRM portal, 2021-09-25 16:00 – 2022-09-25 16:00

This shows us that in the last 12 months, our loads consumed 10,849kWh of electricity. Of that, 54% (5,848kWh) came from the grid, 23% (2,506kWh) came direct from solar PV and the final 23% (2,494kWh) came from the battery.

From the rough curve of the bar chart we can see that our consumption is lower in the summer months and higher in the winter months. I can’t say for certain, but I have to assume that’s largely due to heating. The low in February was 638kWh (an average of 22.8kWh/day). The high in July was 1,118kWh (average 36kWh/day).

Now let’s look at the Solar view:

Solar view from VRM portal, 2021-09-25 16:00 – 2022-09-25 16:00

In that same time period we generated 5,640kWh with our solar array, of which 44% (2,506kWh) was used directly by our loads, 43% (2,418kWh) went into the battery and 13% (716kWh) was exported to the grid.

Unsurprisingly our generation is significantly higher in summer than in winter. We got 956kWh (average 30kWh/day) in December but only 161kWh (5.3kWh/day) in June. Peak summer figures like that mean we’ll theoretically be able to do without grid power at all during that period once we get a second ZCell (note that we’re still exporting to the grid in December – that’s because we’ve got more generation capacity than storage). The winter figures clearly indicate that there’s no way we can provide anywhere near all our own power at that time of year with our current generation capacity and loads.

Now look closely at the summer months (December, January and February). There should be a nice curve evident there from December to March, but instead January and February form a weird dip. This is because we were without solar generation for three weeks from January 30 – February 11 due to replacing a faulty MPPT. Based on figures from previous years, I suspect we lost 500-600kWh of potential generation in that period.

Another interesting thing is that if we compare “To Battery” on the Solar view (2,418kWh) with “From Battery” on the Consumption view (2,494kWh), we see that our loads consumed 76kWh more from the battery than we actually put into it with solar generation. This discrepancy is due to the fact that in addition to charging the battery from solar, we’ve also been charging it from the grid at certain times, but the amount of power sent to the battery from the grid isn’t broken out explicitly anywhere in the VRM portal.

Now let’s look at the System Overview:

System Overview view from VRM portal, 2021-09-25 16:00 – 2022-09-25 16:00

Here we see the same figures for “Production” (5,640kWh) and “Consumption” (10,849kWh) as were in the Consumption and Solar views, and the bar chart shows the same consumption and generation curves (ignore the blue overlay and line which indicate battery minimum/maximum and average state of charge – that information is largely meaningless at this scale, given we cycle the battery completely every day).

Now look at “To Grid” and “From Grid”. “To Grid” is 754 kWh, i.e. we somehow sent 38kWh more to the grid than came from solar. “From Grid”, at 8,531kWh, is a whopping 2,683kWh more than the 5,848kWh grid power consumed by our loads (i.e. close to half as much again).

So, what’s going on here?

One factor is that we’re charging the battery from the grid at certain times. Initially that was a few hours overnight and a few hours in the afternoon on weekdays, although the afternoon charge is obviously also provided by the solar if the sun is shining. For all of July, August and most of September though I was using a charge schedule to keep the battery full except for peak times and maintenance cycle nights, which meant quite a bit more grid charging overnight than earlier in the year, as well as grid charging most of the day during days with no or minimal sunshine. Grid power sent to the battery isn’t visible in the “From Grid” figure on the Consumption view – that view shows only our loads, i.e. the equipment the system is powering – but it is part of the “From Grid” figure in the System Overview.

Similarly, some of the power we export to the grid is actually exported from the battery, as opposed to being exported from solar generation. That usually only happens during maintenance cycles when our loads aren’t enough to draw the battery down at the desired discharge rate. But again, same thing, that figure is present here on the system overview page as part of “To Grid”, but of course is not part of the “To Grid” figure on the Solar view.

Another factor is that the system itself needs some amount of power to operate. The Victron kit (the MultiPlus II Inverter/Chargers, the Cerbo GX, the MPPT) use some small amount of power themselves. The ZCell battery also requires power to operate its pumps and fans. When the sun is out this power can of course come from solar. When solar power is not available, power to run the system needs to come from some combination of the remaining charge in the battery, and the grid.

On that note, I did a little experiment to see how much power the system uses just to operate. On July 9 (which happened to be a maintenance cycle day), I disabled all scheduled battery charges, and I shut off the DC isolators for the solar PV, so the battery would remain online (pumps and fans running) but empty for all of July 10. The following day I went and checked the figures on the System Overview, which showed we drew 35kWh, but that our consumption was 33kWh. So, together, the battery doing nothing other than running its pumps and fans, plus the Multis doing nothing other than passing grid power through, used 2kWh of power in 24 hours. Over a year, that’s 730kWh. As mentioned above, ordinarily some of that will be sourced from mains and some from solar, but if we look at the total power that came into the system as a whole (5,640kWh from solar + 8,531kWh from the grid = 14,171kWh), 730kWh is just slightly over 5% of that.

The final factor in play is that a certain amount of power is naturally lost due to conversion at various points. The ZCell has a maximum 80% DC-DC stack efficiency, meaning in the absolute best case if you want to get 10kW out of it, you have to put 12.5kW in. In reality you'll never hit the best case: the lifetime charge and discharge figures the BMS currently shows for our ZCell are 4,423 and 3,336kWh respectively, which is a bit over 75%. The Multis have a maximum efficiency of 96% when doing their invert/charge dance, so if we grid charge the battery, we lose at least 4% on the way in, and at least 4% on the way out as well, going to and from AC/DC. Again, in reality that loss will be higher than 4% each way, because 96% is the maximum efficiency.
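
As a quick back-of-the-envelope check, here's that efficiency arithmetic as a small Python sketch. The only inputs are the figures quoted above (the lifetime charge/discharge totals from the BMS, and the quoted maximum Multi efficiency); everything else follows from them:

# Round-trip efficiency from the lifetime BMS figures quoted above.
lifetime_charge_kwh = 4423
lifetime_discharge_kwh = 3336

dc_dc_efficiency = lifetime_discharge_kwh / lifetime_charge_kwh
print(f"Observed DC-DC round trip: {dc_dc_efficiency:.1%}")    # ~75%, vs the 80% best case

# Best case for grid charging: AC->DC through the Multis, through the battery, then DC->AC again.
multi_efficiency = 0.96
grid_round_trip = multi_efficiency * dc_dc_efficiency * multi_efficiency
print(f"Grid -> battery -> loads, best case: {grid_round_trip:.1%}")   # roughly 70%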

A bunch of the stuff above just doesn’t apply to the previous system with the ABB inverter and no battery. I also don’t have anything like as much detailed data to go on for the old system, which makes comparing performance with the new system fiendishly difficult. The best comparison I’ve been able to come up with so far involves looking at total power input to the system (power from grid plus solar generation), total consumption by loads (i.e. actual locally usable power), and total power exported.

Prior to the Victron gear and Redflow battery installation, I had grid import and export figures from my Aurora Energy bills, and I had total generation figures from the ABB inverter. From this I can synthesise what are hopefully reasonably accurate load consumption figures by adding grid input to total PV generation and subtracting grid export.
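
Expressed as a trivial bit of Python, using the 2018-2019 numbers from the table below as an example:

# Synthesised load consumption for a pre-battery year (2018-2019 figures).
grid_in_kwh = 9031
pv_generation_kwh = 6682
grid_export_kwh = 3886

loads_kwh = grid_in_kwh + pv_generation_kwh - grid_export_kwh
print(loads_kwh)   # 11827 -- everything that came in and wasn't sent back out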

I had hoped to do this analysis on a quarterly basis to line up with Aurora bills, because then I would also be able to see how seasonal solar generation and usage went up and down. Unfortunately the billing for 2020 and 2021 was totally screwed up by the COVID-19 pandemic, because there were two quarters during which nobody was coming out to read the electricity meter. The bills for those quarters stated estimated usage (i.e. were wrong, especially given they estimated grid export as zero), with subsequent quarters correcting the figures. I have no way to reliably correlate that mess with my PV generation figures, except on an annual basis. Also, using billing periods from pre-battery years, the closest I can get to the September 25 based 2021-2022 year I’m looking at now is billing periods starting and ending in mid-August. But, that’s close enough. We’ve still got four pretty much back-to-back 12 month periods to look at.

Year         Grid In   Solar In   Total In   Loads    Export
2018-2019    9,031     6,682      15,713     11,827   3,886
2019-2020    9,324     6,468      15,792     12,255   3,537
2020-2021    7,582     6,347      13,929     10,358   3,571
2021-2022    8,531     5,640      14,171     10,849   754

One thing of note here is that in the 2018-2019 and 2019-2020 years, our annual consumption was pretty close to 12MWh, whereas in 2020-2021 and 2021-2022 it was closer to 10.5MWh. If I had to guess, I’d say that ~1.5MWh/year drop is due to a couple of pieces of computer equipment that were previously always on, now mostly running in standby mode except when actually needed. A couple of hundred watts constant draw is a fair whack of power over the course of a year. Another thing to note is the big drop in power exported in 2021-2022, because most of our solar generation is now used locally.

The thing that freaked me out when looking at these figures is that in the battery year, while our loads consumed 491kWh more than in the previous non-battery year, we pulled 949kWh more power in from the grid! This is the opposite of what I had expected to see, especially having previously written:

In the eight months the system has been running we’ve generated 4631kWh of electricity and “only” sent 588kWh to the grid, which means we’ve used 87% of what we generated locally – much better than the pre-battery figure of 45%. I suspect we’ve reduced the amount of power we pull from the grid by about 30% too, but I’ll have to wait until we have a full year’s worth of data to be sure.

– by me at the end of Go With The Flow

When I wrote that, I was looking at August 31, 2021 through April 27, 2022, and comparing that to the August 2020 to May 2021 grid power figures from my old Aurora bills. The mistake I must have made back then was to look at “From Grid” on the Consumption view, rather than “From Grid” on the System Overview. I’ve just done this exercise again, and the total grid draw from our Aurora bills from August 2020 to May 2021 is 4,980kWh. “From Grid” on the Consumption view for August 2021 to May 2022 is 3,575kWh, which is about 30% less, but “From Grid” on the System Overview is 4,754kWh, which is only about 5% less. So our loads pulled about 30% less from the grid than the same time the year before, but our system as a whole didn’t.

Now let’s break our ridiculous September-based year down further into months, to see if we can see more detail. I’ve highlighted some interesting periods in bold.

Month          Grid In   Solar In   Total In   Loads    Export
Sep 21 (part)  153       101        254        213      6
Oct 21         636       629        1,265      988      55
Nov 21         430       747        1,177      866      97
Dec 21         232       956        1,188      767      176
Jan 22         652       450        1,102      822      74
Feb 22         470       430        900        638      83
Mar 22         498       568        1,066      813      64
Apr 22         609       377        986        775      27
May 22         910       238        1,148      953      3
Jun 22         1,114     161        1,275      1,073    2
Jul 22         1,163     223        1,386      1,118    11
Aug 22         910       375        1,285      966      64
Sep 22 (part)  754       385        1,139      857      92
Total          8,531     5,640      14,171     10,849   754

December is great. We generated about 25% more power than our loads use (956/767=1.25), and our grid input was only about 30% of the total of our loads (232/767=0.30).

January and February show the effects of missing three weeks of potential generation. I mean, just look at December through February 2021-2022 versus the previous three summers.

PV Generation December through February, 2018-2022
Month       2018-2019   2019-2020   2020-2021   2021-2022
December    919         882         767         956
January     936         797         818         450
February    699         656         711         430

June and July are terrible. They’re our highest load months, with the lowest solar generation and we pulled 3-4% more power from the grid than our loads actually consumed. I’m going to attribute the latter largely to grid charging the battery.

If I dig a couple of interesting figures out for June and July I see “To Battery” on the Solar view shows 205kWh, and “From Battery” on the Consumption view shows 558kWh. Total consumption in that period was 2,191kWh, with the total “From Grid” reported in System Overview of 2,277kWh. Let’s mess with that a bit.

Bearing in mind the efficiency numbers mentioned earlier, if 205kWh went to the battery from PV, that means no more than 154kWh of what we got out of the battery was from PV generation (remember: real world DC-DC stack efficiency of about 75%). The remaining 404kWh out of the battery is power that went into it from the grid. And that means at least 538kWh in (404/0.75). Note that total from grid for these two months was 86kWh more than the 2,191kWh used by our loads. If I hadn’t been keeping the battery topped up from the grid, I’d’ve saved at least 134kWh of grid power, which would have brought our grid input figure back down below our consumption figure. Note also that this number will actually be higher in reality because I haven’t factored in AC/DC conversion losses from the Multis.
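
To make that arithmetic a bit easier to follow, here it is as a little Python sketch, using the June/July figures above and the ~75% observed DC-DC round-trip efficiency (still ignoring the Multis' AC/DC losses, which make it worse):

# Working out how much grid power went through the battery in June and July.
to_battery_from_pv_kwh = 205     # "To Battery" on the Solar view
from_battery_kwh = 558           # "From Battery" on the Consumption view
dc_dc_efficiency = 0.75

pv_back_out_kwh = to_battery_from_pv_kwh * dc_dc_efficiency       # ~154 kWh of PV made it back out
grid_back_out_kwh = from_battery_kwh - pv_back_out_kwh            # ~404 kWh out came from grid charging
grid_in_for_battery_kwh = grid_back_out_kwh / dc_dc_efficiency    # at least ~538 kWh of grid power in

print(round(pv_back_out_kwh), round(grid_back_out_kwh), round(grid_in_for_battery_kwh))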

Now let’s look at some costs. When I started trying to compare the new system to the previous system, I went in thinking to look at in in terms of total power input to the system, total consumption by loads, and total power exported. There’s one piece missing there, so let’s add another couple of columns to an earlier table:

Year         Grid In   Solar In   Total In   Loads    Export   Total Out   what?
2021-2022    8,531     5,640      14,171     10,849   754      11,603      2,568

The total usable output of the system was 11,603kWh for 14,171kWh input. The difference between these two figures – 2,568kWh, or about 18% – went somewhere else. Per my earlier experiment, 5% is power that went to actually operate the system components, including the battery. That means about 13% of the power input to the system over the course of the year must have gone to some combination of charge/discharge and AC/DC conversion (in)efficiencies. We can consider this the energy cost of the system. To have the ability to time-shift expensive peak grid electricity, and to run the house without the grid if the sun is out, or from the battery when it has charge, costs us 18% of the total available energy input.
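
In Python, that whole-of-year breakdown looks like this (all the figures come from the table and the experiment above):

# Where the 2021-2022 input energy went.
grid_in_kwh = 8531
solar_in_kwh = 5640
loads_kwh = 10849
export_kwh = 754
operating_overhead_kwh = 730    # pumps, fans and Victron kit, per the July experiment

total_in_kwh = grid_in_kwh + solar_in_kwh     # 14,171
total_out_kwh = loads_kwh + export_kwh        # 11,603
losses_kwh = total_in_kwh - total_out_kwh     # 2,568

print(f"Overall losses:    {losses_kwh / total_in_kwh:.0%}")                             # ~18%
print(f"System operation:  {operating_overhead_kwh / total_in_kwh:.0%}")                 # ~5%
print(f"Conversion losses: {(losses_kwh - operating_overhead_kwh) / total_in_kwh:.0%}")  # ~13%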

Grid power has energy costs too, but we’re not usually aware of this because it happens somewhere else. I haven’t yet found Tasmanian figures, but this 2021 Transmission Annual Planning Report PDF from Powerlink in Queensland has historical figures showing that about 7% of generation there went to auxiliaries, i.e. fans and pumps and things running at the power stations. And according to the Australian Energy Market Operator (AEMO), 10% of grid power generated is lost during transmission and distribution. Stanwell (a power company in Queensland) have a neat explainer of all this on their What’s Watt site.

Finally, speaking of expensive grid electricity, let’s look at how much we paid Aurora Energy over the past four years for our power. The bills are broken out into different tariffs, for which you’re charged different amounts per kilowatt hour and then there’s an additional daily supply charge, and also credits for power exported. We can simplify that by just taking the total dollar value of all the power bills and dividing that by the total power drawn from the grid to arrive at an effective cost per kilowatt hour for the entire year. Here it is:

Year         From Grid   Total Bill   Cost/kWh
2018-2019    9,031       $2,278.33    $0.25
2019-2020    9,324       $2,384.79    $0.26
2020-2021    7,582       $1,921.77    $0.25
2021-2022    8,531       $1,731.40    $0.20
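
The Cost/kWh column is simply the total bill divided by the grid draw; in Python:

# Effective cost per kWh of grid power, year by year.
bills = {
    "2018-2019": (9031, 2278.33),
    "2019-2020": (9324, 2384.79),
    "2020-2021": (7582, 1921.77),
    "2021-2022": (8531, 1731.40),
}

for year, (grid_kwh, total_bill) in bills.items():
    print(f"{year}: ${total_bill / grid_kwh:.2f}/kWh")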

So, the combination of the battery plus the switch from Flat Rate to Peak & Off-Peak billing has reduced the cost of our grid power by about 20%. I call that a win.

Going forwards it will be interesting to see how the next twelve months go, and, in particular, what we can do to reduce our power consumption. A significant portion of our power is used by a bunch of always-on computer equipment. Some of that I need for my work, and some of that provides internet access, file storage and email for us personally. Altogether, according to the UPSes, this kit pulls 200-250 watts continuously, but will pull more than that during the day when it’s being used interactively. If we call it 250W continuous, that’s a minimum of 6kWh/day, which is 2,190kWh/year, or about 20% of the 2021-2022 consumption. Some of that equipment should be replaced with newer, more power efficient kit. Some of it could possibly even be turned off or put into standby mode some of the time.

We still need to get a heat pump to replace the 2400W panel heater in our bedroom. That should save a huge amount of power in winter. We’re also slowly working our way through the house installing excellent double glazed windows from Elite Double Glazing, which will save on power for heating and cooling year round.

And of course, we still need to get that second ZCell.

,

Francois Marier: Upgrading from chan_sip to res_pjsip in Asterisk 18

After upgrading to Ubuntu Jammy and Asterisk 18.10, I saw the following messages in my logs:

WARNING[360166]: loader.c:2487 in load_modules: Module 'chan_sip' has been loaded but was deprecated in Asterisk version 17 and will be removed in Asterisk version 21.
WARNING[360174]: chan_sip.c:35468 in deprecation_notice: chan_sip has no official maintainer and is deprecated.  Migration to
WARNING[360174]: chan_sip.c:35469 in deprecation_notice: chan_pjsip is recommended.  See guides at the Asterisk Wiki:
WARNING[360174]: chan_sip.c:35470 in deprecation_notice: https://wiki.asterisk.org/wiki/display/AST/Migrating+from+chan_sip+to+res_pjsip
WARNING[360174]: chan_sip.c:35471 in deprecation_notice: https://wiki.asterisk.org/wiki/display/AST/Configuring+res_pjsip

and so I decided it was time to stop postponing the overdue migration of my working setup from chan_sip to res_pjsip.

It turns out that it was not as painful as I expected, though the conversion script bundled with Asterisk didn't work for me out of the box.

Debugging

Before you start, one very important thing to note is that the SIP debug information you used to see when running this in the asterisk console (asterisk -r):

sip set debug on

now lives behind this command:

pjsip set logger on

SIP phones

The first thing I migrated was the config for my two SIP phones (Snom 300 and Snom D715).

The original config for them in sip.conf was:

[2000]
; Snom 300
type=friend
qualify=yes
secret=password123
encryption=no
context=full
host=dynamic
nat=no
directmedia=no
mailbox=10@internal
vmexten=707
dtmfmode=rfc2833
call-limit=2
disallow=all
allow=g722
allow=ulaw

[2001]
; Snom D715
type=friend
qualify=yes
secret=password456
encryption=no
context=full
host=dynamic
nat=no
directmedia=yes
mailbox=10@internal
vmexten=707
dtmfmode=rfc2833
call-limit=2
disallow=all
allow=g722
allow=ulaw

and that became the following in pjsip.conf:

[transport-udp]
type = transport
protocol = udp
bind = 0.0.0.0
external_media_address = myasterisk.dyn.example.com
external_signaling_address = myasterisk.dyn.example.com
local_net = 192.168.0.0/255.255.0.0

[2000]
type = aor
max_contacts = 1

[2000]
type = auth
username = 2000
password = password123

[2000]
type = endpoint
context = full
dtmf_mode = rfc4733
disallow = all
allow = g722
allow = ulaw
direct_media = no
mailboxes = 10@internal
auth = 2000
outbound_auth = 2000
aors = 2000

[2001]
type = aor
max_contacts = 1

[2001]
type = auth
username = 2001
password = password456

[2001]
type = endpoint
context = full
dtmf_mode = rfc4733
disallow = all
allow = g722
allow = ulaw
direct_media = yes
mailboxes = 10@internal
auth = 2001
outbound_auth = 2001
aors = 2001

The different direct_media line between the two phones has to do with how they each connect to my Asterisk server and whether or not they have access to the Internet.

Internal calls

For some reason, my internal calls (from one SIP phone to the other) didn't work when using "aliases". I fixed it by changing this blurb in extensions.conf from:

[speeddial]
exten => 1000,1,Dial(SIP/2000,20)
exten => 1001,1,Dial(SIP/2001,20)

to:

[speeddial]
exten => 1000,1,Dial(${PJSIP_DIAL_CONTACTS(2000)},20)
exten => 1001,1,Dial(${PJSIP_DIAL_CONTACTS(2001)},20)

I have not yet dug into what this changes or why it's necessary and so feel free to leave a comment if you know more here.

PSTN trunk

Once I had the internal phones working, I moved to making and receiving phone calls over the PSTN, for which I use VoIP.ms with encryption.

I had to change the following in my sip.conf:

[general]
register => tls://555123_myasterisk:password789@vancouver2.voip.ms
externhost=myasterisk.dyn.example.com
localnet=192.168.0.0/255.255.0.0
tcpenable=yes
tlsenable=yes
tlscertfile=/etc/asterisk/asterisk.cert
tlsprivatekey=/etc/asterisk/asterisk.key
tlscapath=/etc/ssl/certs/

[voipms]
type=peer
host=vancouver2.voip.ms
secret=password789
defaultuser=555123_myasterisk
context=from-voipms
disallow=all
allow=ulaw
allow=g729
insecure=port,invite
canreinvite=no
trustrpid=yes
sendrpid=yes
transport=tls
encryption=yes

to the following in pjsip.conf:

[transport-tls]
type = transport
protocol = tls
bind = 0.0.0.0
external_media_address = myasterisk.dyn.example.com
external_signaling_address = myasterisk.dyn.example.com
local_net = 192.168.0.0/255.255.0.0
cert_file = /etc/asterisk/asterisk.cert
priv_key_file = /etc/asterisk/asterisk.key
ca_list_path = /etc/ssl/certs/
method = tlsv1_2

[voipms]
type = registration
transport = transport-tls
outbound_auth = voipms
client_uri = sip:555123_myasterisk@vancouver2.voip.ms
server_uri = sip:vancouver2.voip.ms

[voipms]
type = auth
password = password789
username = 555123_myasterisk

[voipms]
type = aor
contact = sip:555123_myasterisk@vancouver2.voip.ms

[voipms]
type = identify
endpoint = voipms
match = vancouver2.voip.ms

[voipms]
type = endpoint
context = from-voipms
disallow = all
allow = ulaw
allow = g729
from_user = 555123_myasterisk
trust_id_inbound = yes
media_encryption = sdes
auth = voipms
outbound_auth = voipms
aors = voipms
rtp_symmetric = yes
rewrite_contact = yes
send_rpid = yes
timers = no

The TLS method line is needed since the default in Debian OpenSSL is too strict. The timers line is to prevent outbound calls from getting dropped after 15 minutes.

Finally, I changed the Dial() lines in these extensions.conf blurbs from:

[from-voipms]
exten => 5551231000,1,Goto(2000,1)
exten => 2000,1,Dial(SIP/2000&SIP/2001,20)
exten => 2000,n,Goto(in2000-${DIALSTATUS},1)
exten => 2000,n,Hangup
exten => in2000-BUSY,1,VoiceMail(10@internal,su)
exten => in2000-BUSY,n,Hangup
exten => in2000-CONGESTION,1,VoiceMail(10@internal,su)
exten => in2000-CONGESTION,n,Hangup
exten => in2000-CHANUNAVAIL,1,VoiceMail(10@internal,su)
exten => in2000-CHANUNAVAIL,n,Hangup
exten => in2000-NOANSWER,1,VoiceMail(10@internal,su)
exten => in2000-NOANSWER,n,Hangup
exten => _in2000-.,1,Hangup(16)

[pstn-voipms]
exten => _1NXXNXXXXXX,1,Set(CALLERID(all)=Francois Marier <5551231000>)
exten => _1NXXNXXXXXX,n,Dial(SIP/voipms/${EXTEN})
exten => _1NXXNXXXXXX,n,Hangup()
exten => _NXXNXXXXXX,1,Set(CALLERID(all)=Francois Marier <5551231000>)
exten => _NXXNXXXXXX,n,Dial(SIP/voipms/1${EXTEN})
exten => _NXXNXXXXXX,n,Hangup()
exten => _011X.,1,Set(CALLERID(all)=Francois Marier <5551231000>)
exten => _011X.,n,Authenticate(1234)
exten => _011X.,n,Dial(SIP/voipms/${EXTEN})
exten => _011X.,n,Hangup()
exten => _00X.,1,Set(CALLERID(all)=Francois Marier <5551231000>)
exten => _00X.,n,Authenticate(1234)
exten => _00X.,n,Dial(SIP/voipms/${EXTEN})
exten => _00X.,n,Hangup()

to:

[from-voipms]
exten => 5551231000,1,Goto(2000,1)
exten => 2000,1,Dial(PJSIP/2000&PJSIP/2001,20)
exten => 2000,n,Goto(in2000-${DIALSTATUS},1)
exten => 2000,n,Hangup
exten => in2000-BUSY,1,VoiceMail(10@internal,su)
exten => in2000-BUSY,n,Hangup
exten => in2000-CONGESTION,1,VoiceMail(10@internal,su)
exten => in2000-CONGESTION,n,Hangup
exten => in2000-CHANUNAVAIL,1,VoiceMail(10@internal,su)
exten => in2000-CHANUNAVAIL,n,Hangup
exten => in2000-NOANSWER,1,VoiceMail(10@internal,su)
exten => in2000-NOANSWER,n,Hangup
exten => _in2000-.,1,Hangup(16)

[pstn-voipms]
exten => _1NXXNXXXXXX,1,Set(CALLERID(all)=Francois Marier <5551231000>)
exten => _1NXXNXXXXXX,n,Dial(PJSIP/${EXTEN}@voipms)
exten => _1NXXNXXXXXX,n,Hangup()
exten => _NXXNXXXXXX,1,Set(CALLERID(all)=Francois Marier <5551231000>)
exten => _NXXNXXXXXX,n,Dial(PJSIP/1${EXTEN}@voipms)
exten => _NXXNXXXXXX,n,Hangup()
exten => _011X.,1,Set(CALLERID(all)=Francois Marier <5551231000>)
exten => _011X.,n,Authenticate(1234)
exten => _011X.,n,Dial(PJSIP/${EXTEN}@voipms)
exten => _011X.,n,Hangup()
exten => _00X.,1,Set(CALLERID(all)=Francois Marier <5551231000>)
exten => _00X.,n,Authenticate(1234)
exten => _00X.,n,Dial(PJSIP/${EXTEN}@voipms)
exten => _00X.,n,Hangup()

Note that it's not just a matter of replacing SIP/ with PJSIP/: it was also necessary to use a channel format supported by pjsip, since the SIP/trunkname/extension form isn't supported by pjsip.

Lev Lafayette: Compiling Your Python

It still generates a little bit of surprise to discover that there are people who use Python on a daily basis who are apparently quite unfamiliar with compiling said code. Or perhaps not; it is, after all, the world's most popular programming language, it has a syntax that's cleaner than many older languages, it has an enormous collection of extensions, and so forth. As a result, there are many people who use Python, but perhaps not so many who have the inquisitiveness and courage to dive a little deeper. This short article is a deeper dive to understand a little more about the language.

The first common (novice) mistake is the belief that Python is an interpreted language and can't be compiled. It most certainly can be, and is, "compiled", but not in the same way that a compiled language (e.g., C/C++, Fortran, Pascal) is. If this sounds confusing, one needs to dig a little into the architecture.

A Python program is compiled before being interpreted, but this step is hidden at the surface level. When Python is executed it first generates byte code. This byte code is then interpreted by the Python Virtual Machine (PVM), which converts it into the binary machine code that the computer processor can execute.
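
If you want to see this byte code for yourself, the standard library's dis module will disassemble a function into the instructions the PVM executes; a small illustration:

import dis

def greet():
    return "Hello, world"

# Show the byte code instructions the PVM will interpret for greet().
dis.dis(greet)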

A top-level Python source file can be compiled in the following manner:


$ python -m py_compile helloworld.py

Or, with multiple files:


$ python -m py_compile helloworld.py hellosystem.py hellogalaxy.py ...

Running this will generate a __pycache__ directory containing the byte-code version of the program (e.g., helloworld.cpython-38.pyc). This binary byte-code can be directly invoked (e.g., python3 __pycache__/helloworld.cpython-38.pyc).
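
The same step can also be driven from within Python itself, using the standard library's py_compile and compileall modules, which is convenient in build scripts; a minimal sketch:

import compileall
import py_compile

# Byte-compile a single file; returns the path of the .pyc written under __pycache__.
pyc_path = py_compile.compile("helloworld.py")
print(pyc_path)

# Byte-compile every .py file under the current directory tree.
compileall.compile_dir(".", quiet=1)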

The distinction between byte code interpreted by the PVM and machine code is very important, as sometimes people believe (armed with this new information) that because Python can be compiled, this will lead to much faster execution. This is incorrect. A compiled Python script does not run any faster than a non-compiled script. So why would one want to do this?

The short answer is that a program doesn't just run: it must also load its environment, including any other programs, modules, packages, etc. that have been imported. If the runtime of a program is quite long then the advantage of compiling a Python program is relatively small; conversely, a Python program with a short runtime, or one that imports a number of additional programs, is going to gain a more significant advantage. In any case, there will be some improvement, because every Python program will be converted to byte code and interpreted by the PVM anyway. The example helloworld.py shows an improvement of over 300% in real time, for example. This is not bound by simplicity or length, by the way. For example, a complex mathematical Python program can take a very long time to load, but its runtime execution is very quick. Testing, as always, is the best solution.

There are two other advantages to using compiled Python. The first is that the byte code provides some protection against unwanted changes in a shared environment, for example, a careless contributor who, in their passion to fix what looks like a bug, makes a modification to the source code and breaks the program. Of course, a version control system should be used, but even then an additional layer of protection is often worthwhile. The second advantage is that compilation will often result in a significantly smaller file, and when coupled with the faster load times this is particularly useful for websites, embedded environments, and especially when using Python for network programming. Seriously, in such an environment, when the program is waiting for a trigger, a fast response is critical; this can be achieved by having modules like socket, smtplib, base64, etc. already loaded.

All silver linings are attached to clouds, however, and compiling Python is no different. The byte code is tied to the particular Python version it was compiled with and is therefore not necessarily transportable to a new system: "different machine, different Python", as the saying goes. Thus distribution will, at the very least, require distributing all the relevant .py files for implementation.

There are, this being computing, many further elaborations that one could engage in. For example, I have not discussed the deeper optimisation options for compilation and the pitfalls that could result. Nor have I raised the details of the interpreter and the evaluation loop, or the distinction between implementations at that level (e.g., CPython, Jython, IronPython, PyPy, etc). Finally, perhaps in a simpler direction, I have not discussed the use of the compile() function, which converts a specified source into a code object to be executed, such as by eval() or exec(). These will be discussed at another time. In the meantime, if performance matters, you probably should compile your Python code.
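
As a very small taste of that last item, here is what compile() looks like in practice (an illustrative sketch only, not the fuller treatment promised above):

# compile() turns source text into a code object that can be executed later.
stmt = compile("print('Hello, world')", "<string>", "exec")
exec(stmt)           # prints: Hello, world

expr = compile("6 * 7", "<string>", "eval")
print(eval(expr))    # prints: 42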

,

Tim Riley: Open source status update, August 2022

August’s OSS work landed one of the last big Hanami features, saw another Hanami release out the door, began some thinking about memory usage, and kicked off a fun little personal initiative. Let’s dive in!

Conditional slice loading in Hanami

At the beginning of the month I merged support for conditional slice loading in Hanami. I’d wanted this feature for a long time, and in fact I’d hacked in workarounds to achieve the same more than 2 years ago, so I was very pleased to finally get this done, and for the implementation work to be as smooth as it was.

The feature provides a new config.slices setting on your app class, which you can configure like so:

module MyApp
  class App < Hanami::App
    config.slices = %w[admin]
  end
end

For an app consisting of both Admin and Main slices and for the config above, when the app is booted, only the Admin slice will be loaded:

require "hanami/prepare"

Hanami.app.slices.keys # => [:admin]

Admin::Slice # exists, as expected
Main         # raises NameError, since it was never loaded

As we see from Main above, slices absent from this list will not have their namespace defined, nor their slice class loaded, nor any of their Ruby source files. Within that Ruby process, they effectively do not exist.

Specifying slices to load can be very helpful to improve boot time and minimize memory usage for specific deployed workloads of your app.

Imagine you have a subset of background jobs that run via a dedicated job runner, but whose logic is otherwise unneeded for the rest of your app to function. In this case, you could organize those jobs into their own slice, and then load only that slice for the job runner's process. This arrangement would see the job runner boot as quickly as possible (no extraneous code to load) as well as save all the memory otherwise needed by all those classes. You could also do the inverse for your main deployed process: specify all slices except this jobs slice, and you gain savings there too.

Organising code into slices to promote operational efficiency like this also gives you the benefit of greater clarity in the separation of responsibilities between those slices: when a single slice of code is loaded and the rest of your app is made to disappear, that will quickly surface any insidious dependencies from that slice to the rest of your code (they’ll be raised as exceptions!). Cleaning these up will help ensure your slices remain useful as abstractions for reasoning about and maintaining your app.

To make it easy to tune the list of slices to load, I also introduced a new HANAMI_SLICES env var that sets this config without you having to write code inside your app class. In this way, you could use them in your Procfile or other similar deployment code:

web: HANAMI_SLICES=main,admin bundle exec puma -C config/puma.rb
feed_worker: HANAMI_SLICES=feed bundle exec rake jobs:work

This effort was also another example of why I'm so happy to be working alongside the Hanami core team. After initially proposing a more complex arrangement including separate lists for including or excluding slices, Luca jumped in and helped me dial this back to the much simpler arrangement of the single list only. For a Hanami release in which we're going to be introducing so many new ideas, the more we can keep simple around them, the better, and I'm glad to have people who can remind me of this.

Fixed how slice config is applied to component classes

Our action and view integration code relies on their classes detecting when they’re defined inside a slice’s namespace, then applying relevant config from the slice to their own class-level config object. It turned out our code for doing this broke a little when we adjusted our default class hierarchies. Thanks to some of our wonderful early adopters, we picked this up quickly and I fixed it. Now things just work like you expect however you choose to configure your action classes, whether through the app-level config.actions object, or by directly updating config in a base action class.

In doing this work, I became convinced we need an API on dry-configurable to determine whether any config value has been assigned or mutated by the user, since it would help so much in reliably detecting whether or not we should ignore config values at particular levels. For now, we could work around it, but I hope to bring this to dry-configurable at some point in the future.

Released Hanami 2.0.0.beta2

Another month passed, so it was time for another release! With my European colleagues mostly enjoying some breaks over their summer, I hunkered down in chilly Canberra and took care of the 2.0.0.beta2 release. Along with the improvements above, this release also included slice and action generators (hanami generate slice and hanami generate action, thank you Luca!), plus a very handy CLI middlewares inspector (thank you Marc!):

$ hanami middlewares

/    Dry::Monitor::Rack::Middleware (instance)
/    Rack::Session::Cookie

The list of things to do over the beta phase is getting smaller. I don’t expect we’ll need too many more of these releases!

Created memory usage benchmarks for dry-configurable

As the final 2.0 release gets closer, we’ve been doing various performance tests just to make sure the house is in order. One thing we discovered is that Hanami::Action is not as memory efficient as we’d like it to be. One of the biggest opportunities to improve this looked to be in dry-configurable, since that’s what is used to manage the per-class action configuration.

I suspected any effort here would turn out to be involved (and no surprise, it turned out to be involved 😆), so I thought it would be useful as a first step to establish a memory benchmark to revisit over the course of any work. This was also a great way to get my head in this space, which turned out to take over most of my September (but more on that next month).
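
As a rough sketch of the shape such a benchmark can take (this uses the memory_profiler gem and is only an illustration, not the actual benchmark that ended up in the dry-configurable repo):

require "memory_profiler"
require "dry/configurable"

report = MemoryProfiler.report do
  # Simulate many action-like classes, each carrying its own class-level config
  1_000.times do
    klass = Class.new do
      extend Dry::Configurable

      setting :handled_exceptions, default: {}
      setting :default_headers, default: {}
    end

    klass.config # touch the config so it is actually built
  end
end

report.pretty_print(scale_bytes: true)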

Quietly relaunched Decaf Sucks

Decaf Sucks was once a thriving little independent online café review community, with its own web site (starting from humble beginnings as a Rails Rumble entry in 2009) and even a native iOS app (two iterations, in fact).

I was immensely proud of what Decaf Sucks became, and of the collaboration with Max Wheeler in building it.

Unfortunately, as various internet APIs changed, the site atrophied, eventually became dysfunctional, and we had to take it down. I still have the database, however, and I want to bring it back!

This time around, my plan is to do it as a fully open source Hanami 2 example application. Max is even on board to bring back all the UI goodness. For now, you can follow along with the early steps on GitHub. Right now the app is little more than the basic Hanami skeleton with added database integration and a CI setup (Hello Buildkite!), but I plan to grow it bit by bit. Perhaps I’ll try to have something small that I can share with each of these monthly OSS updates.

After Hanami 2 ships, hopefully this will serve as a useful resource for people wanting to see how it plays out in a real working app. And beyond that, I look forward to it serving once again as a place for me to commemorate my coffee travels!

,

Tim SerongAn S3 Storage Experiment

My team at SUSE is working on a new S3-compatible storage solution for Kubernetes, based on Ceph’s RADOS Gateway (RGW), except without any of the RADOS bits. The idea is that you can deploy our s3gw container on top of Longhorn (which provides the underlying replicated storage), and all this is running in your Kubernetes cluster, along with your applications which thus have convenient access to a local S3-compatible object store.

We’ve done this by adding a new storage backend to RGW. The approach we’ve taken is to use SQLite for metadata, with object data stored as files in a regular filesystem. This works quite neatly in a Kubernetes cluster with Longhorn, because Longhorn can provide a persistent volume (think: an ext4 filesystem), on which s3gw can store its SQLite database and object data files. If you’d like to kick the tyres, check out Giuseppe’s deployment tutorial for the 0.2.0 release, but bear in mind that as I’m writing this we’re all the way up to 0.4.0 so some details may have changed.

While s3gw on Longhorn on Kubernetes remains our primary focus for this project, the fact that this thing only needs a filesystem for backing storage means it can be run on top of just about anything. Given “just about anything” includes an old school two node Pacemaker cluster with DRBD for replicated storage, why not give that a try? I kinda like the idea of a good solid highly available S3-compatible storage solution that you could shove into the bottom of a rack somewhere without too much difficulty.

It’s probably eight years since I last deployed Pacemaker and DRBD, so to refresh my memory I ran with SUSE’s latest Highly Available NFS Storage with DRBD and Pacemaker document, but skipped all the NFS bits. That gives a filesystem mounted on one node, which will fail over to the other node if something breaks. On top of that, we need to run the s3gw container, the s3gw-ui container, an nginx HTTPS reverse proxy to smoosh those two together, and a virtual/floating IP, so the whole lot is accessible to the outside world.

Here’s the interesting parts of my Pacemaker configuration:

# crm configure show
[...]
primitive drbd_s3 ocf:linbit:drbd \
        params drbd_resource=s3 drbdconf="/etc/drbd.conf" \
        op monitor interval=29s role=Master \
        op monitor interval=31s role=Slave
primitive fs_s3 Filesystem \
        params device="/dev/drbd0" directory="/data" fstype=ext4 \
        meta target-role=Started \
        op start timeout=60s interval=0 \
        op stop timeout=60s interval=0 \
        op monitor interval=20s timeout=40s
primitive https nginx \
        op start timeout=40s interval=0 \
        op stop timeout=60s interval=0 \
        op monitor timeout=30s interval=10s \
        op monitor timeout=30s interval=30s \
        op monitor timeout=60s interval=20s
primitive s3-ip IPaddr2 \
        params ip=192.168.100.50 \
        op monitor interval=10 timeout=20
primitive s3gw podman \
        params image="ghcr.io/aquarist-labs/s3gw:latest" run_opts="-p 7480:7480 -v/data:/data" \
        op start interval=0 timeout=90s \
        op stop interval=0 timeout=90s \
        op monitor interval=30s timeout=30s
primitive s3gw-ui podman \
        params image="ghcr.io/aquarist-labs/s3gw-ui:latest" run_opts="-p 8080:8080 -e RGW_SERVICE_URL=https://s3gw.sleha.test" \
        op start interval=0 timeout=90s \
        op stop interval=0 timeout=90s \
        op monitor interval=30s timeout=30s
group g-s3 fs_s3 s3gw s3gw-ui https s3-ip
ms ms-drbd_s3 drbd_s3 \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
colocation col-s3_on_drbd inf: g-s3 ms-drbd_s3:Promoted
order o-drbd_before_fs Mandatory: ms-drbd_s3:promote g-s3:start
[...]

The g-s3 group ensures that the ext4 filesystem (fs_s3), s3gw container (s3gw), s3gw-ui container (s3gw-ui), nginx instance (https) and virtual IP (s3-ip) all run on the same node, and start one after another. The colocation and ordering constraints ensure that g-s3 runs on whichever node is currently the DRBD (ms-drbd_s3) primary.

The important pieces of glue here are:

  • The fs_s3 resource mounts /dev/drbd0 on /data
  • The s3gw resource passes -p 7480:7480 -v/data:/data to podman, so the container can write to /data on the host, and the S3 service is accessible via HTTP on port 7480.
  • The s3gw-ui resource passes -p 8080:8080 -e RGW_SERVICE_URL=https://s3gw.sleha.test to podman, so the UI is accessible via HTTP on port 8080, and it expects the S3 service to be externally available via https://s3gw.sleha.test.
  • nginx is configured to reverse proxy https://s3gw.sleha.test to http://localhost:7480, and https://s3gw-ui.sleha.test to http://localhost:8080.
  • I’ve got an entry in /etc/hosts to point s3gw.sleha.test and s3gw-ui.sleha.test at the virtual IP (192.168.100.50).
  • I’m using self-signed certificates (openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout cert.key -out cert.pem) for s3gw and s3gw-ui, so I had to go visit both https://s3gw.sleha.test and https://s3gw-ui.sleha.test in my browser and accept the SSL certificate before the UI would work.
  • The DRBD config, nginx config and SSL certificates and keys need to be present on all nodes. I used csync2 for this.

Here’s my /etc/nginx/nginx.conf. I’m not entirely convinced I’ve got everything 100% right here, but it seems to work (this is, incredibly, my first time doing anything with nginx, and my first time dealing with CORS):

worker_processes  1;

events {
    worker_connections  1024;
    use epoll;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    sendfile        on;
    keepalive_timeout  65;

    server {
        listen       80;
        return       301 https://$host$request_uri; 
    }

    server {
        listen       443 ssl;
        server_name  s3gw.sleha.test;

        access_log /var/log/nginx/s3gw.access.log;

        location / {
            proxy_set_header        Host $host;
            proxy_set_header        X-Real-IP $remote_addr;
            proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header        X-Forwarded-Proto $scheme;

            add_header Access-Control-Allow-Origin 'https://s3gw-ui.sleha.test';
            add_header Access-Control-Allow-Methods 'GET,HEAD,PUT,POST,DELETE';
            add_header Access-Control-Allow-Headers '*';
            add_header 'Access-Control-Allow-Credentials' 'true';

            if ($request_method = 'OPTIONS') {
                add_header Access-Control-Allow-Origin 'https://s3gw-ui.sleha.test';
                add_header Access-Control-Allow-Methods 'GET,HEAD,PUT,POST,DELETE';
                add_header Access-Control-Allow-Headers '*';
                add_header 'Access-Control-Allow-Credentials' 'true';
                add_header 'Content-Type' 'text/plain charset=UTF-8';
                add_header 'Content-Length' 0;
                return 204;
            }

            proxy_pass          http://localhost:7480;
            proxy_read_timeout  90;
            proxy_redirect      http://localhost:7480 https://s3gw.sleha.test;
        }

        ssl_certificate      cert.pem;
        ssl_certificate_key  cert.key;
        ssl_protocols        TLSv1.2;
        ssl_session_cache    shared:SSL:1m;
        ssl_session_timeout  5m;
        ssl_ciphers  HIGH:!aNULL:!MD5;
        ssl_prefer_server_ciphers  on;
    }

    server {
        listen       443 ssl;
        server_name  s3gw-ui.sleha.test;

        access_log /var/log/nginx/s3gw-ui.access.log;

        location / {
            proxy_set_header        Host $host;
            proxy_set_header        X-Real-IP $remote_addr;
            proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header        X-Forwarded-Proto $scheme;

            proxy_pass          http://localhost:8080;
            proxy_read_timeout  90;

            proxy_redirect      http://localhost:8080 https://s3gw-ui.sleha.test;
        }

        ssl_certificate      cert-ui.pem;
        ssl_certificate_key  cert-ui.key;
        ssl_protocols        TLSv1.2;
        ssl_session_cache    shared:SSL:1m;
        ssl_session_timeout  5m;
        ssl_ciphers  HIGH:!aNULL:!MD5;
        ssl_prefer_server_ciphers  on;
    }
}

A couple of important points about Pacemaker’s support for running containers with podman:

So what was the end result? TL;DR: It pretty much All Just Worked™, which is exactly what you’d hope for when running a new application on a mature HA stack. I can use s3cmd to mess around with the S3 service, and use my web browser to play with the UI. Failover is nice and quick (think: a few seconds) if I kill a node. For the sake of convenience I did this experiment on a couple of VMs using the external/libvirt STONITH plugin, but I don’t expect a real deployment to be hugely different in behaviour. Also, I’d forgotten how good Pacemaker is at highlighting poorly behaved applications – prior to this experiment the s3gw-ui container didn’t stop well, but we weren’t aware of that until I tried a manual failover which took too long and resulted in an unexpected STONITH due to a stop timeout. Moritz has since fixed that.

One thing I tripped over when doing this deployment was the correct values to use for the access_key and secret_key of the default user when talking to the S3 service. These are actually settable for the s3gw container via the RGW_DEFAULT_USER_ACCESS_KEY and RGW_DEFAULT_USER_SECRET_KEY environment variables, but if left unset, they default to “test” and “test” respectively. The interesting bits of my s3cmd.cfg are thus:

access_key = test
secret_key = test
host_base = https://s3gw.sleha.test/
host_bucket = https://s3gw.sleha.test/%(bucket)
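
With that config in place, a quick smoke test might look something like this (the bucket name is just an example, and you may also need --no-check-certificate given the self-signed certificates):

s3cmd -c s3cmd.cfg mb s3://smoke-test
s3cmd -c s3cmd.cfg put some-file.txt s3://smoke-test
s3cmd -c s3cmd.cfg ls s3://smoke-test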

In retrospect I probably should have added -e RGW_DEFAULT_USER_ACCESS_KEY=tserong -e RGW_DEFAULT_USER_SECRET_KEY=do_not_tell_anyone_this_is_your_password to the run_opts parameter of the s3gw resource in the Pacemaker config.

,

Simon LyallAudiobooks – August 2022

The Second World Wars: How the First Global Conflict Was Fought and Won by Victor Davis Hanson

Compares the Allied and Axis powers in just about every aspect, one by one, and in the majority finds the Allies ahead. Strongly recommended to those interested in WW2. 5/5

The Man with the Golden Gun by Ian Fleming

The final Bond novel by Fleming. Bond investigates gangsters and spies in Jamaica. Readable but not the best in the series. 3/5

The Hammer of God by Arthur C. Clarke

A Hard Core SciFi story set in the year 2109 involving an asteroid threatening to hit Earth and the life of the captain of the ship sent to stop it. Fans of Clarke and similar authors will enjoy it. 3/5

More Money Than God: Hedge Funds and the Making of the New Elite by Sebastian Mallaby

A history of Hedge Funds in the US up to just after the 2008 crash. Profiles of people and companies at each stage. Interesting and easy to follow. 3/5

My Audiobook Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average. In the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


,

Francois MarierUsing a Streamzap remote control with VLC on a Raspberry Pi

First of all, I am starting from a working Streamzap remote in Kodi. If VLC is the first application you are setting up with the Streamzap remote then you will probably need to read the above blog post first.

Once you know you have a working remote, put the following lircrc config into /home/pi/.lircrc:

begin
  prog = vlc
  button = KEY_PLAY
  config = key-play
end

begin
  prog = vlc
  button = KEY_PAUSE
  config = key-pause
end

begin
  prog = vlc
  button = KEY_STOP
  config = key-stop
end

begin
  prog = vlc
  button = KEY_POWER
  config = key-quit
end

begin
  prog = vlc
  button = KEY_NEXT
  config = key-next
end

begin
  prog = vlc
  button = KEY_PREVIOUS
  config = key-prev
end

begin
  prog = vlc
  button = KEY_RED
  config = key-toggle-fullscreen
end

begin
  prog = vlc
  button = KEY_REWIND
  config = key-slower
end

begin
  prog = vlc
  button = KEY_FORWARD
  config = key-faster
end

begin
  prog = vlc
  button = KEY_VOLUMEDOWN
  config = key-vol-down
end

begin
  prog = vlc
  button = KEY_VOLUMEUP
  config = key-vol-up
end

begin
  prog = vlc
  button = KEY_BLUE
  config = key-audio-track
end

begin
  prog = vlc
  button = KEY_MUTE
  config = key-vol-mute
end

begin
  prog = vlc
  button = KEY_LEFT
  config = key-nav-left 
end

begin
  prog = vlc
  button = KEY_DOWN
  config = key-nav-down
end

begin
  prog = vlc
  button = KEY_UP
  config = key-nav-up
end

begin
  prog = vlc
  button = KEY_RIGHT
  config = key-nav-right
end

begin
  prog = vlc
  button = KEY_MENU
  config = key-nav-activate
end

begin
  prog = vlc
  button = KEY_GREEN
  config = key-subtitle-track
end

and then after starting VLC:

  1. Open Tools | Preferences.
  2. Select All under Show Settings in the bottom left corner.
  3. Open Interface | Control Interfaces in the left side-bar.
  4. Enable Infrared remote control interface.

Now you should see lirc in the text box at the bottom of Control Interfaces and the following in your ~/.config/vlc/vlcrc:

[core]
control=lirc

If you're looking to customize the above key mapping, you can find the VLC key codes in the output of vlc -H --extended | grep -- --key-.

,

Francois MarierRemote logging of Turris Omnia log messages using syslog-ng and rsyslog

As part of debugging an upstream connection problem I've been seeing recently, I wanted to be able to monitor the logs from my Turris Omnia router. Here's how I configured it to send its logs to a server I already had on the local network.

Server setup

The first thing I did was to open up my server's rsyslog (Debian's default syslog server) to remote connections since it's going to be the destination host for the router's log messages.

I added the following to /etc/rsyslog.d/router.conf:

module(load="imtcp")
input(type="imtcp" port="514")

if $fromhost-ip == '192.168.1.1' then {
    if $syslogseverity <= 5 then {
        action(type="omfile" file="/var/log/router.log")
    }
    stop
}

This is using the latest rsyslog configuration method: a handy scripting language called RainerScript. Severity level 5 maps to "notice" which consists of unusual non-error conditions, and 192.168.1.1 is of course the IP address of the router on the LAN side. With this, I'm directing all router log messages to a separate file, filtering out anything less important than severity 5.

In order for rsyslog to pick up this new configuration file, I restarted it:

systemctl restart rsyslog.service

and checked that it was running correctly (e.g. no syntax errors in the new config file) using:

systemctl status rsyslog.service

Since I added a new log file, I also setup log rotation for it by putting the following in /etc/logrotate.d/router:

/var/log/router.log
{
    rotate 4
    weekly
    missingok
    notifempty
    compress
    delaycompress
    sharedscripts
    postrotate
        /usr/lib/rsyslog/rsyslog-rotate
    endscript
}

In addition, since I use logcheck to monitor my server logs and email me errors, I had to add /var/log/router.log to /etc/logcheck/logcheck.logfiles.

Finally I opened the rsyslog port to the router in my server's firewall by adding the following to /etc/network/iptables.up.rules:

# Allow logs from the router
-A INPUT -s 192.168.1.1 -p tcp --dport 514 -j ACCEPT

and ran iptables-apply.

With all of this in place, it was time to get the router to send messages.

Router setup

As suggested on the Turris forum, I ssh'ed into my router and added this in /etc/syslog-ng.d/remote.conf:

destination d_loghost {
        network("192.168.1.200" time-zone("America/Vancouver"));
};

source dns {
        file("/var/log/resolver");
};

log {
        source(src);
        source(net);
        source(kernel);
        source(dns);
        destination(d_loghost);
};

Setting the timezone to the same as my server was needed because the router messages were otherwise sent with UTC timestamps.

To ensure that the destination host always gets the same IP address (192.168.1.200), I went to the advanced DHCP configuration page and added a static lease for the server's MAC address so that it always gets assigned 192.168.1.200. If that wasn't already the server's IP address, you'll have to restart it for this to take effect.

Finally, I restarted the syslog-ng daemon on the router to pick up the new config file:

/etc/init.d/syslog-ng restart

Testing

In order to test this configuration, I opened three terminal windows:

  1. tail -f /var/log/syslog on the server
  2. tail -f /var/log/router.log on the server
  3. tail -f /var/log/messages on the router

I immediately started to see messages from the router in the third window, and some of these (not all, because of my severity-5 filter) were flowing to the second window as well. Also important is that none of the messages make it to the first window, otherwise log messages from the router would be mixed in with the server's own logs. That's the purpose of the stop command in /etc/rsyslog.d/router.conf.

To force a log message to be emitted by the router, simply ssh into it and issue the following command:

logger Test

It should show up in the second and third windows immediately if you've got everything set up correctly.

Timezone problems

If I do the following on my router:

/etc/init.d/syslog-ng restart
logger TestA

I see the following in /var/log/messages:

Aug 14 20:39:35 hostname syslog-ng[9860]: syslog-ng shutting down; version='3.37.1'
Aug 14 20:39:36 hostname syslog-ng[10024]: syslog-ng starting up; version='3.37.1'
Aug 15 03:39:49 hostname root: TestA

The correct timezone is the one in the first two lines. Messages from other sources, like the logger invocation above, are displayed with an incorrect timezone.

Thanks to a very helpful syslog-ng mailing list thread, I found that this is actually an upstream OpenWRT bug.

My favourite work-around is to tell syslog-ng to simply ignore the timestamp provided by the application and to use the time of reception (of the log message) instead. To do this, simply change the following in /etc/syslog-ng.conf:

source src {
    internal();
    unix-dgram("/dev/log");
};

to:

source src {
    internal();
    unix-dgram("/dev/log", keep-timestamp(no));
};

Unfortunately, I wasn't able to fix it in a way that would survive a syslog-ng package update, but since this is supposedly fixed in Turris 6.0, it shouldn't be a problem for much longer.

,

Tim RileyOpen source status update, May–July 2022

Hi there friends, it’s certainly been a while, and a lot has happened across May, June and July: I left my job, took some time off, and started a new job. I also managed to get a good deal of open source work done, so let’s take a look at that!

Released Hanami 2.0.0.alpha8

Since we’d skipped a month in our releases, I helped get Hanami 2.0.0.alpha8 out the door in May. The biggest change here was that we’d finished relocating the action and view integration code into the hanami gem itself, wrapped up in distinct “application” classes, like Hanami::Application::Action. In the end, this particular naming scheme turned out to be somewhat short lived! Read on for more :)

Resurrected work using dry-effects within hanami-view

As part of an effort to make it easy to use our conventional view “helpers” in all parts of our view layer, I resurrected my work from September 2020(!) on using dry-effects within hanami-view. The idea here was to achieve two things:

  1. To ensure we keep only a single context object for the entire view rendering, allowing its state to be preserved and accessed by all view components (i.e. allowing both templates, partials and parts all to access the very same context object)
  2. To enable access to the current template/partial’s #locals from within the context, which might help make our helpers feel a little more streamlined through implicit access to those locals

I got both of those working (here’s my work in progress), but I discovered the performance had worsened due to the cost of using an effect to access the locals. I took a few extra passes at this, reducing the number of effects to one, and memoizing it, leaving us with improved performance over the main branch, but with a slightly different stance: the single effect is for accessing the context object only, so any helpers, instead of expecting access to locals, will instead only have access to that context. The job from here will be to make sure that the context object we build for Hanami’s views has everything we need for an ergonomic experience working with our helpers. I’m feeling positive about the direction here, but it’ll be a little while before I get back to it. Read on for more on this (again!).

Unified application and slice

The biggest thing I did over this period was to unify Hanami’s Application and Slice. This one took some doing, and I was glad that I had a solid stretch of time to work on it between jobs.

I already wrote about this back in April’s update, noting that I’d settled on the approach of having a composed slice inside the Hanami::Application class to provide slice-like functionality at the application level. This was the approach I continued with, and as I went, I was able to move more and more functionality out of Hanami::Application and into Hanami::Slice, with that composed “application slice” being the thing that preserved the existing application behaviour. At some point, a pattern emerged: the application is a slice, and we could achieve everything we wanted (and more) by turning class Hanami::Application into class Hanami::Application < Hanami::Slice.

Turning the application into a slice subclass is indeed how I finished the work, and I’m extremely pleased with how it turned out. It’s made slices so much more powerful. Now, each slice can have its own config, its own dedicated settings and routes, can be run on its own as a Rack application, and can even have its own set of child slices.

As a user of Hanami you won’t be required to use all of these per-slice power features, but they’ll be there if or when you want them. This is a great example of progressive disclosure, a principle I follow as much as possible when designing Hanami’s features: a user should be able to work with Hanami in a simple, straightforward way, and then, as their needs grow, find additional capabilities waiting to serve them.

Let’s explore this with a concrete example. If you’re building a simple Hanami app, you can start with a single top-level config/settings.rb that defines all of the app’s own settings. This settings object is made available as a "settings" component registration in both the app as well as all its slices. As the app grows and you add a slice or two, you start to add more slice-specific settings to this component. At this point you start to feel a little uncomfortable that settings specific to SliceA are also available inside SliceB and elsewhere. So you wonder, could you go into slices/slice_a/ and drop a dedicated config/settings.rb there? The answer to that is now yes! Create a config/settings.rb inside any slice directory and it will now become a dedicated settings component for that slice alone. This isn’t a detail you had to burden yourself with in order to get started, but it was ready for you when you needed it.
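
As a sketch of what such a slice-level settings file can look like (the setting name is just an example, and this uses the slimmer Hanami::Settings superclass mentioned later in this update):

# slices/slice_a/config/settings.rb
module SliceA
  class Settings < Hanami::Settings
    # Registered as the "settings" component for this slice only
    setting :some_api_key
  end
end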

Another big benefit of this code reorganisation is that the particular responsibilities of Hanami::Application are much clearer: its job is to provide the single entrypoint to the app and coordinate the overall boot process; everything else comes as part of it also being a slice. This distinction is made clear through the number of public methods that exist across the two classes: Application now has only 2 distinct public methods, whereas Slice currently brings 27.

There’s plenty more detail over in the pull request: go check it out!

The work here also led to changes across the ecosystem:

This is one of the reasons I’m excited about Hanami’s use of the dry-rb gems: it’s pushing them in directions no one has had to take them before. The result is not only the streamlined experience we want for Hanami, but also vastly more powerful underpinnings.

Devised a slimmed down core app structure

While I had my head down working on internal changes like the above, Luca had been thinking about Hanami 2 adoption and the first run user experience. As we had opted for a slices-only approach for the duration of our alpha releases, it meant a fairly bulky overall app structure: every slice came with multiple deeply nested files. This might be overwhelming to new users, as well as feeling like overkill for apps that are intended to start small and stay small.

To this end, we agreed upon a stripped back starter structure. Here’s how it looks at its core (ignoring tests and other general Ruby files):

├── app/
│   ├── action.rb
│   └── actions/
├── config/
│   ├── app.rb
│   ├── routes.rb
│   └── settings.rb
├── config.ru
└── lib/
    ├── my_app/
    │   └── types.rb
    └── tasks/

That’s it! Much more lightweight. This approach takes advantage of the Hanami app itself becoming a fully-featured slice, with app/ now as its source directory.

In fact, I took this opportunity to unify the code loading rules for both the app and slices, which makes for a much more intuitive experience. You can now drop any Ruby source file into app/ or a slices/[slice_name]/ slice dir and it will be loaded in the same way: starting at the root of each directory, classes defined therein are expected to inhabit the namespace that the app or slice represents, so app/some_class.rb would be MyApp::SomeClass and slices/my_slice/some_class.rb would be MySlice::SomeClass. Hat tip to me of September 2021 for implementing the dry-system namespaces feature that enabled this! 😜
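
To illustrate that mapping (file and class names here are just examples):

# app/some_class.rb defines:
module MyApp
  class SomeClass
  end
end

# slices/my_slice/some_class.rb defines:
module MySlice
  class SomeClass
  end
end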

(Yet another little dry-system tweak came out of preparing this too, with Component#file_name now exposed for auto-registration rules).

This new initial structure for starter Hanami 2.0 apps is another example of progressive disclosure in our design. You can start with a simple all-in-one approach, everything inside an app/ directory, and then as various distinct concerns present themselves, you can extract them into dedicated slices as required.

Along with this, some of our names have become shorter! Yes, “application” has become “app” (and Hanami::Application has become Hanami::App, and so on). These shorter names are easier to type, as well as more reflective of the words we tend to use when verbally describing these structures.

We also tweaked our actions and views integration code so that it is automatically available when you inherit directly from Hanami::Action, so it will no longer be necessary to have the verbose Hanami::Application::Action as the superclass for the app’s actions. We ditched that namespace for routes and settings too, so now you can just inherit from Hanami::Settings and the like.
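
For example, the base action at app/action.rb in the starter structure above can now look roughly like this:

# app/action.rb
require "hanami/action"

module MyApp
  class Action < Hanami::Action
  end
end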

Devised a slimmed down release strategy

Any of you following my updates would know by now that the Hanami 2.0 release has been a long time coming. We have ambitious goals, we’re doing our best, and everything is slowly coming together. But as hard as it might’ve been for folks who’re waiting, it’s been doubly so for us, feeling the weight of both the work and everyone’s expectations.

So to make sure we can focus our efforts and get something out the door sooner rather than later, we decided to stagger our 2.0 release. We’ll start off with an initial 2.0 release centred around hanami, hanami-cli, hanami-controller, and hanami-router (enough to write some very useful API applications, for example), then follow up with a “full stack” 2.1 release including database persistence, views, helpers, assets and everything else.

I’m already feeling empowered by this strategy: 2.0 feels actually achievable now! And all of the other release-related work like updated docs and a migration guide will become correspondingly easier too.

Released Hanami 2.0.0.beta1!

With greater release clarity as well as all the above improvements under our belt, it was time to usher in a new phase of Hanami 2.0 development, so we released 2.0.0.beta1 in July! This new version suffix represents just how close we feel we are to our final vision for 2.0. This is an exciting moment!

And a bunch more

This update is getting rather long, so let me list a bunch of other Hanami improvements I managed to get done:

Outside my Hanami development, a new job and a new computer meant I also took the chance to reboot my dotfiles, which are now powered by chezmoi. I can’t speak highly enough of chezmoi; it’s an extremely powerful tool and I’m loving the flexibility it affords!

That’s it from me for now. I’ll come back to you all in another month!

,

Lev LafayettePsychic Vampires from Without and Within

The past few days I've been quite influenced by a short essay by Brianna Wiest, which starts with quite a kicker: "It is the hardest thing you will ever have to do, and it will also be the most important: stop giving your love to those who aren’t ready to love you. Stop having hard conversations with people who don't want to change." Ultimately, this is arguing for equality in all relationships. Equality is not a matter of capability or social position. Everyone is unequal in some regard; some people are stronger, some people are smarter, some people know more, some people are richer or have higher status, etc. Individual differences and capabilities are inevitable. Social differences, especially systemic differences (wealth, power) are a little out of our control, barring social change through effective collective effort.

But between people, the one thing that we can be equal in our relationships is concern, care, and effort relative to personal capacity. In other words, if you make a consistent effort to help a person when they are in need, and they are conspicuously absent when you need help, then it is probably best that you redirect your attention elsewhere. As charming as they may be (and they often are when they are at their best) one needs to develop protection against psychic vampires because ultimately they will hurt you.

Psychic vampires exist along a continuum of conscious to unconscious action. At one extreme, we're dealing with people who are almost certainly narcissistic [psycho, socio]paths, conscious manipulators who are empowered by the emotional energy they drain from their dependent flock. Obviously, such people are to be avoided at all costs and best left to the professionals to deal with. At the other extreme are those who, due to their own impulsivity or lack of security, are a lot needier and have more affective instability. They may even be aware of their vampiric behaviour and feel terribly guilty about it. Unlike the pathological, if one is willing, able, and patient, then these vampiric souls can become a better version of themselves. It will take time and effort because they are, in effect, learning to establish new and unfamiliar pathways in their own brain. Some vampires you want to dispel; others you wish to turn.

Letting go does not just involve vampiric people but also vampiric activities and attachments in one's own life. Most psychic vampires are not like that all the time (indeed, some can be utterly amazing people when they're not being draining), and all of us can be a little bit vampiric at times. Identifying our own vampiric behaviours towards others and toward ourselves is essential. Are we wasting other people's time and draining their energy? Are we wasting our own time and our energy? When we assess the time and activities we engage in that do not help ourselves or others, when supposedly fun and harmless distractions take up excess time and emotional attachment, then we're in the realm of vampirism. We're literally sucking the life from others and from ourselves, retarding our own personal development and empathy. It is little wonder to discover that clarity in one's own self-concept and associated activities also enhances empathy towards others. This brings us back to the original essay: how do you think empathic people treat others?

,

Lev LafayetteThe Future of the University in the Age of the Internet : An Australian Perspective

This year I completed my MHEd thesis (early), finishing the degree in this field. The following is the abstract and a link to the study.

Abstract

One component of the broad sweep of educational history is the qualitative changes in information and communication technologies, in which each new development builds on top of its predecessor and extends its scope. The development of networked information and communication technologies in contemporary times, "the Internet", potentially provides a new mode of communication whose limiting factors include the capacities of the physical systems and the allocation of economic resources. This macrological inquiry suggests that these foundational matters have been largely overlooked when considering educational technology, and as a result four research themes are raised: (i) the identification of the demographic importance and trends of higher education, (ii) the economics of higher education, particularly the notion of cost-disease in service sectors and positive externalities, (iii) the engineering and licensing restrictions on software applications, and (iv) user experiences of existing educational software. A prescriptive conclusion raises policy matters concerning the need for more extensive public funding, the adoption of open-source licensing for applications and educational content, and the use of client-server models rather than the trend toward cloud-based software architecture.

http://levlafayette.com/files/mhed590-thesis.pdf

,

Ian BrownHigh Velocity Migrations with GCVE and HCX

What is HCX?

VMware HCX is an application mobility platform designed for simplifying application migration, workload rebalancing and business continuity across datacenters and clouds. VMware HCX was formerly known as Hybrid Cloud Extension and NSX Hybrid Connect.

GCVE HCX

GCVE deploys the Enterprise version of HCX as part of the cost of the solution. HCX Enterprise has the following benefits:

  • Hybrid Interconnect
  • WAN Optimisation
  • Bulk Migration, Live Migration and HCX Replication Assisted vMotion
  • Cloud to cloud migration
  • Disaster Protection
  • KVM & Hyper-V to vSphere migrations
  • Traffic Engineering
  • Mobility Groups
  • Mobility Optimised Networking
  • Changeover scheduling

Definitions

Cold Migration

,

Ian BrownInfrastructure as Code with Terraform in GCVE

We have seen a lot of Google Cloud VMware Engine over the last few months, and for the entire time we have used click-ops to provision new infrastructure, networks and VMs. Now we are going to the next level and we will be using Terraform to manage our infrastructure as code so that it is version controlled and predictable.

Installing Terraform

The first part of getting this working is installing Terraform on your local machine.

,

Tim SerongHack Week 21: Keeping the Battery Full

As described in some detail in my last post, we have a single 10kWh Redflow ZCell zinc bromine flow battery hooked up to our solar PV via Victron inverter/chargers. This gives us the ability to:

  • Store almost all the excess energy we generate locally for later use.
  • When the sun isn’t shining, grid charge the battery at off-peak times then draw it down at peak times to save on our electricity bill (peak grid power is slightly more than twice as expensive as off-peak grid power).
  • Opportunistically survive grid outages, provided they don’t happen at the wrong time (i.e. when the sun is down and the battery is at 0% state of charge).

By their nature, ZCell flow batteries need to undergo a maintenance cycle at least every three days, where they are discharged completely for a few hours. That’s why the last point above reads “opportunistically survive grid outages”. With a single ZCell, we can’t use the “minimum state of charge” feature of the Victron kit to always keep some charge in the battery in case of outages, because doing so conflicts with the ZCell maintenance cycles. Once we eventually get a second battery, this problem will go away because the maintenance cycles automatically interleave. In the meantime though, as my project for Hack Week 21, I decided to see if I could somehow automate the Victron scheduled charge configuration based on the ZCell maintenance cycle timing, to always keep the battery as full as possible for as long as possible.

There are three goals somewhat in tension with each other here:

  • Keep the battery full, except during maintenance cycles.
  • Don’t let the battery get too full immediately before a maintenance cycle, lest the discharge take too long and maintenance still be active the following morning.
  • Don’t schedule charges during peak electricity times (we still want to draw the battery down then, to avoid using the expensive gold plated electrons the power company sends down the wire between 07:00-10:00 and 16:00-21:00).

Here’s the solution I came up with:

  • On non-maintenance cycle days, set two no-limit scheduled charges, one from 10:00 for 6 hours, the other from 21:00 for 10 hours. That means the battery will be charged from the grid and/or the sun continuously, except for peak electricity times, when it will be drawn down. Our loads aren’t high enough to completely deplete the battery during peak times, so there will always be some juice in case of a grid outage on non-maintenance cycle days.
  • On maintenance cycle days, set a 50% limit scheduled charge from 13:00 for 3 hours, so the battery won’t be too full before that evening’s maintenance cycle, which kicks in at sunset. The day after a maintenance cycle, set a no limit scheduled charge from 03:00 for 4 hours. At our site, maintenance has almost always finished before 03:00, so there’s no conflict here, and we still have time to get some charge into the battery to handle the next morning’s peak.

Now, how to automate that?

The ZCell Battery Management System (BMS) has a REST API which we can query to find out useful information about the battery. Unfortunately it won’t actually tell us for certain whether maintenance will be run on any given day, but we can get the maintenance time limit, and subtract from that the amount of time that’s passed since the last maintenance cycle. If the resultant figure is less than one day, we know that maintenance will happen today. It is possible for maintenance to happen at other times, e.g. I can force maintenance manually, and also it can happen more often than every three days if you mess with the allowed days setting in the BMS, so this solution arguably isn’t perfect, but I think it’s good enough under the circumstances, at least at our site.

The Victron Cerbo GX (the little box that controls everything) runs Linux, and you can easily get root on it, so it’s possible to write scripts that run locally there. Here’s what I ended up with:

One important point about installing things on the Cerbo GX is that the root partition is overwritten during firmware updates, but there’s a separate data partition which is preserved. The root user’s home directory is symlinked to /data/home/root, so my script lives at /data/home/root/sched.py to ensure it remains present. Then we need to get it into /etc/crontab, which doesn’t survive firmware updates. This is done by adding a /data/rc.local script which the Cerbo GX runs on boot:

After a few days of testing and observation, I can confirm that it all works perfectly! At least, at our site, right now, with our current loads and daylight hours. The whole thing will want revisiting (or probably just turning off) as we get into summer, when we’ll be able to rely on significantly more sunlight to keep the battery full than we get now. I may well just go back to a single 03:00-for-four-hours grid charge then, once the days are nice and long. See how we go…

,

Tim RileyJoining Buildkite, and sticking with Ruby

Last week I finished up at Culture Amp, and I’m excited to announce that I’ll be joining Buildkite as an engineer!

My time at Culture Amp was special. It was my first role after a decade of running Icelab with Max and Michael. Culture Amp hired everyone at Icelab after we decided to close the business, providing both a smooth transition and new opportunities to a singular group. I built a great working relationship with my manager, I was trusted to do big things, and I relished the chance to work with and learn from a large group of engineers. I’m deeply thankful for all of this.

Towards the end, I was serving as Culture Amp’s Director of Back End Engineering, and moving into engineering management. However, as any astute reader of this blog might attest, I am deeply motivated by hands on programming work, and all the learning and collaboration opportunities that go with it. I realised it was not the time to draw that chapter to a close (it might never!), and through that consideration I connected with Buildkite.

I’m excited to join Buildkite for many reasons! It’s a great Australian company with heart and personality. It brims with people I’ve long dreamt of working with. Developer tooling is an area close to my heart. And they’re growing a (majestic) Ruby app at the core of their tech. I can’t wait to dig in.

For me, this is also an intentional decision to stick with Ruby. The work I’m doing in Ruby OSS right now might be one of the biggest “dents in the universe” I get to make. I want to see this effort through, to complete our vision for Hanami 2.0, then learn from how it’s adopted by our community.

I have some time off between jobs, which I’ll use to give our Hanami work a real boost: I’ll be committing nearly 6 weeks of full-time work to Hanami! Based on previous experience, this should see me get through what otherwise might have taken 6 months of part-time effort. I’m hoping this will get us significantly closer to 2.0. I’ll likely start another tweet thread of my efforts, so find me on Twitter if you’d like to follow along!

,

Ian BrownGCVE Backup and Disaster Recovery

Picking up where we left off last month, let’s dive into disaster recovery and how to use Site Recovery Manager and Google Backup & Protect to DR into and within the cloud with GCVE. But before we do, a quick advertisement: if you are in Brisbane, Australia, I suggest coming to the awesome Google Infrastructure Group (GIG), which focuses on GCVE, where on 04 July 2022 I will be presenting on Terraform in GCVE.

,

Francois MarierUsing Gandi DNS for Let's Encrypt certbot verification

I had some problems getting the Gandi certbot plugin to work in Debian bullseye since the documentation appears to be outdated.

When running certbot renew --dry-run, I saw the following error message:

Plugin legacy name certbot-plugin-gandi:dns may be removed in a future version. Please use dns instead.

Thanks to an issue in another DNS plugin, I was able to easily update my configuration to the new naming convention.

Setup

Get an API key from Gandi and then put it in /etc/letsencrypt/gandi.ini:

# live dns v5 api key
dns_api_key=ABCDEF

then make it readable only by root:

chown root:root /etc/letsencrypt/gandi.ini
chmod 600 /etc/letsencrypt/gandi.ini

Then install the required package:

apt install python3-certbot-dns-gandi

Getting an initial certificate

To get an initial certificate using the Gandi plugin, simply use the following command:

certbot certonly -a dns --dns-credentials /etc/letsencrypt/gandi.ini -d example.fmarier.org

Setting up automatic renewal

If you have automatic renewals enabled, you'll want to ensure your /etc/letsencrypt/renewal/example.fmarier.org.conf file looks like this:

# renew_before_expiry = 30 days
version = 1.12.0
archive_dir = /etc/letsencrypt/archive/example.fmarier.org
cert = /etc/letsencrypt/live/example.fmarier.org/cert.pem
privkey = /etc/letsencrypt/live/example.fmarier.org/privkey.pem
chain = /etc/letsencrypt/live/example.fmarier.org/chain.pem
fullchain = /etc/letsencrypt/live/example.fmarier.org/fullchain.pem

[renewalparams]
account = abcdef
authenticator = dns
server = https://acme-v02.api.letsencrypt.org/directory
dns_credentials = /etc/letsencrypt/gandi.ini

,

Tim RileyOpen source status update, April 2022

April was a pretty decent month for my OSS work! Got some things wrapped up, kept a few things moving, and opened up a promising thing for investigation. What are these things, you say? Let’s take a look!

Finished centralisation of Hanami action and view integrations

I wrote about the need to centralise these integrations last month, and in April, I finally got the work done!

This was a relief to get out. As a task, while necessary, it felt like drudge work – I’d been on it since early March, after all! I was also conscious that it was blocking Luca’s work on helpers all the while.

My prolonged work on this (along with other things like Easter holidays and other such Real Life matters) contributed to us missing April’s Hanami release. The good thing is that it’s done now, and I’m hopeful we can have this released via another Hanami alpha sometime very soon.

In terms of the impact on Hanami apps, the biggest change is that your apps should use a new superclass for actions and views:

require "hanami/application/action"

module Main
  module Action
    # Used to inherit from Hanami::Action
    class Base < Hanami::Application::Action
    end
  end
end

Aside from the benefit to us as maintainers of having this integration code kept together, this distinct superclass should also help make it clearer where to look when learning about how actions and views work within full Hanami apps.

Enabled proper access to full locals in view templates

I wound up doing a little more work in actions and views this month. The first was a quickie to unblock some more of Luca’s helpers work: making access to the locals hash within templates work like we always expected it would.

This turned out to be a fun one. For a bit of background, the context for every template rendering in hanami-view (i.e. what self is for any given template) is an Hanami::View::Scope instance. This instance contains the template’s locals, makes the full locals hash available as #locals (and #_locals, for various reasons), and uses #method_missing to also make each local directly available via its own name.

Luca found, however, that calling locals within the template didn’t work at all! After I took a look, it seemed that while locals didn’t work, self.locals or just plain _locals would work. Strange!

Turns out, this all came down to implementation details in Tilt, which we use as our low-level template renderer. The way Tilt works is that it will compile a template down into a single Ruby method that receives a locals param:

def compile_template_method(local_keys, scope_class=nil)
  source, offset = precompiled(local_keys)
  local_code = local_extraction(local_keys)

  # <...snip...>

  method_source << <<-RUBY
    TOPOBJECT.class_eval do
      def #{method_name}(locals)
        #{local_code}
  RUBY

Because of this, locals is actually a local variable in the context of that method execution, which will override any other methods also available on the scope object that Tilt turns into self for the rendering.

Here is how we were originally rendering with Tilt:

tilt(path).render(scope, &block)

My first instinct was simply to pass our locals hash as the (optional) second argument to Tilt’s #render:

tilt(path).render(scope, scope._locals)

But even that didn’t work! Because in generating that local_code above, Tilt will actually take the locals and explode it out into individual variable assignments:

def local_extraction(local_keys)
  local_keys.map do |k|
    if k.to_s =~ /\A[a-z_][a-zA-Z_0-9]*\z/
      "#{k} = locals[#{k.inspect}]"
    else
      raise "invalid locals key: #{k.inspect} (keys must be variable names)"
    end
  end.join("\n")
end

But we don’t need this at all, since hanami-view’s scope object is already making those locals available individually, and we want to ensure access to those locals continues to run through the scope object.

So the ultimate fix is to make locals of our locals. Yo dawg:

tilt(path).render(scope, {locals: scope._locals}, &block)

This gives us our desired access to the locals hash in templates (because that locals key is itself turned into a solitary local variable), while preserving the rest of our existing scope-based functionality.

It also shows me that I probably should’ve written an integration test back when I introduced access to a scope’s locals back in January 2019. 😬

Either way, I’m excited this came up and I could fix it, because it’s an encouraging sign of just how much of this view system we’ll be able to put to use in creating a streamlined and powerful view layer for our future Hanami users!

Merged a fix to stop unwanted view rendering of halted requests

Thanks to our extensive use of Hanami at Culture Amp, my friend and colleague Andrew discovered and fixed a bug with our automatic rendering of views within actions, which I was happy to merge in.

Shipped some long awaited dry-configurable features

After keeping poor ojab waiting way too long, I also merged a couple of nice enhancements he made to dry-configurable:

I then released these as dry-configurable 0.15.0.

Started work on unifying Hanami slices and application

Last but definitely not least, I started work on one of the last big efforts we need in place before 2.0: making Hanami slices act as much as possible like complete, miniature Hanami applications. I’m going to talk about this a lot more in future posts, but for now, I can point you to a few PRs:

  • Introducing Hanami::SliceName (a preliminary, minor refactoring to fix some slice and application name determination responsibilities that had somehow found their way into our configuration class).
  • A first, abandoned attempt at combining slices and applications, using a mixin for shared behaviour.
  • A much more promising attempt using a composed slice object within the application class, which is currently the base of my further work in this area.

Apart from opening up some really interesting possibilities around making slices fully a portable, mountable abstraction (imagine bringing in slices from gems!), even for our shorter-term needs, this work looks valuable, since I think it should provide a pathway for having application-wide settings kept on the application class, while still allowing per-slice customisation of those settings in whichever slices require them.

The overall slice structure is also something that’s barely changed since I put it in place way back in late 2019. Now it’s going to get the spit and polish it deserves. Hopefully I’ll be able to share more progress on this next month :) See you then!

,

Ian BrownGCVE Advanced Auto-Scaling

Let’s pick up where we left off from last month’s article and start setting up some of the features of GCVE, starting with Advanced Autoscaling.

What is Advanced Auto-Scaling?

Advanced Autoscaling automatically expands or shrinks a private cloud based on CPU, memory and storage utilisation metrics. GCVE monitors the cluster based on the metrics defined in the autoscale policy and decides to add or remove nodes automatically. Remember: GCVE is physical Dell Poweredge servers, not a container/VM running in Docker or on a hypervisor like VMware.

,

Tim RileyTwo years of open source status updates

Back in March of 2020, I decided to take up the habit of writing monthly status updates for my open source software development. 22 updates and 25k words later, I’m happy to be celebrating two years of status updates!

As each month ticks around, I can find it hard to break away from cutting code and write my updates, but every time I get to publishing, I’m really happy to have captured my progress and thinking.

After all, these posts now help remind me I managed to do all of the following over the last two years (and these were just the highlights!):

  • Renamed dry-view to hanami-view and kicked off view/application integration (Mar 2020)
  • Received my first GitHub sponsor (Apr 2020), thank you Benny Klotz (who still sponsors me today!)
  • Shared my Hanami 2 application template (May 2020)
  • Achieved seamless view/action/application integration (May 2020)
  • Brought class-level configuration to Hanami::Action (Jun 2020)
  • Introduced application-level configuration for actions and views (Jul 2020)
  • Added automatic inference for an action’s paired view, along with automatic rendering (Jul 2020)
  • Introduced application integration for view context classes (Jul 2020)
  • Supported multiple boot file dirs in dry-system, allowing user-replacement of standard bootable components in Hanami (Aug 2020)
  • Rebuilt the Hanami Flash class (Aug 2020)
  • Resumed restoring hanami-controller features through automatic enabling of CSRF protection (Sep 2020)
  • Added automatic configuration to views (inflector, template, part namespace) (Oct 2020)
  • Released a non-web Hanami application template (Oct 2020)
  • Started the long road to Hanami/Zeitwerk integration with an autoloading loader for dry-system (Nov 2020)
  • Introduced dedicated “component dir” abstraction to dry-system, along with major cleanups and consistency wins (Dec 2020/Jan 2021)
  • Added support for dry-system component dirs with mixed namespaces (Feb/Mar/Apr 2021)
  • Released dry-system with all these changes, along with Hanami with working Zeitwerk integration (Mar/Apr 2021)
  • Ported Hanami’s app configuration to dry-configurable (May 2021)
  • Laid the way for dry-configurable 1.0 with some API changes (May/Jul 2021)
  • Returned to dry-system and added configurable constant namespaces (Jun/Jul/Aug/Sep/Oct 2021)
  • Introduced compact slice source dirs to Hanami, using dry-system’s constant namespaces (Sep/Oct 2021)
  • Added fully configurable source dirs to Hanami (Nov/Dec 2021)
  • Shipped a huge amount of dry-system improvements over two weeks of dedicated OSS time in Jan 2022, including the overhaul of bootable components as part of their rename to providers, as well as partial container imports and exports, plus much more
  • Introduced concrete slice classes and other slice registration improvements to Hanami (Feb 2022)
  • Refactored and relocated action and view integration into the hanami gem itself, and introduced Hanami::SliceConfigurable to make it possible for similar components to integrate (Mar 2022)

This is a lot! To add some extra colour here, a big difference between now and pre-2020 is that I’ve been working on OSS exclusively in my personal time (nights and weekends), and I’ve also been slugging away at a single large goal (Hanami 2.0, if you hadn’t heard!), and the combination of this can make the whole thing feel a little thankless. These monthly updates are timely punctuation and a valuable reminder that I am moving forward.

They also capture a lot of in-the-moment thinking that’d otherwise be lost to the sands of time. What I’ve grown to realise with my OSS work is that it’s as much about the process as anything else. For community-driven projects like dry-rb and Hanami, the work will be done when it’s done, and there’s not particularly much we can do to hurry it. However, what we should never forget is to make that work-in-progress readily accessible to our community, to bring people along for the ride, and to share whatever lessons we discover along the way. The passing of each month is a wonderful opportunity for me to do this 😀

Finally, a huge thank you from me to anyone who reads these updates. Hearing from folks and knowing there are people out there following along is a huge encouragement to me.

So, let’s keep this going. I’m looking forward to another year of updates, and (checks calendar) writing April’s post in the next week or so!

,

Tim RileySalubrious Ruby: Don’t mutate what you don’t own

When we’re writing a method in Ruby and receiving objects as arguments, a helpful principle to follow is “don’t mutate what you don’t own.”

Why is this? Those arguments come from places that we as the method authors can’t know, and a well-behaved method shouldn’t alter the external environment unexpectedly.

Consider the following method, which takes an array of numbers and appends a new, incremented number:

def append_number(arr)
  last_number = arr.last || 0
  arr << last_number + 1
end

If we pass in an array, we’ll get a new number appended in the returned array:

my_arr = [1, 2]
my_new_arr = append_number(my_arr) # => [1, 2, 3]

But we’ll also quickly discover that this has been achieved by mutating our original array:

my_arr = [1, 2]
my_new_arr = append_number(my_arr) # => [1, 2, 3]
my_arr # => [1, 2, 3]

We can confirm with an object identity check that this is still the one same array:

my_new_arr.equal?(my_arr) # => true

This behavior is courtesy of Ruby’s Array#<< method (aka #append or #push), which appends the given object to the receiver (that is, self), before then returning that same self. This kind of self-mutating behaviour is common across both the Array and Hash classes, and while it can provide some conveniences in local use (such as a chain of #<< calls to append multiple items to the same array), it can lead to surprising results when that array or hash comes from anywhere non-local.
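
As a quick illustration of that self-returning behaviour (just a throwaway snippet):

arr = [1]
arr << 2 << 3 # each << appends to arr and returns the same array, so the calls chain
arr           # => [1, 2, 3]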

Imagine our example above is part of a much larger application. In this case, my_arr will most likely come from somewhere far removed and well outside the purview of append_number or whatever class contains it. As the authors of append_number, we have no idea how that original array might otherwise be used! For this reason, the courteous approach to take is not to mutate the array, but instead create and return a new copy:

def append_number(arr)
  last_number = arr.last || 0

  # There are many ways we can achieve the copy; here's just one
  arr + [last_number + 1]
end

This way, the caller of our method can trust their original values to go unchanged, which is what they would likely expect, especially if our method doesn’t give any hint that it will mutate.

my_arr = [1, 2]
my_new_arr = append_number(my_arr) # => [1, 2, 3]
my_arr # => [1, 2]

This is a very simple example, but the same principle applies to all kinds of mutable objects passed to your methods.

A more telling story here comes from earlier days of Ruby, around how we handled options hashes passed to methods. We used to do things like this:

def my_method(options = {})
  some_opt = options.delete(:some_opt)
  # Do something with some_opt...
end

my_method(some_opt: "some value")

Using trailing options hashes like this was how we provided “keyword arguments” before Ruby had them as a language feature. Now the trouble with the method above is that we’re mutating that options hash by deleting the :some_opt key. So if the user of our method had code like this:

common_options = {some_opt: "some value"}

first_result = my_method(common_options)
second_result = my_method(common_options)

We’d find ourselves in trouble by the time we call my_method the second time, because at that point the common_options hash will no longer have some_opt:, since the first invocation of my_method deleted it — oops!

This is a great illustration of why modern Ruby’s keyword arguments work the way they do. When we accept a splatted keyword hash argument like **options, Ruby ensures it comes into the method as a new hash, which means that operations like options.delete(:some_opt) do in fact become local in scope, and therefore safe to use.
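
As a rough sketch of that difference (the method and option names here are just for illustration):

def my_method(**options)
  some_opt = options.delete(:some_opt) # only touches this call's own hash
  # Do something with some_opt...
end

common_options = {some_opt: "some value"}

my_method(**common_options) # works
my_method(**common_options) # still works; the first call didn't touch common_options
common_options              # => {:some_opt=>"some value"}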

So now that we’ve covered arrays and hashes as Ruby’s most common “container” objects, what about the other kinds of application-specific structures that we might encounter in real world codebases? Objects representing domain models of various kinds, an instance of an ActiveRecord::Base subclass, even? Even in those cases, this principle still holds true. Our code is easier to understand and test when we can reduce the number of dimensions to its behaviour, and mutating passed-in objects is a big factor in this, especially if you think about methods calling other methods and so on. There are ways we can design our applications to make this a natural approach to take, even for rich domain objects, but that is a topic for another day!

Until then, hopefully this walkthrough here can serve as a reminder to keep our methods courteous, to respect mutable values provided from outside, and wherever possible, leave them undisturbed and unmutated. Salubrious!

Bonus content! In preparing this post, I thought about whether it might be helpful to note that Ruby is a “pass by reference” language, since that’s the key underlying behavior that can result in these accidental mutations. However, my intuition here was actually back to front! Thanks to this wonderful stackoverflow answer, I was reminded that Ruby is in fact a “pass by value” language, but that all values are references.

,

Tim SerongGo With The Flow

We installed 5.94kW of solar PV in late 2017, with an ABB PVI-6000TL-OUTD inverter, and also a nice energy efficient Sanden heat pump hot water service to replace our crusty old conventional electric hot water system. In the four years since then we’ve generated something like 24MWh of electricity, but were actually only able to directly use maybe 45% of that – the rest was exported to the grid.

The plan had always been to get batteries once we are able to afford to do so, and this actually happened in August 2021, when we finally got a single 10kWh Redflow ZCell zinc bromine flow battery installed. We went with Redflow for several reasons:

  • Unlike every other type of battery, they’re not a horrible fire hazard (in fact, the electrolyte, while corrosive, is actually fire retardant – a good thing when you live in a bushfire prone area).
  • They’re modular, so you can just keep adding more of them.
  • 100% depth of discharge (i.e. they’re happy to keep being cycled, and can also be left discharged/idle for extended periods).
  • All the battery components are able to be recycled at end of life.
  • They’re Australian designed and developed, with manufacturing in Thailand.

Our primary reasons for wanting battery storage were to ensure we’re using renewable energy rather than fossil fuels, to try to actually use all our local power generation locally, and to attain some degree of disaster resilience.

Being in Tasmania, most of our grid power is from renewable sources anyway (hydro), so the renewable energy argument may seem a little weak at first, unless you cast your mind back to the time some idiot decided to deplete our dams by selling a whole lot of hydro-generated power to Victoria in an El Niño year, then the Basslink cable broke and Tasmania had to fire up a bunch of diesel generators to get through winter. Good times.

On the local generation and use front, as I mentioned at the start of this post, we’ve previously exported more than half the energy we generated, but the feed-in tariff we get from Aurora (the power company) is only $0.06501 per kWh. Compare that to the rate we pay for grid energy ($0.29852/kWh peak or $0.139/kWh off-peak). Say we exported 13.2MWh in the last four years, we would have received about $858 worth of credit… But then when we drew power back from the grid at night, or on cloudy days, we would have paid somewhere between $1834-$3940 for that same amount of power. Treating the grid as a proxy for battery storage does not make any sort of financial sense.

As for disaster resilience, we don’t often have grid outages, but we do have them, and that can be a problem. Notably, all our potable water comes from rainwater tanks attached to the house and shed, and the pumps that push that water to the taps are electric, so if the grid is down we don’t have running water. Sure, I can get water out of the tanks with a bucket or a jug, and that’s fine for a little drinking or handwashing, but it’s not good long term. Then there’s our fridges and freezer – at any given time we’re likely to have a lot of stored frozen meat from animals we farm. We don’t want to lose that in a potential extended grid outage, as could happen in bushfire season or during a severe weather event. Also, it’s nice to still have power for our local network and NBN kit, so we can check the TasNetworks Power Outages page to find out WTF is going on.

Complete grid independence would be nice, and with our current power utilisation and a single Redflow battery we could almost do it in the height of summer, or even on some good days in spring or autumn, where we can generate all we need to run the house during the day and charge the battery to 100%, then draw it down overnight. The kink is that Redflow batteries need to undergo a maintenance cycle at least every three days where they are completely discharged for a few hours. If you’re grid connected you don’t really notice this, because the maintenance cycle commences at sunset and once the battery is drained you’re just using grid power again until the sun comes up, but it does mean we can’t be grid independent even if we theoretically have enough PV generation to do so, until we get a second battery (with more than one ZCell, the maintenance cycles are interleaved so at least one battery will always have some power in it).

The other problem with grid independence is that as much as Tasmania is excellent for solar PV generation in summer, it sucks in winter. Looking at our generation and usage figures for 2019, from mid-May to mid-August we were only able to generate 17% of the power we used, and I’ve seen days where we only generated 1-2kWh in the entire day. Compare that with summer, when we’ve peaked above 40kWh on some days in December.

Still, if the grid went away for a long time in the warmer half of the year with our current setup, it’d be irritating every few nights, but I reckon we’d manage OK. Of course, there would be some adjustments required to minimise our utilisation: I’d set the blockout timer on the Sanden so the hot water only heated during daylight hours, I’d turn most of our computer equipment off overnight, we’d try to avoid using the microwave at the same time as any other chunky electrical appliance so as not to pull more than 3kW continuously from the ZCell, there’s some lights we usually leave on that we’d just turn off, and so forth. In the colder half of the year, well, I guess we’d try to eat all the frozen food quickly then limp along as best as possible. We would still have some power, some of the time.

When we originally had the PV installed, it was AC coupled, i.e. the solar panels were connected to the inverter, and the inverter was connected to the grid, along with our loads. Our choice of inverter (ABB) was partly because it was Selectronic certified, and at the time we knew Redflow to be working with Selectronic battery inverters. Once we finally got to contacting Murray Roberts from Lifestyle Electrical Services in order to get a quote and talk about installation, circumstances had changed. It turns out that the Selectronic kit just doesn’t like flow batteries with their need to be completely discharged periodically. Victron Energy gear on the other hand works really well with – and is fully supported for use with – Redflow’s ZCell batteries. Lesson learned: with changing technology it doesn’t always pay to plan too far in advance.

Murray initially proposed a setup which would have hooked straightforwardly into our AC coupled solar, i.e. the ABB inverter and PV cells would remain connected as is, then there’d be a Victron energy meter to measure what was coming from the PV and from the grid, a Victron MultiPlus II inverter/charger to charge the battery and pull from the battery to run some loads, plus a Victron Cerbo GX and GX Touch to provide monitoring and control. Some of our power circuits to the house would be hooked up as Essential House Loads (i.e. to be supported by the battery during a grid outage), and some would be Non-Essential House Loads, i.e. powered by the grid and/or PV, but without battery backup. The Victron Cerbo GX and the Redflow ZCell Battery Management System (BMS) are internet connected to hook up with Victron’s VRM portal, which provides handy monitoring tools and graphs, and to allow remote support and assistance and firmware updates. The initial proposal looked something like this:

AC Coupled Solar schematic. Some detail missing (e.g. isolators) but you get the idea.

That configuration would have involved the least messing around, and met our goals of:

  • Utilising as much of your own energy as possible locally.
  • Dealing with occasional/unexpected grid outages (modulo the ZCell maintenance cycle).

But, it still fundamentally relied on the grid. With AC coupled solar, if the grid goes down your inverter automatically goes into anti-islanding mode, and won’t give you any power from your PV array even if the grid is down during the day and the sun is shining on your panels. Your first thought here may be “wait, if the grid is down but I still have sunlight, surely I should still have power”, and that’s an understandable reaction, but anti-islanding is actually a safety feature. If the juice goes from the PV cells to the inverter to the grid and your loads, and you’re exporting power while a power company employee is doing maintenance works on the grid side, you could electrocute them. This would not be a good thing.

A perhaps less obvious problem with this setup is that you can’t black start the site during an extended grid outage. If the grid is down and the inverter is thus in anti-islanding mode, you have no way to get power to restart the system and recharge the battery from empty (unless you’ve also got a generator). “But the grid never goes down for that long…” you might say, until you look at the outages really severe bushfires can cause (think: East Gippsland in the 2019-2020 Black Summer bushfires).

After some further discussion, Murray proposed getting rid of the ABB inverter, and doing DC coupled solar instead, with a Victron MPPT RS hooked up to the PV, two MultiPlus IIs, so that we could handle up to 10kW of power (that’s the maximum limit of all loads and grid export), with all house loads hooked up as Essential, so they can be supplied by whatever combination of grid, solar and battery power is available at any given time. Voilà:

DC Coupled Solar schematic. Again, some detail missing, but mostly accurate.

One thing that’s missing in the above diagram is a manual changeover switch we had installed later, so that if there’s ever a fault in the Victron cluster that requires major works, but the grid is still up, we can manually switch all our loads back over to the grid side for the duration. Not that I expect to need that functionality, but better to have it than not just in case.

The 10kW maximum has proven to be fine, by the way – we just don’t ever put anything like that much power through the system at once. During the day we might pull 400-700W continuously with spikes from 1-3kW or occasionally 4-5kW when multiple heavier loads come on. I think the highest we’ve managed was 7kW very briefly one time with a panel heater and the microwave and the hot water and the clothes dryer and gods only know what else was on at the time; the point is it’s not easy to get the load that high. Note that we currently have a gas stove and oven – if we switched that to electric we might want to be a bit cautious about running lots of other heavy loads while cooking, but I suspect we’d still be fine in general.

Rhys, one of Murray’s crew, did a fantastic job of installing all the kit over several days, then Murray came out to do the final commissioning and bring the system online on August 31, 2021. Here’s a couple of pictures:

RedFlow ZCell battery, 2x Victron MultiPlus IIs, Victron MPPT RS
MPPT again, with ethernet switch, sub board, Victron Cerbo GX and ZCell BMS off to the right.

The transparent box in the above photo contains the Cerbo GX and the ZCell BMS, along with a little 12V backup power supply so that those things keep running if the grid fails and the ZCell is at 0% State of Charge (SoC). The ethernet switch above the sub board hooks up to the network point I installed previously when I was using a Raspberry Pi to monitor the ABB inverter’s generation. It’s Power-over-Ethernet, run from a UPS in my office which also keeps the NBN box and router alive, so the whole system still has internet access for about an hour even if all other power sources are dead (handy if there were some sort of fault and we needed remote assistance).

Here’s what the main switchboard looks like now. The leftmost switch (“Main switch (grid supply)”) turns grid power on/off to the sub board under the house, which goes from there to the Multis’ AC input. The AC out comes back up here to the right-hand switch (“Victron Sub Board Backup Supply”) and thence to all the loads (the various switches in the middle).

Do not fuck with the electricity.

Inside the house there’s a neat little touchscreen console (the Victron GX Touch), which connects via an HDMI cable in the wall to the Cerbo under the house. This shows you what everything is doing at any given time, provides notifications of alarms (e.g.: “Grid Lost”) and has a series of menus for configuring the system. The exact same console is also accessible via a web browser or mobile phone, either over the local network, or remotely via the VRM portal.

Three days after everything was up and running, we went out to run some errands and came back home in the afternoon to discover the Cable PI device in our kitchen beeping like mad. All the power was still on in the house. I called Murray in a bit of a panic thinking something was broken, but it turned out we were experiencing an actual grid outage over a large area – all the way from Grove to Leslie Vale, 802 properties were without power due to wires down in strong wind, and the PV was powering our house. We didn’t feel a thing. The UPS in my office noticed a small dip for a second or two when the grid failed, but the microwave clock was still on, and other computer gear not hooked up to a UPS remained up during the cutover. And the battery kept on charging – it was at 65% SoC when the grid went out and up to 83% by the time the grid came back.

This was just awesome.

A few days after that, suddenly, at 19:39 on September 6, we were without power. The grid was up, but the Victron kit had shut down. This was not awesome. It happened during the ZCell’s regular maintenance cycle, and at the time we got a warning in the BMS logs, and a battery high voltage warning from the MPPT. It was unclear then exactly what the problem was though – our MPPT RS was a newer model so maybe it was different somehow from what had been previously tested? Also, we discovered the DVCC setting still needed to be turned on, so maybe that was the issue. Anyway, I power cycled the Victron kit and everything was fine again until a couple of weeks later on September 19, when the system shut down again during a maintenance cycle, and again coinciding with battery high voltage warnings.

Because of the previous shutdown, Murray had been in contact with Simon Hackett from Redflow, and Simon subsequently enabled the “DC-coupled PV – feed in excess” setting. The assumption here was that the extra power being delivered from the ZCell during the maintenance discharge wasn’t absorbed by our house loads, i.e. it was trying to discharge at 1kW, but our loads were utilising less than that, and there was nowhere for the excess to go, hence the shutdown. Enabling “DC-coupled PV – feed in excess” allows power from the DC bus to be sent to the grid if necessary, which turns out to be the case at our site. A second ZCell would mitigate this because the one under maintenance would simply be able to dump to the other (assuming it wasn’t already full), but we only have one battery so far.

At this point, after those two final settings changes (and a firmware update) the system was operating exactly as it should. Everything was configured correctly, and we could survive grid outages if there was charge in the battery and/or the sun was up. Our electricity meter was replaced on September 21 so we could switch to Tariff 93 Peak & Off-Peak billing. There were no further unexpected shutdowns. Everything was totally fine, except we were still seeing these weird battery high voltage warnings during the ZCell maintenance cycle, reported by both the BMS and the MPPT.

MPPT Alarm #2 Battery high voltage

This is something neither Murray nor Simon had seen before. What followed was several weeks of troubleshooting and analysis, which I found absolutely fascinating.

I was keeping detailed notes of what happened during each maintenance cycle, and what I saw in the BMS logs, and on the Cerbo console. Several maintenance cycles later I discovered a correlation between the battery high voltage warnings, and sudden large changes in AC loads, notably when a 2400W panel heater in our bedroom turned itself on and off overnight. Also, the high voltage warnings seemed to be more likely to occur if we went into maintenance with a high state of charge, versus a low state of charge.

Overnight load spikes

Simon, who knows and understands how the ZCell behaves during maintenance, explained about the Energy Extraction Device (EED), the part of the unit whose purpose is to deliberately drain the battery down to zero for maintenance in a timely fashion. He wondered if there was some issue where very high power demands at short notice, while the EED was active, were causing the DC bus voltage to fluctuate and in turn causing the MPPT to respond in an unusual manner. We experimented with changing settings on the BMS to activate the EED later than usual in the maintenance cycle (“Start maintenance when SoC below x%”), and also experimented with limiting inverter output on the Multis and tweaking the maximum charge voltage.

After a few more cycles of observation, the suspicion became that the MPPT itself wasn’t at fault, rather it was just being the messenger. Maybe these odd voltage spikes had always happened at other sites too, but the new MPPT RS units were doing a better job of noticing them?

Later in October, Simon noticed that we were seeing spikes very close to when the battery was nearly completely empty. At that point in time, the EED was telling the Multis that it had a 10 amp output capacity, but if the Multis tried to draw on that to handle a sudden rise in load (as from our panel heater for example), the battery voltage would collapse, and the system would oscillate a bit between those two states. That behaviour was fixed with a small firmware change which I believe later landed in the BMS 1.1.11 release. Unfortunately, that change by itself didn’t make the high voltage warnings go away.

A few days after that, we suddenly had a slew of #201 – Internal DC voltage errors from the MPPT, so we were back to being concerned that maybe there was something wrong with that piece of equipment, especially given those errors began to crop up more often as time passed.

MPPT Alarm #201 – Internal DC voltage error.

Victron’s documentation ominously stated that this error meant “a measurement circuit inside the unit is broken” and the unit “is really broken, not safe for use, and if it hadn’t stopped working already then it would have stopped working soon”. Clearly a replacement or repair was in order, but I’ll get to that later.

On November 4, looking at the logs I’d collected from November 2, Simon noticed that one of the high battery voltage warnings happened at quite a high state of charge (72%), which meant it wasn’t really about the battery running out of energy and having the voltage collapse. It looked like it was the EED being over-drawn, regardless of how much energy was still in the battery. It turns out there’s a thing that the ZCell does to handle surge demand when the EED is on, called an “EED switchback”. ZCells internally have three contactors, for Charge, Discharge and EED (also known as Strip). In normal operation, the C and D contactors are on, and E is off, so the battery can be charged or discharged at will, and the EED is doing nothing. During maintenance, the EED comes on, but it can’t deliver more than 20 amps. If the site pulls more than the EED can supply, the battery goes back to normal operation (C and D on, E off) while the high demand is present. Once the high demand goes away, it switches the EED back on, to keep discharging at the normal rate of 1kW. But, by default, that switchback process only happens five times per battery maintenance cycle so as to avoid the potential for excessive cycling of the contactors in weird edge cases.

Looking at our overnight load with the panel heater on, there are loads of spikes from below 1kW to above 1kW, so we were getting through that switchback limit very quickly. After that, with the high load from the panel heater, it was entirely possible that the Multi cluster would try to pull more power than the EED could provide, and the EED would shut down in response, resulting in weirdness on the DC bus. Simon’s suspicion for why this hadn’t been seen before was that it needed at least four causal factors to be present at once:

  1. Site demand is spiky for the entire night.
  2. The spikes start below 1kW and end above 1kW to use up the switchback quota.
  3. The site has only one battery (if a second battery were present it would handle the surge load while the first was in maintenance).
  4. The two Multis are capable of running far more load than the battery can service.

If we were to change or remove any one of those factors, we wouldn’t see the problem. So, as a test, Simon changed the switchback limit from 5 to 50, and I watched what happened during the next maintenance cycle. The status page of the BMS web interface shows, among other things, the current state of the contactors. Here’s an example with C and D on, and E off:

The ZBM logs also show the contactor state over time. Here’s a snippet I’ve colourised to make the state changes obvious:

If we take the above section of ZBM logs from 2021-11-08 23:15 to 23:29, it looks like we had “_ D E” up until 23:17, then switched to “C D _” from 23:18 to 23:26, then finally to “_ _ E” at 23:27. Based on this I’d imagine we had one switchback event that lasted eight minutes. But I had earlier noticed on the status page that the contactors seemed to be toggling more rapidly, so I wrote a little script to scrape the BMS REST API once per second and dump that to a file, which shows Charge and EED toggling on/off about ten times in that window:

2021-11-08T23:14:28+11:00    "_ D E"
2021-11-08T23:17:59+11:00    "C D _"
2021-11-08T23:18:29+11:00    "C D E"
2021-11-08T23:18:34+11:00    "_ D E"
2021-11-08T23:18:56+11:00    "C D _"
2021-11-08T23:19:14+11:00    "C D E"
2021-11-08T23:19:20+11:00    "_ D E"
2021-11-08T23:19:54+11:00    "C D _"
2021-11-08T23:20:17+11:00    "C D E"
2021-11-08T23:20:18+11:00    "_ D E"
2021-11-08T23:20:49+11:00    "C D _"
2021-11-08T23:21:19+11:00    "C D E"
2021-11-08T23:21:22+11:00    "_ D E"
2021-11-08T23:21:46+11:00    "C D _"
2021-11-08T23:21:51+11:00    "_ D _"
2021-11-08T23:21:54+11:00    "C D _"
2021-11-08T23:22:23+11:00    "C D E"
2021-11-08T23:22:26+11:00    "_ D E"
2021-11-08T23:22:43+11:00    "C D E"
2021-11-08T23:22:45+11:00    "C D _"
2021-11-08T23:23:26+11:00    "C D E"
2021-11-08T23:23:30+11:00    "_ D E"
2021-11-08T23:23:43+11:00    "C D _"
2021-11-08T23:24:30+11:00    "C D E"
2021-11-08T23:24:34+11:00    "_ D E"
2021-11-08T23:24:43+11:00    "C D _"
2021-11-08T23:25:35+11:00    "C D E"
2021-11-08T23:25:39+11:00    "_ D E"
2021-11-08T23:25:49+11:00    "C D _"
2021-11-08T23:26:40+11:00    "C D E"
2021-11-08T23:26:44+11:00    "_ D E"
2021-11-08T23:26:49+11:00    "_ _ E"

On the assumption we were hitting way more switchbacks than expected, Simon just went and set the maximum switchback count to 999. A few days later, taking the log from my script, I saw something like 150-160 switchbacks, but given we’d set the limit way high, that fixed all the high voltage warnings, except for one, right at the very end of the maintenance cycle when the discharge limit from the ZCell drops from 10 amps to 0 amps.

Simon discussed this final spike with the folks who built the EED, and found that when current draw from the EED stops, there is a voltage spike, of very low energy, for 10ms, and it can rise as high as 64V during that period before dropping back to the expected 57V. It’s normal for the EED to do this, and as there’s no real energy in it, it won’t be damaging anything. The thing about our site seems to be the new MPPT RS, with a new voltage sensing circuit that’s actually capable of noticing the spike, whereas the other gear (the MultiPlus IIs) misses it because it’s so short. The advice from the electrical engineer was to try adding more capacitors on the DC bus to absorb the spike. We already had two 47,000uF capacitors on there, so Murray went and ordered two more.

With the high battery voltage warnings out of the way, we were back to the #201 Internal DC voltage errors from the MPPT. On the assumption the unit was indeed faulty in that regard we requested a replacement, but Victron came back and said the problem could be fixed by replacing two resistors on the main board of the unit. I guess that makes sense – if you can fix a problem with $2 worth of resistors, that’s three orders of magnitude cheaper than replacing the whole unit.

By then we were getting into December, and what with pandemic-related shipping delays and the Christmas holiday period, it was later in January before we were able to get the additional capacitors installed on the DC bus, and replace the resistors in the MPPT’s voltage sensing circuit. The additional capacitors went just fine, the replacement resistors not so much. Once the MPPT was powered back up it claimed there was zero volts coming from the PV, even though the sun was shining, and the LCD display started flickering strangely. Something was definitely broken here, so we powered it back down, and Simon arranged for a replacement unit to be sent out, which took another few weeks, which is a damn shame in January/February, being prime solar PV generation time.

The delay did however allow me to spend some time messing around with scheduled charges to see if there was a cost benefit to grid-charging the battery during off-peak times, then drawing it back down during peak, because the reality is we’re going to want to do this in winter when there’s not much sun, so why not try it out in advance? TL;DR: Yes, it’s worth grid charging the battery off-peak, provided you use all that power during peak times, but it’s a bit irritating trying to figure out exactly what you’ll save. In one of my tests it was the difference between paying $3.85 for about 20kWh of usable electricity in a 24 hour period versus paying $4.70, so it’s not insignificant.

Rhys came out and installed the replacement MPPT on February 11, and was done by the middle of the day. Everything was running beautifully again, but when the unit came online there were ten instances of the dreaded #201 Internal DC Voltage Error, along with a #27 Charger Short Circuit. I used the VictronConnect app on my phone to see if I could get any more information directly from the MPPT. It told me there was a firmware update available from v1.05 to v1.08, so I went looking for information about that, and discovered that Victron’s error code documentation had been updated since I first saw it back in late October. In addition to the ominous warnings about broken measurement circuits it now also said:

“Make sure to update the firmware to at least v1.08, in previous firmwares the limits were too strict. And it could trigger falsely during MPPT start-up in the morning and MPPT shutdown in the evening.”

So I updated the firmware, and writing this now, two and a half months later, we’ve not seen a single #201 since. Could this have always been a firmware issue? Maybe, given the “accepted answer” on this Victron Community forum post says that firmware version v1.08 “solves the vast majority of MPPT RS, and Inverter RS, Error 201 issues”. Or maybe it was both – maybe we had a broken bit of kit and broken firmware too. Either way, it’s fixed now.

I continued to monitor regular maintenance cycles, and also deliberately forced maintenance a couple of times with a high state of charge to try to stress it as much as I could. During those periods I saw something like 3-10 switchbacks, so Simon set our switchback limit back down from 999 to 30. I understand a future Redflow firmware update will change this default for everyone to somewhere between 25-50, and I’m very happy that this unexpected testing at our site resulted in firmware improvements that will presumably benefit other ZCell users too.

By late April we’d been through 33 maintenance cycles since the extra capacitors went in, with 26 of those occurring since the new MPPT was installed. There had been only three occasions when the BMS briefly noticed alleged high battery voltages in that time. The MPPT was completely silent until April 20, when we got three battery high voltage warnings within an 8 minute interval right at the end of the maintenance cycle, when the battery was almost completely empty. But the weather had also started to get cold, and those errors coincided with a spike from our panel heater, which is consistent with our earlier observations about load spikes with the EED on being “difficult” and really just points to replacing the panel heater with a heat pump. Heat pumps are way more energy efficient, have a much smoother load, and can also be used for cooling in summer (they’re called “reverse cycle air conditioners” on the mainland).

That’s about the end of the story. The system is brilliant, and we could not be happier with the support we’ve received from Simon at Redflow, who’s been extremely generous with his time and knowledge, and Murray and Rhys of Lifestyle Electrical Services. Thanks for everything guys, I’ve learned a lot. In the eight months the system has been running we’ve generated 4631kWh of electricity and “only” sent 588kWh to the grid, which means we’ve used 87% of what we generated locally – much better than the pre-battery figure of 45%. I suspect we’ve reduced the amount of power we pull from the grid by about 30% too, but I’ll have to wait until we have a full year’s worth of data to be sure. We’ve also survived or shortened at least five grid outages with durations from a few minutes to a few hours.

The next thing to do is get a second ZCell, and possibly eventually think about a third. Given our current generation capability, two ZCells would allow us to store and utilise 100% of our generated power locally. We’d also have the ability to handle grid outages at any time, because with two batteries the maintenance cycles interleave and they can be configured to always ensure there’s a minimum amount of charge somewhere. A third would allow us to look at Standby Power System (SPS) mode where one battery is fully charged, then put into hibernation where it can remain for months. This sounds like a great way to have backup storage available for grid outages in the middle of winter when there’s no sunlight.

Appendix A – Settings Worth Messing With

Scheduled charging on the Cerbo GX console

In summer I scheduled charges of the battery to 15% between the hours of 04:00 and 08:00, and 30% between the hours of 15:00-17:00. Peak electricity hours during daylight savings are 08:00-11:00 and 17:00-22:00, and I found that a 15% charge overnight from the grid was enough to have a bit in the battery for the morning peak before the PV really got going. The afternoon charge was there mostly just in case we had a cloudy day – we’d usually get way more charge than that from the sun anyway. Now that we’re off daylight savings, peak hours change to 07:00-10:00 and 16:00-21:00, so I’ve set it to a 30% charge from 03:00-07:00 and a 50% charge from 14:00-16:00, which seems to be about right given our general peak utilisation and decreasing sunlight. I’m unlikely to set the afternoon charge higher than 50% because I don’t want to potentially go into a maintenance cycle with the battery very full, but I may re-evaluate that as we get deeper into winter.

It’s worth mentioning that during a scheduled charge, the power will come from wherever the Victron gear can find it, so if the sun is shining, you’ll be charging from the PV, not the grid. One thing to note is that during the scheduled charge period, the battery will not be used to support loads, even if it’s currently got a higher SoC than your limit. Some power will trickle away slowly though, I assume to run the pumps and supporting electronics of the battery itself.

My choice of timing for the overnight charge (four hours up to the start of the morning peak) is me wanting to have some power in the battery for as long as possible overnight in case of outages, without potentially interfering with maintenance cycles, which typically will have finished some time in the wee hours.

I also set up a 15% charge on weekend mornings. This doesn’t save us any money at all (actually it’ll be costing us a couple of tens of cents) because weekends are all off-peak power. The reason is again to have some opportunistic grid backup. Before I set this up we had an outage at 07:45 one Saturday morning with the battery empty, and had to wait until about 09:30 for the PV to bring the battery up to 10% again before everything came back online. Still, that then got us through the remainder of the grid outage which finished at about 10:15.

Battery Maintenance Settings on the ZCell BMS

On the Battery Maintenance screen of the ZCell BMS, I’ve got “Immediate maintenance for batteries with an EED” turned off, and “Start Maintenance When SoC Below 25%” enabled. This is to try to reduce the amount of time the EED runs, to limit switchbacks caused by our spiky load. In summer I also set “Daily SoC Limit Before Maintenance” to 50%, so the battery would not let itself be charged more than half way on those long hot days with late sunsets and early sunrises. This was to minimise maintenance cycle time, because I’d previously seen occasions where we went into maintenance with 100% SoC, and the cycle didn’t finish before the following morning when the sun came up. I also had a couple of times where I guess some timeout expired and the ZCell went into its final chemical maintenance state while it still had a few percent of charge. Not letting it get very full on maintenance days avoids these situations. Now that we’re getting towards winter I’ve removed that limit because the nights are longer and I expect our evening power utilisation to be higher, i.e. we should naturally use up whatever power we’re able to generate in plenty of time during winter maintenance cycles.

It’s also worth checking the latitude and longitude are set correctly on the Site Configuration page, because that’s how the BMS figures out when the sun sets and thus when to start maintenance by default.

Appendix B – VRM Portal

The VRM portal is a remote monitoring and management web interface which Victron provides gratis for users of their hardware. It provides a realtime view of the same live utilisation information you can get from the Cerbo console, plus handy graphs of solar, grid and battery consumption.

Consumption 2022-04-26. Red is grid, yellow is solar, blue is battery.

It also provides detailed graphs of just about anything you can think of from any of the system components. It’s extremely useful. Without this I never would have been able to correlate the battery high voltage warnings with load spikes and changes in the ZCell discharge current limits.

Viewing a bunch of interesting detail all at once

The data for the advanced graphs is stored for at least six months, and the solar yield and consumption data is stored for at least 5 years. The alarm logs don’t hang around that long – I suspect it may just be showing the last 1000 entries. Somewhat irritatingly, most of these are usually low battery alarms that we don’t care about (you see a lot of them during maintenance cycles).

Appendix C – Security / Connectivity / Internet Access

The ZCell BMS and Victron Cerbo GX both need to be connected to the internet for firmware updates, remote support, and to work with the VRM portal. They don’t need to be connected 100% of the time, but they do want the connection for those reasons. The system will operate just fine if the internet is down though, and you don’t have to use the VRM portal if you don’t want to. I’ve put everything on a separate network, so I can access the BMS and the Cerbo console from my desktop/laptop/phone, but the BMS and Cerbo can’t do the reverse. It’s not that I don’t trust Redflow or Victron, it’s just sensible to keep systems that allow any form of remote access isolated from the rest of your internal network.

The BMS and Cerbo both provide WiFi APs for initial configuration. I’ve since turned those off. I can use the wired connection to the BMS to turn the WiFi back on if I ever need it, and I can do the same for the Cerbo from its console.

The Cerbo and MPPT both speak Bluetooth, so you can use the VictronConnect app to talk to them from your phone, to view status and update firmware.

Appendix D – Hackability

The ZCell BMS has a REST API, which is documented in the online help available from its web interface. This is how I was able to write a few scripts to log the battery state of charge, the contactor state, and the voltage and warning indicator status.
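
A minimal sketch of that kind of polling loop looks something like the following (the URL and JSON field names are placeholders, not the real API; the actual details are in the BMS online help):

# Sketch: poll the ZCell BMS REST API once per second and append the state
# of charge and contactor state to a log file. The URL and field names are
# placeholders; check the BMS online help for the real ones.
require "net/http"
require "json"
require "time"

BMS_STATUS_URL = URI("http://zcell-bms.local/rest/1.0/status") # placeholder address

loop do
  begin
    status = JSON.parse(Net::HTTP.get(BMS_STATUS_URL))
    line = [Time.now.iso8601, status["soc"], status["contactors"]].join("\t")
    File.open("zcell-log.tsv", "a") { |f| f.puts(line) }
  rescue StandardError => e
    warn "poll failed: #{e.message}"
  end
  sleep 1
end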

A bunch of Victron stuff is open source, notably Venus OS, which is the software that runs on the Cerbo. It looks fairly straightforward to get root on these things. It’s also possible to hook up the Victron kit to Home Assistant. I haven’t tried actually doing any of these things yet myself.

Appendix E – Aurora Plus

Having switched to Tariff 93 and gotten a fancy new electricity meter, we were able to use the Aurora Plus service from the power company. This provides a web interface and mobile phone app for viewing your power usage down to the hour, colour coded to indicate peak and off-peak usage and solar feed in. You also get a monthly, rather than quarterly bill. This all sounded pretty neat, so I signed up.

Aside from having used it to confirm that the figures I get from the VRM Portal and the power company actually match, it’s turned out to not be especially great.

While the electricity meter records usage information every 15 minutes, it’s only sent back to Aurora once per day, so the usage data is never actually live. Sure, you can see history, but this is useless for adjusting your power consumption on the fly. Compare that with the VRM Portal or Cerbo console, where I can see at a glance how much power is being used and how much solar is being generated right now and decide to turn appliances on or off appropriately.

Also, it nags you to give them money. It’s continually telling me I have a big red negative dollar balance, and periodically notifies me to “top up now to get ahead of your monthly bill”. No. I will pay the bill by the due date listed on the bill, after the bill actually arrives.

Finally, it costs eleven cents a day for the privilege of having the service. Under the circumstances I think I’m going to cancel it and just go back to quarterly billing.

Aurora+, trying to trick me into paying in advance.

Update 2022-06-17: I never got around to cancelling (you can’t do it online – you actually have to call and speak to someone), but I just received a notification saying “From 1 July 2022 you will no longer be charged 11c/day for aurora+”, so I’ve decided to keep it.

,

BlueHackersFree psychologist service at conferences: April 2022 update

We’ve done this a number of times over the last decade, from OSDC to LCA. The idea is to provide a free psychologist or counsellor at an in-person conference. Attendees can do an anonymous booking by taking a stickynote (with the timeslot) from a signup sheet, and thus get a free appointment.

Many people find it difficult taking the first (very important) step towards getting professional help, and we’ve received good feedback that this approach indeed assists.

So far we’ve always focused on open source conferences. Now we’re moving into information security! First BrisSEC 2022 (Friday 29 April at the Hilton in Brisbane, QLD) and then AusCERT 2022 (10-13 May at the Star Hotel, Gold Coast QLD). The awesome and geek friendly Dr Carla Rogers will be at both events.

How does this get funded? Well, we’ve crowdfunded some and nudged sponsors, but mostly it gets picked up by the conference organisers (aka indirectly by the sponsors).

If you’re a conference organiser, or would like a particular upcoming conference to offer this service, do drop us a line and we’re happy to chase it up for you and help the organisers to make it happen. We know how to run that now.

In-person is best, but for virtual conferences, sure, contact us as well.

The post Free psychologist service at conferences: April 2022 update first appeared on BlueHackers.org.

,

FLOSS Down Under - online free software meetingsApril Hack Day Report

The hack day didn’t go as well as I hoped, but it didn’t go too badly either. Attendance was smaller than hoped and the discussion was mostly about things other than FLOSS, but everyone who attended had fun and learned interesting things, so generally I think it counts as a success. Topics discussed included military hardware, viruses (particularly Covid), rocketry, and literature. During the discussion one error in a Wikipedia page was found, and hopefully we can get that fixed.

I think that everyone who attended will be interested in more such meetings. Overall I think this is a reasonable start to the Hack Day meetings; when I previously ran such meetings they often ended up being more social events than serious hacking events, and that’s OK too.

One conclusion that we came to regarding meetings is that they should always be well announced in email and that the iCal file isn’t useful for everyone. Discussion continues on the best methods of announcing meetings but I anticipate that better email will get more attendance.

,

Tim RileyOpen source status update, March 2022

My OSS work in March was a bit of a grind, but I made progress nonetheless. I worked mostly on relocating and refactoring the Hanami action and view integration code.

For some context, it was back in May 2020 that I first wrote the action/view integration code for Hanami 2.0. Back then, there were a couple of key motivators:

  • Reduce boilerplate to an absolute minimum, to the extent that simply inheriting from Hanami::View within a slice would give you a view class fully integrated with the Hanami application.
  • Locate the integration code in the non-core gems themselves (i.e. in the hanami-controller and hanami-view gems, rather than hanami), to help set an example for how alternative implementations may also integrate with the framework.

Since then, we’ve learnt a few things:

  • As we’ve gone about refining the core framework, we’ve wound up having to synchronize changes from time to time across the hanami, hanami-controller, and hanami-view gems all at once.
  • Other Hanami contributors have noted that the original integration approach was a little too “magical,” and didn’t allow users any path to opt out of the integration code.

Once I finished my work on the concrete slice classes last month, I decided that now was the time to address these concerns, to bring the action and view class integrations back into the hanami gem, and to take a different approach to activating the integration code.

The work in progress is over in this PR, and thankfully, it’s nearly done!

The impact within Hanami 2 applications will be fairly minimal: the biggest change is that your base action and view classes will now inherit from application variants:

# slices/main/lib/action/base.rb

require "hanami/application/action"

module Main
  module Action
    class Base < Hanami::Application::Action
      # Previously, this inherited from Hanami::Action
    end
  end
end
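
The view side follows the same pattern. Sketched here by analogy with the action example (the exact file path and require line may differ slightly in the final PR):

# slices/main/lib/view/base.rb

require "hanami/application/view"

module Main
  module View
    class Base < Hanami::Application::View
      # Previously, this inherited from Hanami::View
    end
  end
end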

By using this explicit application superclass for actions and views, we hopefully make it easier for our users to understand and distinguish between the integrated and standalone variants of these classes. This distinct superclass should also provide us a clear place to hang extra API documentation relating to the integrated behavior of actions and views.

More importantly for the overall experience, Hanami::Application::Action and Hanami::Application::View are both now kept within the core hanami gem. While the framework heads into this final stretch of work before 2.0 final, this will allow us to keep together the aspects of the integration that tend to change together, giving us our best chance at providing a tested, reliable, streamlined actions and views experience.

This is a pragmatic move above all else — we’re a team with little time, so the more we can do to give ourselves confidence in this integrated experience working properly, like having all the code and tests together in one place, the quicker we should be able to get to the 2.0 release. Longer term, we’ll want to provide a first-class integration story for third party components, and I believe we can lead the way in how we deliver that via our actions and views, but that’s now firmly a post-2.0 concern in my mind.

In the meantime, I did take this opportunity to rethink and provide some better hooks for classes like Hanami::Application::View to integrate with the rest of the framework, chiefly via a new Hanami::SliceConfigurable module. You can see how it works by checking out the code for Hanami::Application::View itself:

# frozen_string_literal: true

require "hanami/view"
require_relative "../slice_configurable"
require_relative "view/slice_configured_view"

module Hanami
  class Application
    class View < Hanami::View
      extend Hanami::SliceConfigurable

      def self.configure_for_slice(slice)
        extend SliceConfiguredView.new(slice)
      end
    end
  end
end

Any class that extends Hanami::SliceConfigurable will have its own .configure_for_slice(slice) method called whenever it is subclassed within a module namespace that happens to match the namespace managed by a Hanami slice. Using the slice object passed to that hook, that class can then read any slice- or application-level config to set itself up to integrate with the application.

In the example above, we extend a slice-specific instance of SliceConfiguredView, which will copy across application-level view configs, as well as configure the view’s part namespaces to match the slice’s namespace. The reason we build a module instance here (this module builder pattern is a whole technique that I’ll gladly go into one day, but it’s a little out of scope for these monthly updates) is so that we don’t have to keep any trace of the slice as state on the class after we’re done using it for configuration, making it so the resulting class is as standalone as possible, and not offering any way for its users to inadvertently couple themselves to the whole slice instance.

Overall, this change is feeling quite settled now. All the code has been moved in and refactored, and all that’s left is a final polishing pass before merge, which I hope I can get done this week! A huge thank you to Sean Collins for his original work in proposing an adjustment to our action integration code. It was Sean’s feedback and exploratory work that got me off the fence here, and made it so easy to get started with these changes!

That’s it for me for now. See you all again next month, hopefully with some more continued core framework polishing.

,

Ian BrownIntroduction to GCVE

What is GCVE? Google Cloud VMware Engine, or GCVE, is a fully managed VMware hypervisor and associated management and networking components, (vSphere, NSX-T, vSAN and HCX) built on top of Google’s highly performant and scalable infrastructure with fully redundant and dedicated 100Gbps networking that provides 99.99% availability. The solution is integrated into Google Cloud Platform, so businesses benefit from having full access to GCP services, native VPC networking, Cloud VPN or Interconnect as well as all the normal security features you expect from GCP.

,

FLOSS Down Under - online free software meetingsMarch 2022 Meeting

Meeting Report

The March 2022 meeting went reasonably well. Everyone seemed to have fun and learn useful things about computers. After 2 hours my Internet connection dropped out, which stopped the people who were using VMs from doing the tutorial. Fortunately most people seemed ready for a break, so we ended the meeting. The early and abrupt ending of the meeting was a disappointment, but it wasn’t too bad; the meeting would probably only have gone for another half hour otherwise.

The BigBlueButton system was shown to be effective for training when one person got confused with the Debian package configuration options for Postfix and they were able to share the window with everyone else to get advice. I was also confused by that stage.

Future Meetings

The main feature of the meeting was training in setting up a mailserver with Postfix, here are the lecture notes for it [1]. The consensus at the end of the meeting was that people wanted more of that for the April meeting. So for the April meeting I will add to the Postfix Training to include SpamAssassin, SPF, DKIM, and DMARC. For the start of the next meeting instead of providing bare Debian installations for the VMs I’ll provide a basic Postfix/Dovecot setup so people can get straight into SpamAssassin etc.

For the May meeting training on SE Linux was requested.

Social Media

Towards the end of the meeting we discussed Matrix and federated social media. LUV has a Matrix server and I can give accounts to anyone who’s involved in FOSS in the Australia and New Zealand area. For Mastodon the NZOSS Mastodon server [2] seems like a good option. I have an account there to try Mastodon, my Mastodon address is @etbe@mastodon.nzoss.nz .

We are going to make Matrix a primary communication method for the Flounder group, the room is #flounder:luv.asn.au . My Matrix address is @etbe:luv.asn.au .

,

FLOSS Down Under - online free software meetingsMailing List

We now have a mailing list; see https://lists.linux.org.au/mailman/listinfo/flounder for information. The address to post to the list is flounder@lists.linux.org.au.

We also have a new URL for the blog and events. See the right sidebar for the link to the iCal file which can be connected to Google Calendar and most online calendaring systems.

,

FLOSS Down Under - online free software meetingsFirst Meeting Success

We just had the first Flounder meeting, which went well. We had some interesting discussion of storage technology and I learnt a few new things. Some people did the ZFS and BTRFS training and we had lots of interesting discussion.

Andrew Pam gave a summary of new things in Linux and talked about the sites lwn.net, gamingonlinux.com, and cnx-software.com that he uses to find Linux news. One thing he talked about is the latest developments with SteamDeck which is driving Linux support in Steam games. The site protondb.com tracks Linux support in Steam games.

We had some discussion of BPF, for an introduction to that technology see the BPF lecture from LCA 2022.

Next Meeting

The next meeting (Saturday 5th of March 1PM Melbourne time) will focus on running your own mail server, which is always of interest to people who do system administration and is probably of more interest than usual because Google is forcing companies with “a legacy G Suite subscription” to transition to a more expensive “Business family” offering.

,

Stewart SmithAdventures in the Apple Partition Map (Part 2 of the continuing adventures with the Apple Power Macintosh 7200/120 PC Compatible)

I “recently” wrote about obtaining a new (to me, actually quite old) computer over in The Apple Power Macintosh 7200/120 PC Compatible (Part 1). This post is a bit of a detour, but may help others understand why some images they download from the internet don’t work.

Disk partitioning is (of course) a way to divide up a single disk into multiple volumes (partitions) for different uses. While the idea is similar, computer platforms over the ages have done this in a variety of different ways, with varying formats on disk, and varying limitations. The ones that you’re most likely to be familiar with are the MBR partitioning scheme (from the IBM PC), and the GPT partitioning scheme (common for UEFI systems such as the modern PC and Mac). One you’re less likely to be familiar with is the Apple Partition Map scheme.

The way all IBM PCs and compatibles worked from the introduction of MS-DOS 2.0 in 1983 until some time after 2005 was the Master Boot Record partitioning scheme. It was outrageously simple: in the first 512-byte sector of a disk, the first 446 bytes were for the bootstrapping code (the “boot sector”), the last 2 bytes were the magic signature telling the BIOS this disk was bootable, and the other 64 bytes were four entries of 16 bytes, each describing a disk partition. The Wikipedia page is a good overview of what it all looks like. Since “four partitions should be enough for anybody” wasn’t going to last, DOS 3.2 introduced “extended partitions”, which just used one of those 4 partitions to hold another similar data structure that could point to more partitions.
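
As a purely illustrative aside (mine, not Stewart’s), here is a small Python sketch that pulls the four primary partition entries out of a 512-byte MBR sector, assuming the standard offsets described above:

import struct

def parse_mbr(sector0: bytes):
    """Parse the four primary partition entries from a 512-byte MBR sector."""
    if len(sector0) != 512:
        raise ValueError("expected a 512-byte sector")
    if sector0[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot signature")
    partitions = []
    for i in range(4):
        entry = sector0[446 + i * 16 : 446 + (i + 1) * 16]
        # status, CHS start (skipped), type, CHS end (skipped), LBA start, sector count
        status, ptype, lba_start, sectors = struct.unpack("<B3xB3xII", entry)
        if ptype != 0x00:  # 0x00 marks an unused entry
            partitions.append({"status": status, "type": ptype,
                               "lba_start": lba_start, "sectors": sectors})
    return partitions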

In the 1980s (similar to today), the Macintosh was, of course, different. The Apple Partition Map is significantly more flexible than the MBR on PCs. For a start, you could have more than four partitions! You could actually have a lot more than four partitions, as the Apple Partition Map uses a single 512-byte sector for each partition, and the partition map is itself a partition. Instead of being at block 0 (like the MBR is), it actually starts at block 1 and is contiguous (the Driver Descriptor Record is what’s at block 0). So, once created, it’s hard to extend. Typically it’d be created as 64×512-byte entries, for 32KB… which, it turns out, is actually about enough for anyone.
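
Again as an illustration of my own (not from the original post), a short Python sketch of walking that structure might look like the following, assuming classic 512-byte blocks and the usual entry fields (the 'PM' signature, the map block count, and the partition start/length/name/type):

import struct

def read_apple_partition_map(disk, block_size=512):
    """Walk the Apple Partition Map, which starts at block 1 (block 0 is the Driver Descriptor Record)."""
    entries = []
    block, map_blocks = 1, 1   # map_blocks is updated from the first entry we read
    while block <= map_blocks:
        disk.seek(block * block_size)
        raw = disk.read(block_size)
        # pmSig, pmSigPad, pmMapBlkCnt, pmPyPartStart, pmPartBlkCnt, pmPartName, pmParType
        sig, _, map_blocks, start, count, name, ptype = struct.unpack_from(">2sHIII32s32s", raw)
        if sig != b"PM":
            break
        entries.append({
            "name": name.rstrip(b"\x00").decode("mac_roman", "replace"),
            "type": ptype.rstrip(b"\x00").decode("ascii", "replace"),
            "start_block": start,
            "block_count": count,
        })
        block += 1
    return entries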

The Inside Macintosh reference on the SCSI Manager goes through more detail as to these structures. If you’re wondering what language all the coding examples are in, it’s Pascal – which was fairly popular for writing Macintosh applications back in the day.

But the actual partition map isn’t the “interesting” part of all this (and yes, the quotation marks are significant here), because Macs are pretty darn finicky about which disks they’ll boot from. That gets to be interesting if you’re trying to find a CD-ROM image on the internet to boot from, and then use to install an operating system.

Stewart SmithEvery time I program a Mac…

… the preferred programming language changes.

I never programmed a 1980s Macintosh actually in the 1980s. It was sometime in the early 1990s that I first experienced Microsoft Basic for the Macintosh. I’d previously (unknowingly at the time as it was branded Commodore) experienced Microsoft BASIC on the Commodore 16, Commodore 64, and even the Apple ][, but the Macintosh version was something else. It let you do some pretty neat things such as construct a GUI with largely the same amount of effort as it took to construct a Text based UI on the micros I was familiar with.

Okay, to be fair, I’d also dabbled in Microsoft QBasic that came bundled with MS-DOS of the era, which let you do a whole bunch of graphics – so you could theoretically construct a GUI with it. Something I did attempt to do. Constructing a GUI was so much easier on the Mac, though.

Of course, Microsoft Basic wasn’t the preferred way to program on the Macintosh. At that time it was largely Pascal, with C being something that also existed – but you were going to see Pascal in Inside Macintosh. It was probably somewhat fortuitous that I’d poked at Pascal a bit as something alternate to look at in the high school computing classes. I can only remember using TurboPascal on DOS systems and never actually writing Pascal on the Macintosh.

By the middle part of the 1990s though, I was firmly incompetently writing C on the Mac. No doubt the quality of my code increased after I’d done some university courses actually covering the language; before that, the only practical way I had to attempt to write anything useful was looking at Inside Macintosh examples in Pascal and “C for Dummies”, which was very not-Macintosh. Writing C on UNIX/Linux was a lot easier – everything was made for it, including Actual Documentation!

Anyway, in the early 2000s I ran MacOS X for a bit on my white iBook G3, and did a (very) small amount of GUI / Project Builder (the precursor to Xcode) related development – instead largely focusing on command line / X11 things. The latest coolness being to use Objective-C to program applications (unless you were bringing over your Classic MacOS Carbon based application, then you could still write C). Enter some (incompetent) Objective-C coding!

Then Apple went to x86, so the hardware ceased being interesting, and I had no reason to poke at it even as a side effect of having hardware that could run the software stack. Enter a long-ass time of Debian, Ubuntu, and Fedora on laptops.

Come 2022 though, and (for reasons I should really write up), I’m poking at a Mac again and it’s now Swift as the preferred way to write apps. So, I’m (incompetently) hacking away at Swift code. I have to admit, it’s pretty nice. I’ve managed to be somewhat productive in a relatively short amount of time, and all the affordances in the language are geared towards the kind of safety that is a PITA when coding in C.

So this is my WIP utility to be able to import photos from a Shotwell database into the macOS Photos app:

There’s a lot of rough edges and unknowns left, including how to actually do the import (it looks like there’s going to be Swift code doing AppleScript things as the PhotoKit API is inadequate). But hey, some incompetent hacking in not too much time has a kind-of photo browser thing going on that feels pretty snappy.

,

Robert Collinshyper combinators in Rust

Recently I read Michael Snoyman’s post on combining Axum, Hyper, Tonic and Tower. While his solution worked, it irked me – it seemed like there should be a much tighter solution possible.

I can deep dive into the code in a later post perhaps, but I think there are four points of difference. One: since the post was written, Axum has started boxing its routes, so the enum dispatch approach taken, which delivers low overheads, actually has no benefit today.

Two, while writing out the entire type by hand has some benefits, async code is much more pithy.

Thirdly, the code in the post is entirely generic, except the routing function itself.

And fourth, the outer Service<AddrStream> is an unnecessary layer to abstract over: given the similar constraints – the inner Service must take Request<..> – it is possible to just not use a couple of helpers and instead work directly with Service<Request...>.

So, onto a pithier version.

First, the app server code itself.

use std::{convert::Infallible, net::SocketAddr};

use axum::routing::get;
use hyper::{server::conn::AddrStream, service::make_service_fn};
use hyper::{Body, Request};
use tonic::async_trait;

use demo::echo_server::{Echo, EchoServer};
use demo::{EchoReply, EchoRequest};

struct MyEcho;

#[async_trait]
impl Echo for MyEcho {
    async fn echo(
        &self,
        request: tonic::Request<EchoRequest>,
    ) -> Result<tonic::Response<EchoReply>, tonic::Status> {
        Ok(tonic::Response::new(EchoReply {
            message: format!("Echoing back: {}", request.get_ref().message),
        }))
    }
}

#[tokio::main]
async fn main() {
    let addr = SocketAddr::from(([0, 0, 0, 0], 3000));

    let axum_service = axum::Router::new().route("/", get(|| async { "Hello world!" }));

    let grpc_service = tonic::transport::Server::builder()
        .add_service(EchoServer::new(MyEcho))
        .into_service();

    let both_service =
        demo_router::Router::new(axum_service, grpc_service, |req: &Request<Body>| {
            Ok::<bool, Infallible>(
                req.headers().get("content-type").map(|x| x.as_bytes())
                    == Some(b"application/grpc"),
            )
        });

    let make_service = make_service_fn(move |_conn: &AddrStream| {
        let both_service = both_service.clone();
        async { Ok::<_, Infallible>(both_service) }
    });

    let server = hyper::Server::bind(&addr).serve(make_service);

    if let Err(e) = server.await {
        eprintln!("server error: {}", e);
    }
}

Note the Router: it takes the two services and an Fn to determine which to use on any given request. Then we just drop that composed service into make_service_fn and we’re done.

Next up we have the Router implementation. This is generic across any two Service<Request<...>> types as long as they are both Into<Bytes> for their Data, and Into<Box<dyn Error>> for errors.

use std::{future::Future, pin::Pin, task::Poll};

use http_body::combinators::UnsyncBoxBody;
use hyper::{body::HttpBody, Body, Request, Response};
use tower::Service;

#[derive(Clone)]
pub struct Router<First, Second, F> {
    first: First,
    second: Second,
    discriminator: F,
}

impl<First, Second, F> Router<First, Second, F> {
    pub fn new(first: First, second: Second, discriminator: F) -> Self {
        Self {
            first,
            second,
            discriminator,
        }
    }
}

impl<First, Second, FirstBody, FirstBodyError, SecondBody, SecondBodyError, F, FErr>
    Service<Request<Body>> for Router<First, Second, F>
where
    First: Service<Request<Body>, Response = Response<FirstBody>>,
    First::Error: Into<Box<dyn std::error::Error + Send + Sync>> + 'static,
    First::Future: Send + 'static,
    First::Response: 'static,
    Second: Service<Request<Body>, Response = Response<SecondBody>>,
    Second::Error: Into<Box<dyn std::error::Error + Send + Sync>> + 'static,
    Second::Future: Send + 'static,
    Second::Response: 'static,
    F: Fn(&Request<Body>) -> Result<bool, FErr>,
    FErr: Into<Box<dyn std::error::Error + Send + Sync>> + Send + 'static,
    FirstBody: HttpBody<Error = FirstBodyError> + Send + 'static,
    FirstBody::Data: Into<bytes::Bytes>,
    FirstBodyError: Into<Box<dyn std::error::Error + Send + Sync>> + 'static,
    SecondBody: HttpBody<Error = SecondBodyError> + Send + 'static,
    SecondBody::Data: Into<bytes::Bytes>,
    SecondBodyError: Into<Box<dyn std::error::Error + Send + Sync>> + 'static,
{
    type Response = Response<
        UnsyncBoxBody<
            <hyper::Body as HttpBody>::Data,
            Box<dyn std::error::Error + Send + Sync + 'static>,
        >,
    >;
    type Error = Box<dyn std::error::Error + Send + Sync + 'static>;
    type Future =
        Pin<Box<dyn Future<Output = Result<Self::Response, Self::Error>> + Send + 'static>>;

    fn poll_ready(
        &mut self,
        cx: &mut std::task::Context<'_>,
    ) -> std::task::Poll<Result<(), Self::Error>> {
        match self.first.poll_ready(cx) {
            Poll::Ready(Ok(())) => match self.second.poll_ready(cx) {
                Poll::Ready(Ok(())) => Poll::Ready(Ok(())),
                Poll::Ready(Err(e)) => Poll::Ready(Err(e.into())),
                Poll::Pending => Poll::Pending,
            },
            Poll::Ready(Err(e)) => Poll::Ready(Err(e.into())),
            Poll::Pending => Poll::Pending,
        }
    }

    fn call(&mut self, req: Request<Body>) -> Self::Future {
        let discriminant = { (self.discriminator)(&req) };
        let (first, second) = if matches!(discriminant, Ok(false)) {
            (Some(self.first.call(req)), None)
        } else if matches!(discriminant, Ok(true)) {
            (None, Some(self.second.call(req)))
        } else {
            (None, None)
        };
        let f = async {
            Ok(match discriminant.map_err(Into::into)? {
                true => second
                    .unwrap()
                    .await
                    .map_err(Into::into)?
                    .map(|b| b.map_data(Into::into).map_err(Into::into).boxed_unsync()),
                false => first
                    .unwrap()
                    .await
                    .map_err(Into::into)?
                    .map(|b| b.map_data(Into::into).map_err(Into::into).boxed_unsync()),
            })
        };
        Box::pin(f)
    }
}

Interesting things here – I use boxed_unsync to abstract over the body concrete type, and I implement the future using async code rather than as a separate struct. It becomes much smaller even after a few bits of extra type constraining.

One thing that flummoxed me for a little while was the need to capture the future for the underlying response outside of the async block. Failing to do so provokes a 'static requirement which was tricky to debug. Fortunately there is a bug on making this easier to diagnose in rustc already. The underlying problem is that if you create the async block and then dereference self, the type of the impl of .first has to live for an arbitrary time. Whereas by capturing the future immediately, only the impl of the future has to live for an arbitrary time, and that doesn’t then require changing the signature of the function.

This is almost worth turning into a crate – I couldn’t see an existing one when I looked, though it does end up rather small – < 100 lines. What do you all think?

FLOSS Down Under - online free software meetingsFirst Meeting Agenda

The first meeting will start at 1PM Australian Eastern time (Melbourne/Sydney) which is +1100 on Saturday the 5th of February.

I will start the video chat an hour early in case someone makes a timezone mistake and gets there an hour before it starts. If anyone else joins early we will have random chat until the start time (deliberately avoiding topics worthy of the main meeting). The link http://b.coker.com.au will redirect to the meeting URL on the day.

The first scheduled talk is a summary and discussion of free software related news. Anyone who knows of something new that excites them is welcome to speak about it.

The main event is discussion of storage technology and hands-on training on BTRFS and ZFS for those who are interested. Here are the ZFS training notes and here are the BTRFS training notes. Feel free to do the training exercises on your own VM before the meeting if you wish.

Then discussion of the future of the group and the use of FOSS social media. While social media is never going to be compulsory, some people will want to use it to communicate, and we could run some servers for software that is considered good (lots of server capacity is available).

Finally we have to plan future meetings and decide on which communication methods are desired.

The BBB instance to be used for the video conference is sponsored by NZOSS and Catalyst Cloud.

,

FLOSS Down Under - online free software meetingsFlounder Overview

Flounder is a new free software users group based in the Australia/NZ area. Flounder stands for FLOSS (Free Libre Open Source Software) down under.

Here is my blog post describing the initial idea, the comment from d3Xt3r suggested the name. Flounder is a group of fish that has species native to Australia and NZ.

The main aim is to provide educational benefits to free software users, via an online meeting with a scope larger than one country, that can’t be obtained by watching YouTube videos etc. When the pandemic ends we will keep running this, as there are benefits to be obtained from a meeting of a wide geographic scope that can’t be obtained by meetings in a single city. People from other countries are welcome to attend but they aren’t the focus of the meeting.

Until we get a better DNS name, the address http://b.coker.com.au will redirect to the BBB instance used for online meetings (the meeting address isn’t yet set up so it redirects to the blog). The aim is that there will always be a short URL for the meeting, so anyone whose device loses contact can quickly type the URL into their backup device.

The first meeting will be on the 5th of Feb 2022 at 1PM Melbourne time +1100. When we get a proper domain I’ll publish a URL for an iCal file with entries for all meetings. I will also find some suitable way for meeting times to be localised (I’m sure there’s a WordPress plugin for that).

For the hands-on part of the meetings there will be virtual machine images you can download to run on your own system (tested with KVM, should work with other VM systems) and the possibility of logging in to a running VM. The demonstration VMs will have public IPv6 addresses and will also be available through different ports on a single IPv4 address, having IPv6 on your workstation will be convenient for you but you can survive without it.

Linux Australia has a list of LUGs in Australia; is there a similar list for NZ? One thing I’d like to see is a list of links to iCal files for all the meetings, and also an iCal aggregator for all iCal feeds of online meetings. I’ll host it myself if necessary, but it’s probably best to do it via Linux Australia (Linux Australasia?) if possible.

,

Jan SchmidtPulling on a thread

I’m attending the https://linux.conf.au/ conference online this weekend, which is always a good opportunity for some sideline hacking.

I found something boneheaded doing that today.

There have been a few times while inventing the OpenHMD Rift driver where I’ve noticed something strange and followed the thread until it made sense. Sometimes that leads to improvements in the driver, sometimes not.

In this case, I wanted to generate a graph of how long the computer vision processing takes – from the moment each camera frame is captured until poses are generated for each device.

To do that, I have some logging branches that output JSON events to log files, and I write scripts to process those. I used that data and produced:

Pose recognition latency.
dt = interpose spacing, delay = frame to pose latency

Two things caught my eye in this graph. The first is the way the baseline latency (pink lines) increases from ~20ms to ~58ms. The 2nd is the quantisation effect, where pose latencies are clearly moving in discrete steps.

Neither of those should be happening.

Camera frames are being captured from the CV1 sensors every 19.2ms, and it takes 17-18ms for them to be delivered across the USB. Depending on how many IR sources the cameras can see, figuring out the device poses can take a different amount of time, but the baseline should always hover around 17-18ms because the fast “device tracking locked” case takes as little as 1ms.

Did you see me mention 19.2ms as the interframe period? Guess what the spacing on those quantisation levels is in the graph? I recognised it as implying that something in the processing is tied to frame timing when it should not be.

OpenHMD Rift CV1 tracking timing

This 2nd graph helped me pinpoint what exactly was going on. This graph is cut from the part of the session where the latency has jumped up. What it shows is a ~1 frame delay between when the frame is received (frame-arrival-finish-local-ts) and when the initial analysis even starts!

That could imply that the analysis thread is just busy processing the previous frame and doesn’t get to start working on the new one yet – but the graph says that fast analysis is typically done in 1-10ms at most. It should rarely be busy when the next frame arrives.

This is where I found the bone headed code – a rookie mistake I wrote when putting in place the image analysis threads early on in the driver development and never noticed.

There are 3 threads involved:

  • USB service thread, reading video frame packets and assembling pixels in framebuffers
  • Fast analysis thread, that checks tracking lock is still acquired
  • Long analysis thread, which does brute-force pose searching to reacquire / match unknown IR sources to device LEDs

These 3 threads communicate using frame worker queues passing frames between each other. Each analysis thread does this pseudocode:

while driver_running:
    Pop a frame from the queue
    Process the frame
    Sleep for new frame notification

The problem is in the 3rd line. If the driver is ever still processing the frame in line 2 when a new frame arrives – say because the computer got really busy – the thread sleeps anyway and won’t wake up until the next frame arrives. At that point, there’ll be 2 frames in the queue, but it still only processes one – so the analysis gains a 1 frame latency from that point on. If it happens a second time, it gets later by another frame! Any further and it starts reclaiming frames from the queues to keep the video capture thread fed – but it only reclaims one frame at a time, so the latency remains!

The fix is simple:

while driver_running:
    Pop a frame
    Process the frame
    if queue_is_empty():
        sleep for new frame notification
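
For anyone who prefers real code to pseudocode, here is a toy Python sketch of the corrected pattern (names are mine and entirely hypothetical, nothing like the actual OpenHMD C code) – the key point is that the worker only sleeps when the queue is genuinely empty:

import collections
import threading

class FrameQueue:
    def __init__(self):
        self._frames = collections.deque()
        self._cond = threading.Condition()

    def push(self, frame):
        with self._cond:
            self._frames.append(frame)
            self._cond.notify()          # the "new frame notification"

    def pop_or_wait(self):
        with self._cond:
            # Only sleep when there is nothing queued -- this is the fix.
            while not self._frames:
                self._cond.wait()
            return self._frames.popleft()

def analysis_thread(queue, process, driver_running):
    while driver_running():
        process(queue.pop_or_wait())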

Doing that for both the fast and long analysis threads changed the profile of the pose latency graph completely.

Pose latency and inter-pose spacing after fix

This is a massive win! To be clear, this has been causing problems in the driver for at least 18 months but was never obvious from the logs alone. A single good graph is worth a thousand logs.

What does this mean in practice?

The way the fusion filter I’ve built works, in between pose updates from the cameras, the position and orientation of each device are predicted / updated using the accelerometer and gyro readings. Particularly for position, using the IMU for prediction drifts fairly quickly. The longer the driver spends ‘coasting’ on the IMU, the less accurate the position tracking is. So, the sooner the driver can get a correction from the camera to the fusion filter the less drift we’ll get – especially under fast motion. Particularly for the hand controllers that get waved around.

Before: Left Controller pose delays by sensor
After: Left Controller pose delays by sensor

Poses are now being updated up to 40ms earlier and the baseline is consistent with the USB transfer delay.

You can also visibly see the effect of the JPEG decoding support I added over Christmas. The ‘red’ camera is directly connected to USB3, while the ‘khaki’ camera is feeding JPEG frames over USB2 that then need to be decoded, adding a few ms delay.

The latency reduction is nicely visible in the pose graphs, where the ‘drop shadow’ effect of pose updates tailing fusion predictions largely disappears and there are fewer large gaps in the pose observations when long analysis happens (visible as straight lines jumping from point to point in the trace):

Before: Left Controller poses
After: Left Controller poses

,

Colin CharlesThis thing is still on?

Yes, the blog is still on. January 2004 I moved to WordPress, and it is still here January 2022. I didn’t write much last year (neither here, nor experimenting with the Hey blog). I didn’t post anything to Instagram last year either from what I can tell, just a lot of stories.

August 16 2021, I realised I was 1,000 days from May 12 2024, which is when I become 40. As of today, that leaves 850 days. Did I squander the last 150 days? I’m back to writing almost daily in the Hobonichi Techo (I think last year and the year before were mostly washouts; I barely scribbled anything offline).

I got a new Apple Watch Series 7 yesterday. I can say I used the Series 4 well (79% battery life), purchased in the UK when I broke my Series 0 in Edinburgh airport.

TripIt stats for last year claimed 95 days on the road. This is of course, a massive joke, but I’m glad I did get to visit London, Lisbon, New York, San Francisco, Los Angeles without issue. I spent a lot of time in Kuantan, a bunch of Langkawi trips, and also, I stayed for many months at the Grand Hyatt Kuala Lumpur during the May lockdowns (I practically stayed there all lockdown).

With 850 days to go till I’m 40, I have plenty I would like to achieve. I think I’ll write a lot more here. And elsewhere. Get back into the habit of doing. And publishing by learning and doing. No fear. Not that I wasn’t doing, but it’s time to be prolific with what’s been going on.

The post This thing is still on? first appeared on Colin Charles Agenda.

,

Gary PendergastWordPress and web3

Blockchain. Cryptocurrency. Ethereum. NFTs. DAOs. Smart Contracts. web3. It’s impossible to avoid the blockchain hype machine these days, but it’s often just as difficult to decipher what it all means.

On top of that, discourse around web3 is extremely polarising: everyone involved is very keen to a) pick a team, and b) get you to join their team. If you haven’t picked a team, you must be secretly with the other team.

Max Read made a compelling argument that the web3 debate is in fact two different debates:

But, OK, what is the root disagreement, exactly? The way I read it there are two broad “is web3 bullshit?” debates, not just one, centered around the following questions:

Can the blockchain do anything that other currently existing technology cannot do and/or do anything better or more efficiently than other currently existing technology?

Will the blockchain form the architecture of the internet of the future (i.e. “web3”), and/or will blockchain-native companies and organizations become important and powerful?

Max Read — Is web3 bullshit?

I’m inclined to agree with Max’s analysis here: there’s a technical question, and there’s a business/cultural question. It’s hard to separate the two when every day sees new headlines about millions of dollars being stolen or scammed; or thousands of people putting millions of dollars into highly optimistic ventures. There are extreme positives and extreme negatives happening all the time in the web3 world.

With that in mind, I want to take a step back from the day-to-day excitement of cryptocurrency and web3, and look at some of the driving philosophies espoused by the movement.

Philosophies of web3

There are a lot of differing viewpoints on web3, every individual has a slightly different take on it. There are three broad themes that stand out, however.

Decentralised

Blockchain-based technology is inherently distributed (with some esoteric caveats, but we can safely ignore them for now). In a world where the web centres around a handful of major services, where we’ve seen the harm that the likes of Facebook and YouTube can inflict on society, it’s not surprising that decentralisation would be a powerful theme drawing in anyone looking for an alternative.

Decentralisation isn’t new to the Internet, of course: it’s right there in the name. This giant set of “interconnected networks” has been decentralised from the very beginning. It’s not perfect, of course: oppressive governments can take control of the borders of their portion of the Internet, and we’ve come to rely on a handful of web services to handle the trickier parts of using the web. But fundamentally, that decentralised architecture is still there. I can still set up a web site hosted on my home computer, which anyone in the world could access.

I don’t do that, however, for the same reason that web3 isn’t immune from centralised services: Centralisation is convenient. Just as we have Facebook, or Google, or Amazon as giant centralised services on the current web, we can already see similar services appearing for web3. For payments, Coinbase has established itself as a hugely popular place to exchange cryptocurrencies and traditional currencies. For NFTs, OpenSea is the service where you’ll find nearly every NFT collection. MetaMask keeps all of your crypto-based keys, tokens, and logins in a single “crypto wallet”.

Centralisation is convenient.

While web3 proponents give a lot of credence to the decentralised nature of cryptocurrency being a driver of popularity, I’m not so sure. At best, I’m inclined to think that decentralisation is table stakes these days: you can’t even get started as a global movement without a strong commitment to decentralisation.

But if decentralisation isn’t the key, what is?

Ownership

When we talk about ownership in web3, NFTs are clearly the flavour of the month, but recent research indicates that the entire NFT market is massively artificially inflated.

Rather than taking pot-shots at the NFT straw man, I think it’s more interesting to look at the idea of ownership in terms of attribution. The more powerful element of this philosophy isn’t about who owns something, it’s who created it. NFTs do something rather novel with attribution, allowing royalty payments to the original artist every time an NFT is resold. I love this aspect: royalties shouldn’t just be for movie stars, they should be for everyone.

Comparing that to the current web, take the 3 paragraphs written by Max Read that I quoted above. I was certainly under no technical obligation to show that it was a quote, to attribute it to him, or to link to the source. In fact, it would have been easier for me to just paste his words into this post, and pretend they were my own. I didn’t, of course, because I feel an ethical obligation to properly attribute the quote.

In a world where unethical actors will automatically copy/paste your content for SEO juice (indeed, I expect this blog post to show up on a bunch of these kinds of sites); where massive corporations will consume everything they can find about you, in order to advertise more effectively to you, it’s not at all surprising that people are looking for a technical solution for taking back control of their data, and for being properly attributed for their creations.

The interesting element of this philosophy isn’t about who owns something, it’s who created it.

That’s not to say that existing services discourage attribution: a core function of Twitter is retweets, a core function of Tumblr is reblogging. WordPress still supports trackbacks, even if many folks turn them off these days.

These are all blunt instruments, though, aimed at attributing an entire piece, rather than a more targeted approach. What I’d really like is a way to easily quote and attribute a small chunk of a post: 3 paragraphs (or blocks, if you want to see where I’m heading 😉), inserted into my post, linking back to where I got them from. If someone chooses to quote some of this post, I’d love to receive a pingback just for that quote, so it can be seen in the right context.

The functionality provided by Twitter and Tumblr is less a technologically-based enforcement of attribution, and more an example of paving the cow path: by and large, people want to properly attribute others, and providing the tools to do so can easily become a fundamental part of how any software is used.

These tools only work so long as there’s an incentive to use them, however. web3 certainly provides the tools to attribute others, but much like SEO scammers copy/pasting blog posts, the economics of the NFT bubble is clearly a huge incentive to ignore those tools and ethical obligations, to the point that existing services have had to build additional features just to detect this abuse.

Monetisation

With every major blockchain also being a cryptocurrency, monetisation is at the heart of the entire web3 movement. Every level of the web3 tech stack involves a cryptocurrency-based protocol. This naturally permeates through the entire web3 ecosystem, where money becomes a major driving factor for every web3-based project.

And so, it’s impossible to look at web3 applications without also considering the financial aspect. When you have to pay just to participate, you have to ask whether every piece of content you create is “worth it”.

Again, let’s go back to the 3 paragraphs I quote above. In a theoretical web3 world, I’d publish this post on a blockchain in some form or another, and that act would also likely include noting that I’d quoted 3 blocks of text attributed to Max Read. I’d potentially pay some amount of money to Max, along with the fees that every blockchain charges in order to perform a transaction. While this process is potentially helpful to the original author at a first glance, I suspect the second and third order effects will be problematic. Having only just clicked the Publish button a few seconds earlier, I’m already some indeterminate amount of money out of pocket. Which brings me back to the question, is this post “worth it”? Will enough people tip/quote/remix/whatever me, to cover the cost of publishing? When every creative work must be viewed through a lens of financial impact, it fundamentally alters that creative process.

When you have to pay just to participate, you have to ask whether every piece of content you create is “worth it”.

Ultimately, we live in a capitalist society, and everyone deserves the opportunity to profit off their work. But by baking monetisation into the underlying infrastructure of web3, it becomes impossible to opt-out. You either have the money to participate without being concerned about the cost, or you’re going to need to weigh up every interaction by whether or not you can afford it.

Web3 Philosophies in WordPress

After breaking it all down, we can see that it’s not all black-and-white. There are some positive parts of web3, and some negative parts. Not that different to the web of today, in fact. 🙂 That’s not to say that either approach is the correct one: instead, we should be looking to learn from both, and produce something better.

Decentralised

I’ve long been a proponent of leveraging the massive install base of WordPress to provide distributed services to anyone. Years ago, I spoke about an idea called “Connected WordPress” that would do exactly that. While the idea didn’t gain a huge amount of traction at the time, the DNA of the Connected WordPress concept shares a lot of similar traits to the decentralised nature of web3.

I’m a big fan of decentralised technologies as a way for individuals to claw back power over their own data from the governments and massive corporations that would prefer to keep it all centralised, and I absolutely think we should be exploring ways to make the existing web more resistant to censorship.

At the same time, we have to acknowledge that there are certainly benefits to centralisation. As long as people have the freedom to choose how and where they participate, and centralised services are required to play nicely with self hosted sites, is there a practical difference?

I quite like how Solid allows you to have it both ways, whilst maintaining control over your own data.

Ownership Attribution

Here’s the thing about attribution: you can’t enforce it with technology alone. Snapchat have indirectly demonstrated exactly this problem: in order to not lose a message, people would screenshot or record the message on their phone. In response, Snapchat implemented a feature to notify the other party when you screenshot a message from them. To avoid this, people will now use a second phone to take a photo or video of the message. While this example isn’t specifically about attribution, it demonstrates the problem that there’s no way to technologically restrict how someone interacts with content that you’ve published, once they’ve been granted access.

Instead of worrying about technical restrictions, then, we should be looking at how attribution can be made easier.

IndieWeb is a great example of how this can be done in a totally decentralised fashion.

Monetisation

I’m firmly of the opinion that monetisation of the things you create should be opt-in, rather than opt-out.

Modern society is currently obsessed with monetising everything, however. It comes in many different forms: hustle culture, side gigs, transforming hobbies into businesses, meme stocks, and cryptocurrencies: they’re all symptoms of this obsession.

I would argue that, rather than accepting as fait accompli that the next iteration of the web will be monetised to the core, we should be pushing back against this approach. Fundamentally, we should be looking to build for a post scarcity society, rather than trying to introduce scarcity where there previously was none.

While we work towards that future, we should certainly make it easier for folks to monetise their work, but the current raft of cryptocurrencies just aren’t up to the task of operating as… currencies.

What Should You Do?

Well, that depends on what your priorities are. The conversations around web3 are taking up a lot of air right now, so it’s possible to get the impression web3 will be imminently replacing everything. It’s important to keep perspective on this, though. While there’s a lot of money in the web3 ecosystem right now, it’s dwarfed by the sheer size of the existing web.

If you’re excited about the hot new tech, and feeling inspired by the ideas espoused in web3 circles? Jump right in! I’m certain you’ll find something interesting to work on.

Always wanted to get into currency speculation, but didn’t want to deal with all those pesky “regulations” and “safeguards”? Boy howdy, are cryptocurrencies or NFTs the place for you. (Please don’t pretend that this paragraph is investment advice, it is nothing of the sort.)

Want to continue building stuff on the web, and you’re willing to learn new things when you need them, but are otherwise happy with your trajectory? Just keep on doing what you’re doing. Even if web3 does manage to live up to the hype, it’ll take a long time for it to be adopted by the mainstream. You’ll have years to adapt.

Final Thoughts

There are some big promises associated with web3, many of which sound very similar to the promises that were made around web 2.0, particularly around open APIs, and global interoperability. We saw what happened when those kinds of tools go wrong, and web3 doesn’t really solve those problems. It may exacerbate them in some ways, since it’s impossible to delete your data from a blockchain.

That said (and I say this as a WordPress Core developer), just because a particular piece of software is not the optimal technical solution doesn’t mean it won’t become the most popular. Market forces can be a far stronger factor than technical superiority. There are many legitimate complaints about blockchain (including performance, bloat, fit for purpose, and security) that have been levelled against WordPress in the past, but WordPress certainly isn’t slowing down. I’m not even close to convinced that blockchain is the right technology to base the web on, but I’ve been doing this for too long to bet everything against it.

Markets can remain irrational a lot longer than you and I can remain solvent.

—A. Gary Shilling

As for me, well… 😄

I remain sceptical of web3 as it’s currently defined, but I think there’s room to change it, and to adopt the best bits into the existing web. Web 1.0 didn’t magically disappear when Web 2.0 rolled in, it adapted. Maybe we’ll look back in 10 years and say this was a time when the web fundamentally changed. Or, maybe we’ll refer to blockchain in the same breath as pets.com, and other examples from the dotcom boom of the 1990s.

The Net interprets censorship as damage and routes around it.

—John Gilmore

This quote was originally referring to Usenet, but it’s stayed highly relevant in the decades since. I think it applies here, too: if the artificial scarcity built into web3 behaves too much like censorship, preventing people from sharing what they want to share, the internet (or, more accurately, the billions of people who interact with the internet) will just… go around it. It won’t all be smooth sailing, but we’ll continue to experiment, evolve, and adapt as it changes.

Personally, I think now is a great time for us to be embracing the values and ideals of projects like Solid, and IndieWeb. Before web3 referred to blockchains, it was more commonly used in reference to the Semantic Web, which is far more in line with WordPress’ ideals, whilst also matching many of the values prioritised by the new web3. As a major driver of the Open Web, WordPress can help people own their content in a sustainable way, engage with others on their own terms, and build communities that don’t depend on massive corporations or hand-wavy magical tech solutions.

Don’t get too caught up in the drama of whatever is the flavour of the month. I’m optimistic about the long term resilience of the internet, and I think you should be, too. 🥳

,

Jan Schmidt2.5 years of Oculus Rift

Once again time has passed, and another update on Oculus Rift support feels due! As always, it feels like I’ve been busy with work and not found enough time for Rift CV1 hacking. Nevertheless, looking back over the history since I last wrote, there’s quite a lot to tell!

In general, the controller tracking is now really good most of the time. Like, wildly-swing-your-arms-and-not-lose-track levels (most of the time). The problems I’m hunting now are intermittent and hard to identify in the moment while using the headset – hence my enthusiasm over the last updates for implementing stream recording and a simulation setup. I’ll get back to that.

Outlier Detection

Since I last wrote, the tracking improvements have mostly come from identifying and rejecting incorrect measurements. That is, if I have 2 sensors active and 1 sensor says the left controller is in one place, but the 2nd sensor says it’s somewhere else, we’ll reject one of those – choosing the pose that best matches what we already know about the controller. The last known position, the gravity direction the IMU is detecting, and the last known orientation. The tracker will now also reject observations for a time if (for example) the reported orientation is outside the range we expect. The IMU gyroscope can track the orientation of a device for quite a while, so can be relied on to identify strong pose priors once we’ve integrated a few camera observations to get the yaw correct.

It works really well, but I think improving this area is still where most future refinements will come. That and avoiding incorrect pose extractions in the first place.

Plot of headset tracking – orientation and position

The above plot is a sample of headset tracking, showing the extracted poses from the computer vision vs the pose priors / tracking from the Kalman filter. As you can see, there are excursions in both position and orientation detected from the video, but these are largely ignored by the filter, producing a steadier result.

Left Touch controller tracking – orientation and position

This plot shows the left controller being tracked during a Beat Saber session. The controller tracking plot is quite different, because controllers move a lot more than the headset, and have fewer LEDs to track against. There are larger gaps here in the timeline while the vision re-acquires the device – and in those gaps you can see the Kalman filter interpolating using IMU input only (sometimes well, sometimes less so).

Improved Pose Priors

Another nice improvement is a change in the way the search for a tracked device is made in a video frame. Before it starts looking for a particular device, it now always gets the latest estimate of the previous device position from the fusion filter. Previously, it would use the estimate of the device pose as it was when the camera exposure happened – but between then and the moment we start analysis, more IMU observations and other camera observations might arrive and be integrated into the filter, which will have updated the estimate of where the device was in the frame.

This is the bit where I think the Kalman filter is particularly clever: Estimates of the device position at an earlier or later exposure can improve and refine the filter’s estimate of where the device was when the camera captured the frame we’re currently analysing! So clever. That mechanism (lagged state tracking) is what allows the filter to integrate past tracking observations once the analysis is done – so even if the video frame search takes 150ms (for example), it will correct the filter’s estimate of where the device was 150ms in the past, which ripples through and corrects the estimate of where the device is now.

LED visibility model

To improve the identification of devices, I measured the actual angle from which LEDs are visible (about 75 degrees off axis) and measured their size. The pose matching now has a better idea of which LEDs should be visible for a proposed orientation and what pixel size we expect them to have at a particular distance.

Better Smoothing

I fixed a bug in the output pose smoothing filter where it would glitch as you turned completely around and crossed the point where the angle jumps from +pi to -pi or vice versa.

Improved Display Distortion Correction

I got a wide-angle hi-res webcam and took photos of a checkerboard pattern through the lens of my headset, then used OpenCV and panotools to calculate new distortion and chromatic aberration parameters for the display. For me, this is a great improvement. I’m waiting to hear if that’s true for everyone, or if I’ve just fixed it for my headset.

Persistent Config Cache

Config blocks! A long time ago, I prototyped code to create a persistent OpenHMD configuration file store in ~/.config/openhmd. The rift-kalman-filter branch now uses that to store the configuration blocks that it reads from the controllers. The first time a controller is seen, it will load the JSON calibration block as before, but it will now store it in that directory – removing a multiple second radio read process on every subsequent startup.

Persistent Room Configuration

To go along with that, I have an experimental rift-room-config branch that creates a rift-room-config.json file and stores the camera positions after the first startup. I haven’t pushed that to the rift-kalman-filter branch yet, because I’m a bit worried it’ll cause surprising problems for people. If the initial estimate of the headset pose is wrong, the code will back-project the wrong positions for the cameras, which will get written to the file and cause every subsequent run of OpenHMD to generate bad tracking until the file is removed. The goal is to have a loop that monitors whether the camera positions seem stable based on the tracking reports, and to use averaging and resetting to correct them if not – or at least to warn the user that they should re-run some (non-existent) setup utility.

Video Capture + Processing

The final big ticket item was a rewrite of how the USB video frame capture thread collects pixels and passes them to the analysis threads. This now does less work in the USB thread, so misses fewer frames, and also I made it so that every frame is now searched for LEDs and blob identities tracked with motion vectors, even when no further analysis will be done on that frame. That means that when we’re running late, it better preserves LED blob identities until the analysis threads can catch up – increasing the chances of having known LEDs to directly find device positions and avoid searching. This rewrite also opened up a path to easily support JPEG decode – which is needed to support Rift Sensors connected on USB 2.0 ports.

Session Simulator

I mentioned the recording simulator continues to progress. Since the tracking problems are now getting really tricky to figure out, this tool is becoming increasingly important. So far, I have code in OpenHMD to record all video and tracking data to a .mkv file. Then, there’s a simulator tool that loads those recordings. Currently it is capable of extracting the data back out of the recording, parsing the JSON and decoding the video, and presenting it to a partially implemented simulator that then runs the same blob analysis and tracking OpenHMD does. The end goal is a Godot based visualiser for this simulation, and to be able to step back and forth through time examining what happened at critical moments so I can improve the tracking for those situations.

To make recordings, there’s the rift-debug-gstreamer-record branch of OpenHMD. If you have GStreamer and the right plugins (gst-plugins-good) installed, and you set env vars like this, each run of OpenHMD will generate a recording in the target directory (make sure the target dir exists):

export OHMD_TRACE_DIR=/home/user/openhmd-traces/
export OHMD_FULL_RECORDING=1

Up Next

The next things that are calling to me are to improve the room configuration estimation and storage as mentioned above – to detect when the poses a camera is reporting don’t make sense because it’s been bumped or moved.

I’d also like to add back in tracking of the LEDS on the back of the headset headband, to support 360 tracking. I disabled those because they cause me trouble – the headband is adjustable relative to the headset, so the LEDs don’t appear where the 3D model says they should be and that causes jitter and pose mismatches. They need special handling.

One last thing I’m finding exciting is a new person taking an interest in Rift S and starting to look at inside-out tracking for that. That’s just happened in the last few days, so not much to report yet – but I’ll be happy to have someone looking at that while I’m still busy over here in CV1 land!

As always, if you have any questions, comments or testing feedback – hit me up at thaytan@noraisin.net or on @thaytan Twitter/IRC.

Thank you to the kind people signed up as Github Sponsors for this project!

,

Matt PalmerDiscovering AWS IAM accounts

Let’s say you’re someone who happens to discover an AWS account number, and would like to take a stab at guessing what IAM users might be valid in that account. Tricky problem, right? Not with this One Weird Trick!

In your own AWS account, create a KMS key and try to reference an ARN representing an IAM user in the other account as the principal. If the policy is accepted by PutKeyPolicy, then that IAM user exists, and if the error says “Policy contains a statement with one or more invalid principals” then the user doesn’t exist.

As an example, say you want to guess at IAM users in AWS account 111111111111. Then make sure this statement is in your key policy:

{
  "Sid": "Test existence of user",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::111111111111:user/bob"
  },
  "Action": "kms:DescribeKey",
  "Resource": "*"
}

If that policy is accepted, then the account has an IAM user named bob. Otherwise, the user doesn’t exist. Scripting this is left as an exercise for the reader.
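
If you did want to script it, a rough boto3 sketch might look something like the following. The key ID, account numbers and usernames are placeholders; the extra statement keeping your own account as key administrator is my addition so you don’t lock yourself out of the key; and the check against the error message mirrors the text quoted above rather than any documented error code:

import json
import boto3
from botocore.exceptions import ClientError

kms = boto3.client("kms")
KEY_ID = "replace-with-your-own-kms-key-id"
MY_ACCOUNT = "222222222222"   # your account, kept in the policy to avoid lockout

def iam_user_exists(target_account, username):
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {   # keep the key manageable by your own account
                "Sid": "KeepKeyUsable",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{MY_ACCOUNT}:root"},
                "Action": "kms:*",
                "Resource": "*",
            },
            {   # the probe statement from the post
                "Sid": "Test existence of user",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{target_account}:user/{username}"},
                "Action": "kms:DescribeKey",
                "Resource": "*",
            },
        ],
    }
    try:
        kms.put_key_policy(KeyId=KEY_ID, PolicyName="default", Policy=json.dumps(policy))
        return True
    except ClientError as e:
        if "invalid principals" in e.response["Error"]["Message"]:
            return False
        raise

for name in ["bob", "alice", "admin"]:
    print(name, iam_user_exists("111111111111", name))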

Sadly, wildcards aren’t accepted in the username portion of the ARN, otherwise you could do some funky searching with ...:user/a*, ...:user/b*, etc. You can’t have everything; where would you put it all?

I did mention this to AWS as an account enumeration risk. They’re of the opinion that it’s a good thing you can know what users exist in random other AWS accounts. I guess that means this is a technique you can put in your toolbox safe in the knowledge it’ll work forever.

Given this is intended behaviour, I assume you don’t need to use a key policy for this, but that’s where I stumbled over it. Also, you can probably use it to enumerate roles and anything else that can be a principal, but since I don’t see as much use for that, I didn’t bother exploring it.

There you are, then. If you ever need to guess at IAM users in another AWS account, now you can!

,

Glen TurnerThe tyranny of product names

For a long time computer manufacturers have tried to differentiate themselves and their products from their competitors with fancy names with odd capitalisation and spelling. But using these names does a disservice to the reader: how are they to know that DEC is pronounced as if it were written Dec ("deck")?

It's time we pushed back, and wrote for our readers, not for corporations.

It's time to use standard English rules for these Corporate Fancy Names. Proper names begin with a capital, unlike "ciscoSystems®" (so bad that Cisco itself moved away from it). Words are separated by spaces, so "Cisco Systems". Abbreviations and acronyms are written in lower case if they are pronounced as a word, in upper case if each letter is pronounced: so "ram" and "IBM®".

So from here on in I'll be using the following:

  • Face Book. Formerly, "Facebook®".
  • Junos. Formerly JUNOS®.
  • ram. Formerly RAM.
  • Pan OS. Formerly PAN-OS®.
  • Unix. Formerly UNIX®.

I'd encourage you to try this in your own writing. It does look odd at first, but the result is undeniably more readable. If we are not writing to be understood by our audience then we are nothing more than an unpaid member of some corporation's marketing team.




,

Dave HallYour Terraform Module Needs an Opinion

Learn why your Terraform modules should be opinionated.

,

Chris NeugebauerTalk Notes: On The Use and Misuse of Decorators

I gave the talk On The Use and Misuse of Decorators as part of PyConline AU 2021, the second in an annoyingly long sequence of not-in-person PyCon AU events. Here’s some code samples that you might be interested in:

Simple @property implementation

This shows a demo of @property-style getters. Setters are left as an exercise :)


def demo_property(f):
    f.is_a_property = True
    return f


class HasProperties:

    def __getattribute__(self, name):
        ret = super().__getattribute__(name)
        if hasattr(ret, "is_a_property"):
            return ret()
        else:
            return ret

class Demo(HasProperties):

    @demo_property
    def is_a_property(self):
        return "I'm a property"

    def is_a_function(self):
        return "I'm a function"


a = Demo()
print(a.is_a_function())
print(a.is_a_property)

@run (The Scoped Block)

@run is a decorator that will run the body of the decorated function, and then store the result of that function in place of the function’s name. It makes it easier to assign the results of complex statements to a variable, and get the advantages of functions having less leaky scopes than if or loop blocks.

def run(f):
    return f()

@run
def hello_world():
    return "Hello, World!"

print(hello_world)

@apply (Multi-line stream transformers)

def apply(transformer, iterable_):

    def _applicator(f):

        return(transformer(f, iterable_))

    return _applicator

@apply(map, range(100))
def fizzbuzzed(i):
    if i % 3 == 0 and i % 5 == 0:
        return "fizzbuzz"
    if i % 3 == 0:
        return "fizz"
    elif i % 5 == 0:
        return "buzz"
    else:
        return str(i)

Builders


def html(f):
    builder = HtmlNodeBuilder("html")
    f(builder)
    return builder.build()


class HtmlNodeBuilder:
    def __init__(self, tag_name):
        self.tag_name = tag_name
        self.nodes = []

    def node(self, f):
        builder = HtmlNodeBuilder(f.__name__)
        f(builder)
        self.nodes.append(builder.build())

    def text(self, text):
        self.nodes.append(text)

    def build(self):
        nodes = "\n".join(self.nodes)
        return f"<{self.tag_name}>\n{nodes}\n</{self.tag_name}>"


@html
def document(b):
    @b.node
    def head(b):
        @b.node
        def title(b):
            b.text("Hello, World!")

    @b.node
    def body(b):
        for i in range(10, 0, -1):
            @b.node
            def p(b):
                b.text(f"{i}")

Code Registries

This is an incomplete implementation of a code registry for handling simple text processing tasks:

```python
def register(self, input, output):

    def _register_code(f):
        self.registry[(input, output)] = f
        return f

    return _register_code


in_type = (iterable[str], (WILDCARD, ))
out_type = (Counter, (WILDCARD, frequency))

@registry.register(in_type, out_type)
def count_strings(strings):
    return Counter(strings)

@registry.register(
    (iterable[str], (WILDCARD, )),
    (iterable[str], (WILDCARD, lowercase)),
)
def words_to_lowercase(words): …

@registry.register(
    (iterable[str], (WILDCARD, )),
    (iterable[str], (WILDCARD, no_punctuation)),
)
def words_without_punctuation(words): …


def find_steps(self, input_type, input_attrs, output_type, output_attrs):
    hand_wave()


def give_me(self, input, output_type, output_attrs):
    steps = self.find_steps(
        type(input), (), output_type, output_attrs
    )

    temp = input
    for step in steps:
        temp = step(temp)

    return temp
```

,

Jan SchmidtOpenHMD update

A while ago, I wrote a post about how to build and test my Oculus CV1 tracking code in SteamVR using the SteamVR-OpenHMD driver. I have updated those instructions and moved them to https://noraisin.net/diary/?page_id=1048 – so use those if you’d like to try things out.

The pandemic continues to sap my time for OpenHMD improvements. Since my last post, I have been working on various refinements. The biggest visible improvements are:

  • Adding velocity and acceleration API to OpenHMD.
  • Rewriting the pose transformation code that maps from the IMU-centric tracking space to the device pose needed by SteamVR / apps.

Adding velocity and acceleration reporting is needed in VR apps that support throwing things. It means that throwing objects and using gravity-grab to fetch objects works in Half-Life: Alyx, making it playable now.

The rewrite to the pose transformation code fixed problems where the rotation of controller models in VR didn’t match the rotation applied in the real world. Controllers would appear attached to the wrong part of the hand, and rotate around the wrong axis. Movements feel more natural now.

Ongoing work – record and replay

My focus going forward is on fixing glitches that are caused by tracking losses or outliers. Those problems happen when the computer vision code either fails to match what the cameras see to the device LED models, or when it matches incorrectly.

Tracking failure leads to the headset view or controllers ‘flying away’ suddenly. Incorrect matching leads to controllers jumping and jittering to the wrong pose, or swapping hands. Either condition is very annoying.

Unfortunately, as the tracking has improved the remaining problems get harder to understand and there is less low-hanging fruit for improvement. Further, when the computer vision runs at 52Hz, it’s impossible to diagnose the reasons for a glitch in real time.

I’ve built a branch of OpenHMD that uses GStreamer to record the CV1 camera video, plus IMU and tracking logs into a video file.

To go with those recordings, I’ve been working on a replay and simulation tool, that uses the Godot game engine to visualise the tracking session. The goal is to show, frame-by-frame, where OpenHMD thought the cameras, headset and controllers were at each point in the session, and to be able to step back and forth through the recording.

Right now, I’m working on the simulation portion of the replay, which will use the tracking logs to recreate all the poses.

,

Ian BrownNGINX Ingress Controller in GKE

GKE in Production - Part 2 This tutorial is part of a series I am creating on building, running and managing Kubernetes on GCP the way I do in my day job. In this episode, we cover how to set up an NGINX ingress controller to handle incoming requests. Note: There may be some things I have skimmed over; if so, or if you see a glaring hole in my configuration, please drop me a line via the contact page linked at the top of the site.

,

Robert CollinsA moment of history

I’ve been asked more than once what it was like at the beginning of Ubuntu, before it was a company, when an email from someone I’d never heard of came into my mailbox.

We’re coming up on 20 years now since Ubuntu was founded, and I had cause to do some spelunking into IMAP archives recently… while there I took the opportunity to grab the very first email I received.

The Ubuntu long shot succeeded wildly. Of course, we liked to joke about how spammy those emails were: cold-calling a raft of Debian developers with job offers, some of them closer to phishing attacks :). This very early one – I was the second employee (though I started at 4 days a week to transition my clients gradually) – was less so.

I think it’s interesting, though, to note how explicitly this was framed as a gamble: a time-limited experiment, funded for a year. As the company scaled, this very rapidly became a hiring problem and the horizon had to be pushed out to 2 years to get folk to join.

And of course, while we started with arch in earnest, we rapidly hit significant usability problems, some of which were solvable with porcelain and shallow non-architectural changes, and so we initially built patches, and then the bazaar VCS project, to tackle those. But others were not: for instance, I recall exceeding the 32K hard link limit on ext3 due to a single long history during a VCS conversion. The sum of these challenges led us to create the bzr project, a ground-up rethink of our version control needs, architecture, implementation and user experience. While ultimately git has conquered all, bzr had – still has, in fact – extremely loyal advocates, due to its laser-sharp focus on usability.

Anyhow, here it is: one of the original no-name-here-yet, aka Ubuntu, introductory emails (with permission from Mark, of course). When I clicked through to the website Mark provided there was a link there to a fantastical website about a space tourist… not what I had expected to be reading in Adelaide during LCA 2004.


From: Mark Shuttleworth <xxx@xxx>
To: Robert Collins <xxx@xxx>
Date: Thu, 15 Jan 2004, 04:30

Tom Lord gave me your email address, I believe he’s
already sent you the email that I sent him so I’m sure
you have some background.

In short, I am going to fund some open source
development for a year. This is part of a new project
that I will be getting off the ground in the coming
weeks. I don’t know where it will lead, it’s flying in
the face of a stiff breeze but I think at the end of
the day it will at least fund a few very good open
source developers for a full year to work on the
projects they like most.

One of the pieces of the puzzle is high end source
code management. I’ll be looking to build an
infrastructure that will manage source code for
between 100 and 8000 open source projects (yes,
there’s a big difference between the two, I don’t know
at which end of the spectrum we will be at the end of
the year but our infrastructure will have to at least
be capable of scaling to the latter within two years)
with upwards of 2000 developers, drawing code from a
variety of sources, playing with it and spitting it
out regularly in nice packages.

Arch and Subversion seem to be the two leading
contenders for “next generation open source sccm”. I’d
be interested in your thoughts on the two of them, and
how they stack up. I’m looking to hire one person who
will lead that part of the effort. They’ll work alone
from home, and be responsible for two things. First,
extending the tool (arch or svn) in ways that help the
project. Such extensions will be released under an
open source licence, and hopefully embraced by the
tools maintainers and included in the mainline code
for the tool. And second, they will be responsible for
our large-scale implementation of SCCM, using that
tool, and building the management scripts and other
infrastructure to support such a large, and hopefully
highly automated, set of repositories.

Would you be interested in this position? What
attributes and experience do you think would make you
a great person to have on the team? What would your
salary expectation be, as a monthly figure, for a one
year contract full time?

I’m currently on your continent, well, just off it. On
Lizard Island, up North. Am headed today for Brisbane,
then on the 17th to Launceston via Melbourne. If you
happen to be on any of those stops, would you be
interested in meeting up to discuss it further?

If you’re curious you can find out a bit more about me
at www.markshuttleworth.com. This project is much
lower key than some of what you’ll find there. It’s a
very long shot indeed. But if at worst all that
happens is a bunch of open source work gets funded at
my expense I’ll feel it was money well spent.

Cheers,
Mark

=====

“Good judgement comes from experience, and often experience
comes from bad judgement” – Rita Mae Brown


,

sthbrx - a POWER technical blogFuzzing grub, part 2: going faster

Recently a set of 8 vulnerabilities were disclosed for the grub bootloader. I found 2 of them (CVE-2021-20225 and CVE-2021-20233), and contributed a number of other fixes for crashing bugs which we don't believe are exploitable. I found them by applying fuzz testing to grub. Here's how.

This is a multi-part series: I think it will end up being 4 posts. I'm hoping to cover:

  • Part 1: getting started with fuzzing grub
  • Part 2 (this post): going faster by doing lots more work
  • Part 3: fuzzing filesystems and more
  • Part 4: potential next steps and avenues for further work

We've been looking at fuzzing grub-emu, which is basically most parts of grub built into a standard userspace program. This includes all the script parsing logic, fonts, graphics, partition tables, filesystems and so on - just not platform specific driver code or the ability to actually load and boot a kernel.

Previously, we talked about some issues building grub with AFL++'s instrumentation:

:::text
./configure --with-platform=emu --disable-grub-emu-sdl CC=$AFL_PATH/afl-cc
...
checking whether target compiler is working... no
configure: error: cannot compile for the target

It also doesn't work with afl-gcc.

We tried to trick configure:

:::shell
./configure --with-platform=emu --disable-grub-emu-sdl CC=clang CXX=clang++
make CC="$AFL_PATH/afl-cc" 

Sadly, things still break:

:::text
/usr/bin/ld: disk.module:(.bss+0x20): multiple definition of `__afl_global_area_ptr'; kernel.exec:(.bss+0xe078): first defined here
/usr/bin/ld: regexp.module:(.bss+0x70): multiple definition of `__afl_global_area_ptr'; kernel.exec:(.bss+0xe078): first defined here
/usr/bin/ld: blocklist.module:(.bss+0x28): multiple definition of `__afl_global_area_ptr'; kernel.exec:(.bss+0xe078): first defined here

The problem is the module linkage that I talked about in part 1. There is a link stage of sorts for the kernel (kernel.exec) and each module (e.g. disk.module), so some AFL support code gets linked into each of those. Then there's another link stage for grub-emu itself, which also tries to bring in the same support code. The linker doesn't like the symbols being in multiple places, which is fair enough.

There are (at least) 3 ways you could solve this. I'm going to call them the hard way, the ugly way and the easy way.

The hard way: messing with makefiles

We've been looking at fuzzing grub-emu. Building grub-emu links kernel.exec and almost every .module file that grub produces into the final binary. Maybe we could avoid our duplicate symbol problems entirely by changing how we build things?

I didn't do this in my early work because, to be honest, I don't like working with build systems and I'm not especially good at it. grub's build system is based on autotools but is even more quirky than usual: rather than just having a Makefile.am, we have Makefile.core.def which is used along with other things to generate Makefile.am. It's a pretty cool system for making modules, but it's not my idea of fun to work with.

But, for the sake of completeness, I tried again.

It gets unpleasant quickly. The generated grub-core/Makefile.core.am adds each module to platform_PROGRAMS, and then each is built with LDFLAGS_MODULE = $(LDFLAGS_PLATFORM) -nostdlib $(TARGET_LDFLAGS_OLDMAGIC) -Wl,-r,-d.

Basically, in the makefile this ends up being (e.g.):

:::make
tar.module$(EXEEXT): $(tar_module_OBJECTS) $(tar_module_DEPENDENCIES) $(EXTRA_tar_module_DEPENDENCIES) 
    @rm -f tar.module$(EXEEXT)
    $(AM_V_CCLD)$(tar_module_LINK) $(tar_module_OBJECTS) $(tar_module_LDADD) $(LIBS)

Ideally I don't want them to be linked at all; there's no benefit if they're just going to be linked again.

You can't just collect the sources and build them into grub-emu - they all have to be built with different CFLAGS! So instead I spent some hours messing around with the build system. Given some changes to the python script that converts the Makefile.*.def files into Makefile.am files, plus some other bits and pieces, we can build grub-emu by linking the object files rather than the more-processed modules.

The build dies in other components immediately after linking grub-emu, and it requires a bit of manual intervention to get the right things built in the right order, but with all of those caveats, it's enough. It works, and you can turn on things like ASAN, but getting there was hard, unrewarding and unpleasant. Let's consider alternative ways to solve this problem.

The ugly way: patching AFL

What I did when finding the bugs was to observe that we only wanted AFL to link in its extra instrumentation at certain points of the build process. So I patched AFL to add an environment variable AFL_DEFER_LIB - which prevented AFL adding its own instrumentation library when being called as a linker. I combined this with the older CFG instrumentation, as the PCGUARD instrumentation brought in a bunch of symbols from LLVM which I didn't want to also figure out how to guard.

I then wrapped this in a horrifying script that built bits and pieces of grub with the environment variable on or off, in order to at least get the userspace tools and grub-emu built: AFL_DEFER_LIB was set when building all the modules, and unset when building the userspace tools and grub-emu.

This worked and it's what I used to find most of my bugs. But I'd probably not recommend it, and I'm not sharing the source: it's extremely fragile and brittle, the hard way is more generally applicable, and the easy way is nicer.

The easy way: adjusting linker flags

After posting part 1 of this series, I had a fascinating twitter DM conversation with @hackerschoice, who pointed me to some new work that had been done in AFL++ between when I started and when I published part 1.

AFL++ now has the ability to dynamically detect some duplicate symbols, allowing it to support plugins and modules better. This isn't directly applicable because we link all the modules in at build time, but in the conversation I was pointed to a linker flag which instructs the linker to ignore the symbol duplication rather than error out. This provides a significantly simpler way to instrument grub-emu, avoiding all the issues I'd previously been fighting so hard to address.

So, with a modern AFL++, and the patch from part 1, you can sort out this entire process like this:

:::shell
./bootstrap
./configure --with-platform=emu CC=clang CXX=clang++ --disable-grub-emu-sdl
make CC=/path/to/afl-clang-fast LDFLAGS="-Wl,--allow-multiple-definition"

Eventually it will error out, but ./grub-core/grub-emu should be successfully built first.

(Why not just build grub-emu directly? It gets built by grub-core/Makefile, but depends on a bunch of things made by the top-level makefile and doesn't express its dependencies well. So you can try to build all the things that you need separately and then cd grub-core; make ...flags... grub-emu if you want - but it's way more complicated to do it that way!)

Going extra fast: __AFL_INIT

Now that we can compile with instrumentation, we can use __AFL_INIT. I'll leave the precise details of how this works to the AFL docs, but in short it allows us to do a bunch of early setup only once, and just fork the process after the setup is done.

There's a patch that inserts a call to __AFL_INIT in the grub-emu start path in my GitHub repo.
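
The patch itself is in that repo; the general deferred-initialisation pattern from the AFL++ documentation looks roughly like the sketch below. This is an illustration only - the two helper functions are invented stand-ins, not grub code.

:::c
#include <stdio.h>
#include <stdlib.h>

/* Invented stand-ins for grub-emu's real start-up and input handling. */
static void do_expensive_setup(void) { puts("one-off module/kernel setup"); }
static void process_one_input(void) { puts("parse one (fuzzed) config"); }

int main(void)
{
    /* Expensive one-off initialisation runs exactly once. */
    do_expensive_setup();

#ifdef __AFL_HAVE_MANUAL_CONTROL
    /* With afl-cc instrumentation, the forkserver starts here:
     * everything above is shared, everything below runs per input. */
    __AFL_INIT();
#endif

    /* Per-input work - this is the part that gets fuzzed repeatedly. */
    process_one_input();
    return EXIT_SUCCESS;
}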

All up, this can lead to a 2x-3x speedup over the figures I saw in part 1. In part 1 we saw around 244 executions per second at this point - now we're over 500:

afl-fuzz fuzzing grub, showing fuzzing happening

Finding more bugs with sanitisers

A 'sanitizer' refers to a set of checks inserted by a compiler at build time to detect various runtime issues that might not cause a crash or otherwise be detected. A particularly common and useful sanitizer is ASAN, the AddressSanitizer, which detects out-of-bounds memory accesses, use-after-frees and other assorted memory bugs. Other sanitisers can check for undefined behaviour, uninitialised memory reads or even breaches of control flow integrity.

ASAN is particularly popular for fuzzing. In theory, compiling with AFL++ and LLVM makes it really easy to compile with ASAN. Setting AFL_USE_ASAN=1 should be sufficient.

However, in practice, it's quite fragile for grub. I believe I had it all working, and then I upgraded my distro, LLVM and AFL++ versions, and everything stopped working. (It's possible that I hadn't sufficiently cleaned my source tree and I was still building based on the hard way? Who knows.)

I spent quite a while fighting "truncated relocations". ASAN instrumentation was bloating the binaries, and the size of all the *.module files was over 512MB, which I suspect was causing the issues. (Without ASAN, it only comes to 35MB.)

I tried afl-clang-lto: I installed lld, rebuilt AFL++, and managed to segfault the linker while building grub. So I wrote that off. Changing the instrumentation type to classic didn't help either.

Some googling suggested the large code model - GCC and Clang's -mcmodel=large - but CFLAGS="-mcmodel=large" didn't get me any further either: it's already added in a few different links.

My default llvm is llvm-12, so I tried building with llvm-9 and llvm-11 in case that helped. Both built a binary, but it would fail to start:

:::text
==375638==AddressSanitizer CHECK failed: /build/llvm-toolchain-9-8fovFY/llvm-toolchain-9-9.0.1/compiler-rt/lib/sanitizer_common/sanitizer_common_libcdep.cc:23 "((SoftRssLimitExceededCallback)) == ((nullptr))" (0x423660, 0x0)

The same happens if I build with llvm-12 and afl-clang, the old-style instrumentation.

I spun up an Ubuntu 20.04 VM and built there with LLVM 10 and the latest stable AFL++. That didn't work either.

I had much better luck using GCC and GCC's ASAN implementation, either with the old-school afl-gcc or the newer GCC plugin-based afl-gcc-fast. (I have some hypotheses around shared library vs static library ASAN, but having spent way more work time on this than was reasonable, I was disinclined to debug it further.) Here's what worked for me:

:::shell
./configure --with-platform=emu --disable-grub-emu-sdl
# the ASAN option is required because one of the tools leaks memory and
# that breaks the generation of documentation.
# -Wno-nested-extern makes __AFL_INIT work on gcc
ASAN_OPTIONS=detect_leaks=0 AFL_USE_ASAN=1 make CC=/path/to/afl-gcc-fast LDFLAGS="-Wl,--allow-multiple-definition" CFLAGS="-Wno-nested-extern"

GCC doesn't support as many sanitisers as LLVM, but happily it does support ASAN. AFL++'s GCC plugin mode should get us most of the speed we would get from LLVM, and indeed the speed - even with ASAN - is quite acceptable.

If you persist, you should be able to find some more bugs: for example there's a very boring global array out-of-bounds read when parsing config files.

That's all for part 2. In part 3 we'll look at fuzzing filesystems and more. Hopefully there will be a quicker turnaround between part 2 and part 3 than there was between part 1 and part 2!

,

Arjen LentzClassic McEliece and the NIST search for post-quantum crypto

I have always liked cryptography, and public-key cryptography in particular. When Pretty Good Privacy (PGP) first came out in 1991, I not only started using it, but also looked at the documentation and the code to see how it worked, and created my own implementation in C using very small keys, just to understand it better.

Cryptography has been running a race against both faster and cheaper computing power. And these days, with banking and most other aspects of our lives entirely relying on secure communications, it’s a very juicy target for bad actors.

About 5 years ago, the US National Institute of Standards and Technology (NIST) initiated a search for cryptographic algorithms that should withstand a near-future world where quantum computers with a significant number of qubits are a reality. There have been a number of rounds; mid-2020 saw round 3 and the finalists.

This submission caught my eye some time ago: Classic McEliece, and out of the four finalists it’s the only one that is not lattice-based [wikipedia link].

For Public Key Encryption and Key Exchange Mechanism, Prof Bill Buchanan thinks that the winner will be lattice-based, but I am not convinced.

Robert McEliece at his retirement in 2007

Tiny side-track: you may wonder where the McEliece name comes from. From mathematician Robert McEliece (1942-2019). McEliece developed his cryptosystem in 1978, so it’s not just named after him, he designed it. For various reasons that have nothing to do with the mathematical solidity of the ideas, it didn’t get used at the time. He did plenty of other cool things, too. From his Caltech obituary:

He made fundamental contributions to the theory and design of channel codes for communication systems—including the interplanetary telecommunication systems that were used by the Voyager, Galileo, Mars Pathfinder, Cassini, and Mars Exploration Rover missions.

Back to lattices, there are both unknowns (aspects that have not been studied in exhaustive depth) and recent mathematical attacks, both of which create uncertainty – in the crypto sphere as well as for business and politics. Given how long it takes for crypto schemes to get widely adopted, the latter two are somewhat relevant, particularly since cyber security is a hot topic.

Lattices are definitely interesting, but given what we know so far, it is my feeling that systems based on lattices are more likely to be proven breakable than Classic McEliece, which comes to this finalists’ table with a 40+ year track record of in-depth analysis. Mind that all finalists are of course solid at this stage – but NIST’s thoughts on expected developments and breakthroughs are what is likely to decide the winner. NIST are not looking for shiny, they are looking for very, very solid in all possible ways.

Prof Buchanan recently published implementations for the finalists, and did some benchmarks where we can directly compare them against each other.

We can see that Classic McEliece’s key generation is CPU intensive, but is that really a problem? The large size of its public key may be more of a factor (disadvantage); however, the small ciphertext, I think, more than offsets that.

As we’re nearing the end of the NIST process, in my opinion, fast encryption/decryption and small ciphertext, combined with the long track record of in-depth analysis, may still see Classic McEliece come out the winner.

The post Classic McEliece and the NIST search for post-quantum crypto first appeared on Lentz family blog.

,

Ian BrownKubernetes Basic Setup

GKE in Production - Part 1 This tutorial is part of a series I am creating on building, running and managing Kubernetes on GCP the way I do in my day job. Note: There may be some things I have skimmed over; if so, or if you see a glaring hole in my configuration, please drop me a line via the contact page linked at the top of the site. What we will build In this first tutorial, we will be building a standard GKE cluster on Google Cloud Platform and deploying the hello world container to confirm everything is working.

,

Dave HallA Rube Goldberg Machine for Container Workflows

Learn how you can securely copy container images from GHCR to ECR.

,

Chris NeugebauerAdding a PurpleAir monitor to Home Assistant

Living in California, I’ve (sadly) grown accustomed to needing to keep track of our local air quality index (AQI) ratings, particularly as we live close to places where large wildfires happen every other year.

Last year, Josh and I bought a PurpleAir outdoor air quality meter, which has been great. We contribute our data to a collection of very local air quality meters, which is important, since the hilly nature of the North Bay means that the nearest government air quality ratings can be significantly different to what we experience here in Petaluma.

I recently went looking to pull my PurpleAir sensor data into my Home Assistant setup. Unfortunately, the PurpleAir API does not return the AQI metric for air quality, only the raw PM2.5/PM5/PM10 numbers. After some searching, I found a nice template sensor solution on the Home Assistant forums, which I’ve modernised by adding the AQI as a sub-sensor, and adding unique ID fields to each useful sensor, so that you can assign them to a location.

You’ll end up with sensors for raw PM2.5, the PM2.5 AQI value, the US EPA air quality category, temperature, relative humidity and air pressure.

How to use this

First up, visit the PurpleAir Map, find the sensor you care about, click “Get This Widget”, and then “JSON”. That will give you the URL to set as the resource key in purpleair.yaml.

Adding the configuration

In HomeAssistant, add the following line to your configuration.yaml:

sensor: !include purpleair.yaml

and then add the following contents to purpleair.yaml


 - platform: rest
   name: 'PurpleAir'

   # Substitute in the URL of the sensor you care about.  To find the URL, go
   # to purpleair.com/map, find your sensor, click on it, click on "Get This
   # Widget" then click on "JSON".
   resource: https://www.purpleair.com/json?key={KEY_GOES_HERE}&show={SENSOR_ID}

   # Only query once a minute to avoid rate limits:
   scan_interval: 60

   # Set this sensor to be the AQI value.
   #
   # Code translated from JavaScript found at:
   # https://docs.google.com/document/d/15ijz94dXJ-YAZLi9iZ_RaBwrZ4KtYeCy08goGBwnbCU/edit#
   value_template: >
     {{ value_json["results"][0]["Label"] }}
   unit_of_measurement: ""
   # The value of the sensor can't be longer than 255 characters, but the
   # attributes can.  Store away all the data for use by the templates below.
   json_attributes:
     - results

 - platform: template
   sensors:
     purpleair_aqi:
       unique_id: 'purpleair_SENSORID_aqi_pm25'
       friendly_name: 'PurpleAir PM2.5 AQI'
       value_template: >
         {% macro calcAQI(Cp, Ih, Il, BPh, BPl) -%}
           {{ (((Ih - Il)/(BPh - BPl)) * (Cp - BPl) + Il)|round|float }}
         {%- endmacro %}
         {% if (states('sensor.purpleair_pm25')|float) > 1000 %}
           invalid
         {% elif (states('sensor.purpleair_pm25')|float) > 350.5 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 500.0, 401.0, 500.0, 350.5) }}
         {% elif (states('sensor.purpleair_pm25')|float) > 250.5 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 400.0, 301.0, 350.4, 250.5) }}
         {% elif (states('sensor.purpleair_pm25')|float) > 150.5 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 300.0, 201.0, 250.4, 150.5) }}
         {% elif (states('sensor.purpleair_pm25')|float) > 55.5 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 200.0, 151.0, 150.4, 55.5) }}
         {% elif (states('sensor.purpleair_pm25')|float) > 35.5 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 150.0, 101.0, 55.4, 35.5) }}
         {% elif (states('sensor.purpleair_pm25')|float) > 12.1 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 100.0, 51.0, 35.4, 12.1) }}
         {% elif (states('sensor.purpleair_pm25')|float) >= 0.0 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 50.0, 0.0, 12.0, 0.0) }}
         {% else %}
           invalid
         {% endif %}
       unit_of_measurement: "bit"
     purpleair_description:
       unique_id: 'purpleair_SENSORID_description'
       friendly_name: 'PurpleAir AQI Description'
       value_template: >
         {% if (states('sensor.purpleair_aqi')|float) >= 401.0 %}
           Hazardous
         {% elif (states('sensor.purpleair_aqi')|float) >= 301.0 %}
           Hazardous
         {% elif (states('sensor.purpleair_aqi')|float) >= 201.0 %}
           Very Unhealthy
         {% elif (states('sensor.purpleair_aqi')|float) >= 151.0 %}
           Unhealthy
         {% elif (states('sensor.purpleair_aqi')|float) >= 101.0 %}
           Unhealthy for Sensitive Groups
         {% elif (states('sensor.purpleair_aqi')|float) >= 51.0 %}
           Moderate
         {% elif (states('sensor.purpleair_aqi')|float) >= 0.0 %}
           Good
         {% else %}
           undefined
         {% endif %}
       entity_id: sensor.purpleair
     purpleair_pm25:
       unique_id: 'purpleair_SENSORID_pm25'
       friendly_name: 'PurpleAir PM 2.5'
       value_template: "{{ state_attr('sensor.purpleair','results')[0]['PM2_5Value'] }}"
       unit_of_measurement: "μg/m3"
       entity_id: sensor.purpleair
     purpleair_temp:
       unique_id: 'purpleair_SENSORID_temperature'
       friendly_name: 'PurpleAir Temperature'
       value_template: "{{ state_attr('sensor.purpleair','results')[0]['temp_f'] }}"
       unit_of_measurement: "°F"
       entity_id: sensor.purpleair
     purpleair_humidity:
       unique_id: 'purpleair_SENSORID_humidity'
       friendly_name: 'PurpleAir Humidity'
       value_template: "{{ state_attr('sensor.purpleair','results')[0]['humidity'] }}"
       unit_of_measurement: "%"
       entity_id: sensor.purpleair
     purpleair_pressure:
       unique_id: 'purpleair_SENSORID_pressure'
       friendly_name: 'PurpleAir Pressure'
       value_template: "{{ state_attr('sensor.purpleair','results')[0]['pressure'] }}"
       unit_of_measurement: "hPa"
       entity_id: sensor.purpleair

Quirks

I had difficulty getting the AQI to display as a numeric graph when I didn’t set a unit. I went with bit, and that worked just fine. 🤷‍♂️

,

Stewart SmithAn Unearthly Child

So, this idea has been brewing for a while now… try and watch all of Doctor Who. All of it. All 38 seasons. Today(ish), we started. First up, from 1963 (first aired not quite when intended due to the Kennedy assassination): An Unearthly Child. The first episode of the first serial.

A lot of iconic things are there from the start: the music, the Police Box, embarrassing moments of not quite remembering what time one is in, and normal humans accidentally finding their way into the TARDIS.

I first saw this way back when I was a child, when it was repeated on ABC TV in Australia for some anniversary of Doctor Who (I forget which one). Well, I saw all but the first episode, as the train home was delayed and stopped outside Caulfield for no reason for ages. Some things never change.

Of course, being a show from the early 1960s, there’s some rougher spots. We’re not about to have the picture of diversity, and there’s going to be casual racism and sexism. What will be interesting is noticing these things today, and contrasting with my memory of them at the time (at least for episodes I’ve seen before), and what I know of the attitudes of the time.

“This year-ometer is not calculating properly” is a very 2020 line though (technically from the second episode).

,

Jan SchmidtRift CV1 – Getting close now…

It’s been a while since my last post about tracking support for the Oculus Rift in February. There’s been big improvements since then – working really well a lot of the time. It’s gone from “If I don’t make any sudden moves, I can finish an easy Beat Saber level” to “You can’t hide from me!” quality.

Equally, there are still enough glitches and corner cases that I think I’ll still be at this a while.

Here’s a video from 3 weeks ago of (not me) playing Beat Saber on Expert+ setting showing just how good things can be now:

Beat Saber – Skunkynator playing Expert+, Mar 16 2021

Strap in. Here’s what I’ve worked on in the last 6 weeks:

Pose Matching improvements

Most of the biggest improvements have come from improving the computer vision algorithm that’s matching the observed LEDs (blobs) in the camera frames to the 3D models of the devices.

I split the brute-force search algorithm into 2 phases. It now does a first pass looking for ‘obvious’ matches. In that pass, it does a shallow graph search of blobs and their nearest few neighbours against LEDs and their nearest neighbours, looking for a match using a “Strong” match metric. A match is considered strong if expected LEDs match observed blobs to within 1.5 pixels.

Coupled with checks on the expected orientation (matching the Gravity vector detected by the IMU) and the pose prior (expected position and orientation are within predicted error bounds) this short-circuit on the search is hit a lot of the time, and often completes within 1 frame duration.

In the remaining tricky cases, where a deeper graph search is required in order to recover the pose, the initial search reduces the number of LEDs and blobs under consideration, speeding up the remaining search.

I also added an LED size model to the mix – for a candidate pose, it tries to work out how large (in pixels) each LED should appear, and use that as a bound on matching blobs to LEDs. This helps reduce mismatches as devices move further from the camera.

LED labelling

When a brute-force search for pose recovery completes, the system now knows the identity of various blobs in the camera image. One way it avoids a search next time is to transfer the labels into future camera observations using optical-flow tracking on the visible blobs.

The problem is that even sped-up the search can still take a few frame-durations to complete. Previously LED labels would be transferred from frame to frame as they arrived, but there’s now a unique ID associated with each blob that allows the labels to be transferred even several frames later once their identity is known.

IMU Gyro scale

One of the problems with reverse engineering is the guesswork around exactly what different values mean. I was looking into why the controller movement felt “swimmy” under fast motions, and one thing I found was that the interpretation of the gyroscope readings from the IMU was incorrect.

The touch controllers report IMU angular velocity readings directly as a 16-bit signed integer. Previously the code would take the reading and divide by 1024 and use the value as radians/second.

From teardowns of the controller, I know the IMU is an Invensense MPU-6500. From the datasheet, the reported value is actually in degrees per second and appears to be configured for the +/- 2000 °/s range. That yields a calculation of Gyro-rad/s = Gyro-°/s * (2000 / 32768) * (π/180) – or a divisor of 938.734.

The 1024 divisor was under-estimating rotation speed by about 10% – close enough to work until you start moving quickly.
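
To make that concrete, the corrected conversion amounts to something like the following (an illustrative sketch only, with made-up names – not the actual OpenHMD code):

#include <math.h>
#include <stdint.h>

/* Convert a raw MPU-6500 gyro reading (16-bit signed, +/-2000 °/s full
 * scale) to radians per second. Equivalent to dividing by ~938.734,
 * rather than the old, incorrect divisor of 1024. */
static double gyro_raw_to_rad_per_sec(int16_t raw)
{
    return (double)raw * (2000.0 / 32768.0) * (M_PI / 180.0);
}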

Limited interpolation

If we don’t find a device in the camera views, the fusion filter predicts motion using the IMU readings – but that quickly becomes inaccurate. In the worst case, the controllers fly off into the distance. To avoid that, I added a limit of 500ms for ‘coasting’. If we haven’t recovered the device pose by then, the position is frozen in place and only rotation is updated until the cameras find it again.
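
The logic behind that cut-off is simple enough to show in a few lines (again, just an illustrative sketch with invented names):

#include <stdbool.h>
#include <stdint.h>

#define MAX_COAST_NS (500ULL * 1000 * 1000)  /* 500 ms */

/* If the cameras haven't confirmed the device pose within 500ms,
 * stop trusting IMU-only position prediction: freeze the position
 * and keep updating orientation only. */
static bool should_freeze_position(uint64_t now_ns, uint64_t last_pose_ns)
{
    return (now_ns - last_pose_ns) > MAX_COAST_NS;
}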

Exponential filtering

I implemented a 1-Euro exponential smoothing filter on the output poses for each device. This is an idea from the Project Esky driver for Project North Star/Deck-X AR headsets, and almost completely eliminates jitter in the headset view and hand controllers shown to the user. The tradeoff is against introducing lag when the user moves quickly – but there are some tunables in the exponential filter to play with for minimising that. For now I’ve picked some values that seem to work reasonably.
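
For reference, a bare-bones one-dimensional 1-Euro filter looks roughly like this – a generic sketch of the published algorithm (Casiez et al.), not the code used in the driver:

#include <math.h>

/* 1-Euro filter: an exponential smoother whose cutoff frequency rises
 * with the signal's speed, so it suppresses jitter at rest while adding
 * little lag during fast motion. */
typedef struct {
    double min_cutoff;  /* Hz: jitter suppression at low speeds */
    double beta;        /* how quickly the cutoff opens up with speed */
    double d_cutoff;    /* cutoff for the derivative estimate */
    double x_prev, dx_prev;
    int initialised;
} one_euro_filter;

static double smoothing_alpha(double cutoff, double dt)
{
    double tau = 1.0 / (2.0 * M_PI * cutoff);
    return 1.0 / (1.0 + tau / dt);
}

static double one_euro_apply(one_euro_filter *f, double x, double dt)
{
    if (!f->initialised) {
        f->x_prev = x;
        f->dx_prev = 0.0;
        f->initialised = 1;
        return x;
    }

    /* Estimate and smooth the rate of change of the signal. */
    double dx = (x - f->x_prev) / dt;
    double a_d = smoothing_alpha(f->d_cutoff, dt);
    double dx_hat = a_d * dx + (1.0 - a_d) * f->dx_prev;

    /* Open the cutoff up as the signal moves faster, then smooth. */
    double cutoff = f->min_cutoff + f->beta * fabs(dx_hat);
    double a = smoothing_alpha(cutoff, dt);
    double x_hat = a * x + (1.0 - a) * f->x_prev;

    f->x_prev = x_hat;
    f->dx_prev = dx_hat;
    return x_hat;
}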

Non-blocking radio

Communications with the touch controllers happens through USB radio command packets sent to the headset. The main use of radio commands in OpenHMD is to read the JSON configuration block for each controller that is programmed in at the factory. The configuration block provides the 3D model of LED positions as well as initial IMU bias values.

Unfortunately, reading the configuration block takes a couple of seconds on startup, and blocks everything while it’s happening. Oculus saw that problem and added a checksum in the controller firmware. You can read the checksum first and if it hasn’t changed use a local cache of the configuration block. Eventually, I’ll implement that caching mechanism for OpenHMD but in the meantime it still reads the configuration blocks on each startup.

As an interim improvement I rewrote the radio communication logic to use a state machine that is checked in the update loop – allowing radio communications to be interleaved without blocking the regular processing of events. It still interferes a bit, but no longer causes a full multi-second stall as each hand controller turns on.

Haptic feedback

The hand controllers have haptic feedback ‘rumble’ motors that really add to the immersiveness of VR by letting you sense collisions with objects. Until now, OpenHMD hasn’t had any support for applications to trigger haptic events. I spent a bit of time looking at USB packet traces with Philipp Zabel and we figured out the radio commands to turn the rumble motors on and off.

In the Rift CV1, the haptic motors have a mode where you schedule feedback events into a ringbuffer – effectively they operate like a low frequency audio device. However, that mode was removed for the Rift S (and presumably in the Quest devices) – and deprecated for the CV1.

With that in mind, I aimed for implementing the unbuffered mode, with explicit ‘motor on + frequency + amplitude’ and ‘motor off’ commands sent as needed. Thanks to already having rewritten the radio communications to use a state machine, adding haptic commands was fairly easy.

The big question mark is around what API OpenHMD should provide for haptic feedback. I’ve implemented something simple for now, to get some discussion going. It works really well and adds hugely to the experience. That code is in the https://github.com/thaytan/OpenHMD/tree/rift-haptics branch, with a SteamVR-OpenHMD branch that uses it in https://github.com/thaytan/SteamVR-OpenHMD/tree/controller-haptics-wip

Problem areas

Unexpected tracking losses

I’d say the biggest problem right now is unexpected tracking loss and incorrect pose extractions when I’m not expecting them. Especially my right controller will suddenly glitch and start jumping around. Looking at a video of the debug feed, it’s not obvious why that’s happening:

To fix cases like those, I plan to add code to log the raw video feed and the IMU information together so that I can replay the video analysis frame-by-frame and investigate glitches systematically. Those recordings will also work as a regression suite to test future changes.

Sensor fusion efficiency

The Kalman filter I have implemented works really nicely – it does the latency compensation, predicts motion and extracts sensor biases all in one place… but it has a big downside of being quite expensive in CPU. The Unscented Kalman Filter CPU cost grows at O(n^3) with the size of the state, and the state in this case is 43 dimensional – 22 base dimensions, and 7 per latency-compensation slot. Running 1000 updates per second for the HMD and 500 for each of the hand controllers adds up quickly.

At some point, I want to find a better / cheaper approach to the problem that still provides low-latency motion predictions for the user while still providing the same benefits around latency compensation and bias extraction.

Lens Distortion

To generate a convincing illusion of objects at a distance in a headset that’s only a few centimetres deep, VR headsets use some interesting optics. The image on the LCD/OLED panels gets distorted heavily by the lenses before it reaches the user’s eyes. What the software generates needs to compensate by applying the right inverse distortion to the output video.

Everyone that tests the CV1 notices that the distortion is not quite correct. As you look around, the world warps and shifts annoyingly. Sooner or later that needs fixing. That’s done by taking photos of calibration patterns through the headset lenses and generating a distortion model.

Camera / USB failures

The camera feeds are captured using a custom user-space UVC driver implementation that knows how to set up the special synchronisation settings of the CV1 and DK2 cameras, and then repeatedly schedules isochronous USB packet transfers to receive the video.

Occasionally, some people experience failure to re-schedule those transfers. The kernel rejects them with an out-of-memory error failing to set aside DMA memory (even though it may have been running fine for quite some time). It’s not clear why that happens – but the end result at the moment is that the USB traffic for that camera dies completely and there’ll be no more tracking from that camera until the application is restarted.

Often once it starts happening, it will keep happening until the PC is rebooted and the kernel memory state is reset.

Occluded cases

Tracking generally works well when the cameras get a clear shot of each device, but there are cases like sighting down the barrel of a gun where we expect that the user will line up the controllers in front of one another, and in front of the headset. In that case, even though we probably have a good idea where each device is, it can be hard to figure out which LEDs belong to which device.

If we already have a good tracking lock on the devices, I think it should be possible to keep tracking even down to 1 or 2 LEDs being visible – but the pose assessment code will have to be aware that’s what is happening.

Upstreaming

April 14th marks 2 years since I first branched off OpenHMD master to start working on CV1 tracking. How hard can it be, I thought? I’ll knock this over in a few months.

Since then I’ve accumulated over 300 commits on top of OpenHMD master that eventually all need upstreaming in some way.

One thing people have expressed as a prerequisite for upstreaming is to try and remove the OpenCV dependency. The tracking relies on OpenCV to do camera distortion calculations, and for their PnP implementation. It should be possible to reimplement both of those directly in OpenHMD with a bit of work – possibly using the fast LambdaTwist P3P algorithm that Philipp Zabel wrote, that I’m already using for pose extraction in the brute-force search.

Others

I’ve picked the top issues to highlight here. https://github.com/thaytan/OpenHMD/issues has a list of all the other things that are still on the radar for fixing eventually.

Other Headsets

At some point soon, I plan to put a pin in the CV1 tracking and look at adapting it to more recent inside-out headsets like the Rift S and WMR headsets. I implemented 3DOF support for the Rift S last year, but getting to full positional tracking for that and other inside-out headsets means implementing a SLAM/VIO tracking algorithm to track the headset position.

Once the headset is tracking, the code I’m developing here for CV1 to find and track controllers will hopefully transfer across – the difference with inside-out tracking is that the cameras move around with the headset. Finding the controllers in the actual video feed should work much the same.

Sponsorship

This development happens mostly in my spare time and partly as open source contribution time at work at Centricular. I am accepting funding through Github Sponsorships to help me spend more time on it – I’d really like to keep helping Linux have top-notch support for VR/AR applications. Big thanks to the people that have helped get this far.

,

Stewart Smithlibeatmydata v129

Every so often, I release a new libeatmydata. This has not happened for a long time. This is just some bug fixes, most of which have been in the Debian package for some time; I’ve just been lazy and not sat down and merged them.

git clone https://github.com/stewartsmith/libeatmydata.git

Download the source tarball from here: libeatmydata-129.tar.gz and GPG signature: libeatmydata-129.tar.gz.asc from my GPG key.

Or, feel free to grab some Fedora RPMs:

Releases published also in the usual places:

,

BlueHackersWorld bipolar day 2021

Today, 30 March, is World Bipolar Day.

Vincent van Gogh - Worn Out

Why that particular date? It’s Vincent van Gogh’s birthday (1853), and there is a fairly strong argument that the Dutch painter suffered from bipolar (among other things).

The image on the side is Vincent’s drawing “Worn Out” (from 1882), and it seems to capture the feeling rather well – whether (hypo)manic, depressed, or mixed. It’s exhausting.

Bipolar is complicated, often undiagnosed or misdiagnosed, and when only treated with anti-depressants, it can trigger the (hypo)mania – essentially dragging that person into that state near-permanently.

Have you heard of Bipolar II?

Hypo-mania is the “lesser” form of mania that distinguishes Bipolar I (the classic “manic depressive” syndrome) from Bipolar II. It’s “lesser” only in the sense that rather than someone going so hyper they may think they can fly (Bipolar I is often identified when someone in manic state gets admitted to hospital – good catch!) while with Bipolar II the hypo-mania may actually exhibit as anger. Anger in general, against nothing in particular but potentially everyone and everything around them. Or, if it’s a mixed episode, anger combined with strong negative thoughts. Either way, it does not look like classic mania. It is, however, exhausting and can be very debilitating.

Bipolar II people often present to a doctor while in a depressed state, and GPs (not being psychiatrists) may not do a full diagnosis. Note that D.A.S. and similar test sheets are screening tools, they are not diagnostic. A proper diagnosis is more complex than filling in a form with some questions (who would have thought!)

Call to action

If you have a diagnosis of depression, only from a GP, and are on medication for this, I would strongly recommend you also get a referral to a psychiatrist to confirm that diagnosis.

Our friends at the awesome Black Dog Institute have excellent information on bipolar, as well as a quick self-test – if that shows some likelihood of bipolar, go get that referral and follow up ASAP.

I will be writing more about the topic in the coming time.

The post World bipolar day 2021 first appeared on BlueHackers.org.

,

Stewart SmithThe Apple Power Macintosh 7200/120 PC Compatible (Part 1)

So, I learned something recently: if you pick up your iPhone with eBay open on an auction bid screen in just the right way, you may accidentally click the bid button and end up buying an old computer. Totally not the worst thing ever, and certainly a creative way to make a decision.

So, not too long later, a box arrives!

In the 1990s, Apple created some pretty “interesting” computers and product lines. One thing you could get is a DOS Compatibility (or PC Compatibility) card. This was a card that went into one of the expansion slots on a Mac and had something really curious on it: most of the guts of a PC.

Others have written on these cards too: https://www.engadget.com/2009-12-10-before-there-was-boot-camp-there-were-dos-compatibility-cards.html and http://www.edibleapple.com/2009/12/09/blast-from-the-past-a-look-back-at-apples-dos-compatibility-cards/. There’s also the Service Manual https://tim.id.au/laptops/apple/misc/pc_compatibility_card.pdf with some interesting details.

The machine I’d bought was an Apple Power Macintosh 7200/120 with the PC Compatible card added afterwards (so it doesn’t have the PC Compatible label on the front like some models ended up getting).

The Apple Power Macintosh 7200/120

Wikipedia has a good article on the line, noting that it was first released in August 1995, and fitting for the era, was sold as about 14 million other model numbers (okay not quite that bad, it was only a total of four model numbers for essentially the same machine). This specific model, the 7200/120 was introduced on April 22nd, 1996, and the original web page describing it from Apple is on the wayback machine.

For older Macs, Low End Mac is a good resource, and there’s a page on the 7200, and amazingly Apple still has the tech specs on their web site!

The 7200 series replaced the 7100, which was one of the original PowerPC based Macs. The big changes are using the industry standard PCI bus for its three expansion slots rather than NuBus. Rather surprisingly, NuBus was not Apple specific, but you could not call it widely adopted by successful manufacturers. Apple first used NuBus in the 1987 Macintosh II.

The PCI bus was standardized in 1992, and it’s almost certain that a successor to it is in the computer you’re using to read this. It really quite caught on as an industry standard.

The processor of the machine is a PowerPC 601. The PowerPC was an effort of IBM, Apple, and Motorola (the AIM Alliance) to create a class of processors for personal computers based on IBM’s POWER Architecture. The PowerPC 601 was the first of these processors, initially used by Apple in its Power Macintosh range. The machine I have has one running at a whopping 120Mhz. There continued to be PowerPC chips for a number of years, and IBM continued making POWER processors even after that. However, you are almost certainly not using a PowerPC derived processor in the computer you’re using to read this.

The PC Compatibility card has on it a full on legit Pentium 100 processor, and hardware for doing VGA graphics, a Sound Blaster 16 and the other things you’d usually expect of a PC from 1996. Since it’s on a PCI card though, it’s a bit different than a PC of the era. It doesn’t have any expansion slots of its own, and in fact uses up one of the three PCI slots in the Mac. It also doesn’t have its own floppy drive, or hard drive. There’s software on the Mac that will let the PC card use the Mac’s floppy drive, and part of the Mac’s hard drive for the PC!

The Pentium 100 was the first mass produced superscalar processor. You are quite likely to be using a computer with a processor related to the Pentium to read this, unless you’re using a phone or tablet, or one of the very latest Macs; in which case you’re using an ARM based processor. You likely have more ARM processors in your life than you have socks.

Basically, this computer is a bit of a hodge-podge of historical technology, some of which ended up being successful, and other things less so.

Let’s have a look inside!

So, one of the PCI slots has a Vertex Twin Turbo 128M8A video card in it. There is not much about this card on the internet. There’s a photo of one on Wikimedia Commons though. I’ll have to investigate more.

Does it work though? Yes! Here it is on my desk:

The powered on Power Mac 7200/120

Even with Microsoft Internet Explorer 4.0 that came with MacOS 8.6, you can find some places on the internet you can fetch files from, at a not too bad speed even!

More fun times with this machine to come!

,

sthbrx - a POWER technical blogFuzzing grub: part 1

Recently a set of 8 vulnerabilities were disclosed for the grub bootloader. I found 2 of them (CVE-2021-20225 and CVE-2021-20233), and contributed a number of other fixes for crashing bugs which we don't believe are exploitable. I found them by applying fuzz testing to grub. Here's how.

This is a multi-part series: I think it will end up being 4 posts. I'm hoping to cover:

  • Part 1 (this post): getting started with fuzzing grub
  • Part 2: going faster by doing lots more work
  • Part 3: fuzzing filesystems and more
  • Part 4: potential next steps and avenues for further work

Fuzz testing

Let's begin with part one: getting started with fuzzing grub.

One of my all-time favourite techniques for testing programs, especially programs that handle untrusted input, and especially-especially programs written in C that parse untrusted input, is fuzz testing. Fuzz testing (or fuzzing) is the process of repeatedly throwing randomised data at your program under test and seeing what it does.

(For the secure boot threat model, untrusted input is anything not validated by a cryptographic signature - so config files are untrusted for our purposes, but grub modules can only be loaded if they are signed, so they are trusted.)

Fuzzing has a long history and has recently received a new lease on life with coverage-guided fuzzing tools like AFL and more recently AFL++.

Building grub for AFL++

AFL++ is extremely easy to use ... if your program:

  1. is built as a single binary with a regular tool-chain
  2. runs as a regular user-space program on Linux
  3. reads a small input file from disk and then exits
  4. doesn't do anything fancy with threads or signals

Beyond that, it gets a bit more complex.

On the face of it, grub fails 3 of these 4 criteria:

  • grub is a highly modular program: it loads almost all of its functionality as modules which are linked as separate ELF relocatable files. (Not runnable programs, but not shared libraries either.)

  • grub usually runs as a bootloader, not as a regular app.

  • grub reads all sorts of things, ranging in size from small files to full disks. After loading most things, it returns to a command prompt rather than exiting.

Fortunately, these problems are not insurmountable.

We'll start with the 'running as a bootloader' problem. Here, grub helps us out a bit, because it provides an 'emulator' target, which runs most of grub's functionality as a userspace program. It doesn't support actually booting anything (unsurprisingly) but it does support most other modules, including things like the config file parser.

We can configure grub to build the emulator. We disable the graphical frontend for now.

:::shell
./bootstrap
./configure --with-platform=emu --disable-grub-emu-sdl

At this point in building a fuzzing target, we'd normally try to configure with afl-cc to get the instrumentation that makes AFL(++) so powerful. However, the grub configure script is not a fan:

:::text
./configure --with-platform=emu --disable-grub-emu-sdl CC=$AFL_PATH/afl-cc
...
checking whether target compiler is working... no
configure: error: cannot compile for the target

It also doesn't work with afl-gcc.

Hmm, ok, so what if we just... lie a bit?

:::shell
./configure --with-platform=emu --disable-grub-emu-sdl
make CC="$AFL_PATH/afl-gcc" 

(Normally I'd use CC=clang and afl-cc, but clang support is slightly broken upstream at the moment.)

After a small fix for gcc-10 compatibility, we get the userspace tools (potentially handy!) but a bunch of link errors for grub-emu:

:::text
/usr/bin/ld: disk.module:(.bss+0x20): multiple definition of `__afl_global_area_ptr'; kernel.exec:(.bss+0xe078): first defined here
/usr/bin/ld: regexp.module:(.bss+0x70): multiple definition of `__afl_global_area_ptr'; kernel.exec:(.bss+0xe078): first defined here
/usr/bin/ld: blocklist.module:(.bss+0x28): multiple definition of `__afl_global_area_ptr'; kernel.exec:(.bss+0xe078): first defined here

The problem is the module linkage that I talked about earlier: because there is a link stage of sorts for each module, some AFL support code gets linked in to both the grub kernel (kernel.exec) and each module (here disk.module, regexp.module, ...). The linker doesn't like it being in both, which is fair enough.

To get started, let's take advantage of the smarts of AFL++ by using Qemu mode instead. This builds a specially instrumented qemu user-mode emulator that's capable of doing coverage-guided fuzzing on uninstrumented binaries, at the cost of a significant performance penalty.

:::shell
make clean
make

Now we have a grub-emu binary. If you run it directly, you'll pick up your system boot configuration, but the -d option can point it to a directory of your choosing. Let's set up one for fuzzing:

:::shell
mkdir stage
echo "echo Hello sthbrx readers" > stage/grub.cfg
cd stage
../grub-core/grub-emu -d .

You probably won't see the message because the screen gets blanked at the end of running the config file, but if you pipe it through less or something you'll see it.

Running the fuzzer

So, that seems to work - let's create a test input and try fuzzing:

:::shell
cd ..
mkdir in
echo "echo hi" > in/echo-hi

cd stage
# -Q qemu mode
# -M main fuzzer
# -d don't do deterministic steps (too slow for a text format)
# -f create file grub.cfg
$AFL_PATH/afl-fuzz -Q -i ../in -o ../out -M main -d -f grub.cfg -- ../grub-core/grub-emu -d .

Sadly:

:::text
[-] The program took more than 1000 ms to process one of the initial test cases.
    This is bad news; raising the limit with the -t option is possible, but
    will probably make the fuzzing process extremely slow.

    If this test case is just a fluke, the other option is to just avoid it
    altogether, and find one that is less of a CPU hog.

[-] PROGRAM ABORT : Test case 'id:000000,time:0,orig:echo-hi' results in a timeout
         Location : perform_dry_run(), src/afl-fuzz-init.c:866

What we're seeing here (and indeed what you can observe if you run grub-emu directly) is that grub-emu isn't exiting when it's done. It's waiting for more input, and will keep waiting for input until it's killed by afl-fuzz.

We need to patch grub to sort that out. It's on my GitHub.
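
For illustration, here's a hedged sketch in plain C of the general idea (this is not the actual patch, and the file handling here is made up): a fuzz-friendly target should read its input, process it once, and exit, rather than dropping to an interactive prompt and waiting for more input.

:::c
#include <stdio.h>
#include <stdlib.h>

static void process_config(FILE *f)
{
    char line[4096];
    while (fgets(line, sizeof line, f))
        ;   /* a real implementation would parse each config line here */
}

int main(int argc, char **argv)
{
    FILE *f = fopen(argc > 1 ? argv[1] : "grub.cfg", "r");
    if (!f) {
        perror("fopen");
        return 1;
    }
    process_config(f);
    fclose(f);

#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
    /* Fuzzing build: exit once the config has been processed, so afl-fuzz
     * sees the end of the run instead of a hang. */
    exit(0);
#endif

    /* Normal build: fall through to an interactive prompt and wait for
     * more input (this is what makes afl-fuzz report a timeout). */
    for (;;) {
        char cmd[256];
        printf("grub> ");
        if (!fgets(cmd, sizeof cmd, stdin))
            break;
    }
    return 0;
}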

Apply that, rebuild with FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION, and voila:

:::shell
cd ..
make CFLAGS="-DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION"
cd stage
$AFL_PATH/afl-fuzz -Q -i ../in -o ../out -M main -d -f grub.cfg -- ../grub-core/grub-emu -d .

And fuzzing is happening!

afl-fuzz fuzzing grub, showing fuzzing happening

This is enough to find some of the (now-fixed) bugs in the grub config file parsing!

Fuzzing beyond the config file

You can also extend this to fuzzing other things that don't require the graphical UI, such as grub's transparent decompression support:

:::shell
cd ..
rm -rf in out stage
mkdir in stage
echo hi > in/hi
gzip in/hi
cd stage
echo "cat thefile" > grub.cfg
$AFL_PATH/afl-fuzz -Q -i ../in -o ../out -M main -f thefile -- ../grub-core/grub-emu -d .

You should be able to find a hang pretty quickly with this: an as-yet-unfixed bug where grub will print output forever when fed a corrupt file. (Your mileage may vary, as will the paths.)

:::shell
cp ../out/main/hangs/id:000000,src:000000,time:43383,op:havoc,rep:16 thefile
../grub-core/grub-emu -d . | less # observe this going on forever

zcat, on the other hand, reports it as simply corrupt:

:::text
$ zcat thefile

gzip: thefile: invalid compressed data--format violated

(Feel free to fix that and send a patch to the list!)

That wraps up part 1. Eventually I'll be back with part 2, where I explain the hoops to jump through to go faster with the afl-cc instrumentation.

,

Dave HallParameter Store vs Secrets Manager

Which AWS managed service is best for storing and managing your secrets?

,

Dave HallA Lost Parcel Results in a New Website

When Australia Post lost a parcel, we found a lot of problems with one of their websites.

,

Jan SchmidtRift CV1 – Testing SteamVR

Update:

This post documented an older method of building SteamVR-OpenHMD. I've moved the instructions to a page here; that version will be kept up to date with any future changes, so go there instead.


I’ve had a few people ask how to test my OpenHMD development branch of Rift CV1 positional tracking in SteamVR. Here’s what I do:

  • Make sure Steam + SteamVR are already installed.
  • Clone the SteamVR-OpenHMD repository:
git clone --recursive https://github.com/ChristophHaag/SteamVR-OpenHMD.git
  • Switch the internal copy of OpenHMD to the right branch:
cd subprojects/openhmd
git remote add thaytan-github https://github.com/thaytan/OpenHMD.git
git fetch thaytan-github
git checkout -b rift-kalman-filter thaytan-github/rift-kalman-filter
cd ../../
  • Use meson to build and register the SteamVR-OpenHMD binaries. You may need to install meson first (see below):
meson -Dbuildtype=release build
ninja -C build
./install_files_to_build.sh
./register.sh
  • It is important to configure in release mode, as the Kalman filtering code is generally too slow for real-time use in debug mode (it has to run 2000 times per second).
  • Make sure your USB devices are accessible to your user account by configuring udev. See the OpenHMD guide here: https://github.com/OpenHMD/OpenHMD/wiki/Udev-rules-list
  • Please note – only Rift sensors on USB 3.0 ports will work right now. Supporting cameras on USB 2.0 requires someone implementing JPEG format streaming and decoding.
  • It can be helpful to test OpenHMD is working by running the simple example. Check that it’s finding camera sensors at startup, and that the position seems to change when you move the headset:
./build/subprojects/openhmd/openhmd_simple_example
  • Calibrate your expectations for how well tracking is working right now! Hint: It’s very experimental 🙂
  • Start SteamVR. Hopefully it should detect your headset and the light(s) on your Rift Sensor(s) should power on.

Meson

I prefer the Meson build system here. There’s also a cmake build for SteamVR-OpenHMD you can use instead, but I haven’t tested it in a while and it sometimes breaks as I work on my development branch.

If you need to install meson, there are instructions here – https://mesonbuild.com/Getting-meson.html summarising the various methods.

I use a copy in my home directory, but you need to make sure ~/.local/bin is in your PATH:

pip3 install --user meson

,

Jan SchmidtRift CV1 – Pose rejection

I spent some time this weekend implementing a couple of my ideas for improving the way the tracking code in OpenHMD filters and rejects (or accepts) possible poses when trying to match visible LEDs to the 3D models for each device.

In general, the tracking proceeds in several steps (in parallel for each of the 3 devices being tracked):

  1. Do a brute-force search to match LEDs to 3D models, then (if matched)
    1. Assign labels to each LED blob in the video frame saying what LED they are.
    2. Send an update to the fusion filter about the position / orientation of the device
  2. Then, as each video frame arrives:
    1. Use motion flow between video frames to track the movement of each visible LED
    2. Use the IMU + vision fusion filter to predict the position/orientation (pose) of each device, and calculate which LEDs are expected to be visible and where.
  3. Try and match up and refine the poses using the predicted pose prior and labelled LEDs. In the best case, the LEDs are exactly where the fusion predicts they’ll be. More often, the orientation is mostly correct, but the position has drifted and needs correcting. In the worst case, we send the frame back to step 1 and do a brute-force search to reacquire an object.

The goal is to always assign the correct LEDs to the correct device (so you don’t end up with the right controller in your left hand), and to avoid going back to the expensive brute-force search to re-acquire devices as much as possible.

What I’ve been working on this week is steps 1 and 3 – initial acquisition of correct poses, and fast validation / refinement of the pose in each video frame – and I’ve implemented two new strategies for that.

Gravity Vector matching

The first new strategy is to reject candidate poses that don’t closely match the known direction of gravity for each device. I had a previous implementation of that idea which turned out to be wrong, so I’ve re-worked it and it helps a lot with device acquisition.

The IMU accelerometer and gyro can usually tell us which way up the device is (roll and pitch) but not which way they are facing (yaw). The measure for ‘known gravity’ comes from the fusion Kalman filter covariance matrix – how certain the filter is about the orientation of the device. If that variance is small this new strategy is used to reject possible poses that don’t have the same idea of gravity (while permitting rotations around the Y axis), with the filter variance as a tolerance.
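
As a rough sketch of that check in plain C (hedged: this is not the OpenHMD implementation, and the types, axis conventions, and base tolerance are made up), the candidate orientation's idea of "down" is compared against the fused gravity direction, with the filter's variance widening the acceptance cone:

/* Hedged sketch only, not OpenHMD's code: reject a candidate pose whose
 * idea of "down" disagrees with the gravity direction reported by the IMU
 * fusion filter, with a tolerance that grows with orientation uncertainty. */
#include <math.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct { float x, y, z; } vec3f;
typedef struct { float x, y, z, w; } quatf;

/* Rotate vector v by unit quaternion q: t = 2*(q.xyz x v), v' = v + w*t + q.xyz x t */
static vec3f quat_rotate(quatf q, vec3f v)
{
    vec3f t = { 2.0f * (q.y * v.z - q.z * v.y),
                2.0f * (q.z * v.x - q.x * v.z),
                2.0f * (q.x * v.y - q.y * v.x) };
    vec3f r = { v.x + q.w * t.x + (q.y * t.z - q.z * t.y),
                v.y + q.w * t.y + (q.z * t.x - q.x * t.z),
                v.z + q.w * t.z + (q.x * t.y - q.y * t.x) };
    return r;
}

/* Accept the pose if its down vector is within 'tolerance' radians of the
 * fused gravity direction. Rotations purely about the gravity (yaw) axis
 * leave "down" unchanged, so they pass automatically. */
static bool pose_matches_gravity(quatf candidate_orient, vec3f fused_gravity,
                                 float orient_variance)
{
    const vec3f down = { 0.0f, -1.0f, 0.0f };    /* assumed world convention */
    vec3f cand_down = quat_rotate(candidate_orient, down);

    float dot = cand_down.x * fused_gravity.x +
                cand_down.y * fused_gravity.y +
                cand_down.z * fused_gravity.z;
    float angle = acosf(fminf(fmaxf(dot, -1.0f), 1.0f));
    float tolerance = 0.05f + sqrtf(orient_variance);  /* radians; placeholder */

    return angle <= tolerance;
}

int main(void)
{
    quatf identity = { 0.0f, 0.0f, 0.0f, 1.0f };
    vec3f gravity  = { 0.0f, -1.0f, 0.0f };
    printf("matches: %d\n", pose_matches_gravity(identity, gravity, 0.01f));
    return 0;
}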

Partial tracking matches

The 2nd strategy is based around tracking with fewer LED correspondences once a tracking lock is acquired. Initial acquisition of the device pose relies on some heuristics for how many LEDs must match the 3D model. The general heuristic threshold I settled on for now is that 2/3rds of the expected LEDs must be visible to acquire a cold lock.

With the new strategy, if the pose prior has a good idea where the device is and which way it’s facing, it allows matching on far fewer LED correspondences. The idea is to keep tracking a device even down to just a couple of LEDs, and hope that more become visible soon.
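
A hedged sketch of what such a threshold function might look like (the numbers are placeholders taken from the description above, not OpenHMD's actual values):

/* Illustrative only: how many LED correspondences to require before
 * accepting a pose, depending on whether a trusted pose prior exists. */
#include <stdbool.h>
#include <stdio.h>

static int required_led_matches(int expected_visible_leds, bool have_good_prior)
{
    if (!have_good_prior)
        return (2 * expected_visible_leds + 2) / 3;   /* ~2/3 for a cold lock */

    /* With a good pose prior, keep tracking on as few as a couple of LEDs. */
    return expected_visible_leds < 2 ? expected_visible_leds : 2;
}

int main(void)
{
    printf("cold lock, 12 LEDs expected: need %d\n", required_led_matches(12, false));
    printf("warm lock, 12 LEDs expected: need %d\n", required_led_matches(12, true));
    return 0;
}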

While this definitely seems to help, I think the approach can use more work.

Status

With these two new approaches, tracking is improved but still quite erratic. Tracking of the headset itself is quite good now and for me rarely loses tracking lock. The controllers are better, but have a tendency to “fly off my hands” unexpectedly, especially after fast motions.

I have ideas for more tracking heuristics to implement, and I expect a continuous cycle of refinement on the existing strategies and new ones for some time to come.

For now, here’s a video of me playing Beat Saber using tonight’s code. The video shows the debug stream that OpenHMD can generate via Pipewire, showing the camera feed plus overlays of device predictions, LED device assignments and tracked device positions. Red is the headset, Green is the right controller, Blue is the left controller.

Initial tracking is completely wrong – I see some things to fix there. When the controllers go offline due to inactivity, the code keeps trying to match LEDs to them for example, and then there are some things wrong with how it’s relabelling LEDs when they get incorrect assignments.

After that, there are periods of good tracking with random tracking losses on the controllers – those show the problem cases to concentrate on.

,

Colin CharlesLife with Rona 2.0 – Days 4, 5, 6, 7, 8 and 9

This lack of updates is also likely because I’ve been quite caught up with stuff.

Monday I had a steak from Bay Leaf Steakhouse for dinner. It was kind of weird eating it from packs, but then I’m reminded you could do this in economy class. Tuesday I wanted to attempt to go vegetarian and by the time I was done with a workout, the only place was a chap fan shop (Leong Heng) where I had a mixture of Chinese and Indian chap fan. The Indian stall is run by an ex-Hyatt staff member who immediately recognised me! Wednesday, Alice came to visit, so we got to Hanks, got some alcohol, and managed a smorgasbord of food from Pickers/Sate Zul/Lila Wadi. Night ended very late, and on Thursday, visited Hai Tian for their famous salted egg squid and prawns in a coconut shell. Friday was back to being normal, so I grabbed a pizza from Mint Pizza (this time I tried their Aussie variant). Saturday, today, I hit up Rasa Sayang for some matcha latte, but grabbed food from Classic Pilot Cafe, which Faeeza owns! It was the famous salted egg chicken, double portion, half rice.

As for workouts, I did sign up for Mantas but found it pretty hard to do, timezone wise. I did spend a lot of time jogging on the beach (this has been almost a daily affair). Monday I also did 2 MD workouts, Tuesday 1 MD workout, Wednesday half a MD workout, Thursday I did a Ping workout at Pwrhouse (so good!), Friday 1 MD workout, and Saturday an Audrey workout at Pwrhouse and 1 MD workout.

Wednesday I also found out that Rasmus passed away. Frankly, there are no words.

Thursday, my Raspberry Pi 400 arrived. I set it up in under ten minutes, connecting it to the TV here. It “just works”. I made a video, which I should probably figure out how to upload to YouTube after I stitch it together. I have to work on using it a lot more.

COVID-19 cases are through the roof in Malaysia. This weekend we’ve seen two days of case breaking records, with today being 5,728 (yesterday was something close). Nutty. Singapore suspended the reciprocal green lane (RGL) agreement with Malaysia for the next 3 months.

I’ve managed to finish Bridgerton. I like the score. Finding something on Netflix is proving to be more difficult, regardless of having a VPN. Honestly, this is why Cable TV wins… linear programming that you’re just fed.

Stock market wise, I’ve been following the GameStop short squeeze, and even funnier is the Top Glove one, that they’re trying to repeat in Malaysia. Bitcoin seems to be doing “reasonably well” and I have to say, I think people are starting to realise decentralised services have a future. How do we get there?

What an interesting week, I look forward to more productive time. I’m still writing in my Hobonichi Techo, so at least that’s where most personal stuff ends up, I guess?

The post Life with Rona 2.0 – Days 4, 5, 6, 7, 8 and 9 first appeared on Colin Charles Agenda.

,

Jan SchmidtHitting a milestone – Beat Saber!

I hit an important OpenHMD milestone tonight – I completed a Beat Saber level using my Oculus Rift CV1!

I’ve been continuing to work on integrating Kalman filtering into OpenHMD, and on improving the computer vision that matches and tracks device LEDs. While I suspect no one will be completing Expert levels just yet, it’s working well enough that I was able to play through a complete level of Beat Saber. For a long time this has been my mental benchmark for tracking performance, and I’m really happy 🙂

Check it out:

I should admit at this point that completing this level took me multiple attempts. The tracking still has quite a tendency to lose track of controllers, or to get them confused and swap hands suddenly.

I have a list of more things to work on. See you at the next update!

,

Colin CharlesLife with Rona 2.0 – Day 3

What an unplanned day. I woke up in time to do an MD workout, despite feeling a little sore. So maybe I was about 10 minutes late and I missed the first set, but his workouts are so long, and I think there were seven sets anyway. Had a good brunch shortly thereafter.

Did a bit of reading, and then I decided to do a beach boardwalk walk… turns out they were policing the place, and you can’t hit the boardwalk. But the beach is fair game? So I went back to the hotel, dropped off my slippers, and went for a beach jog. Pretty nutty.

Came back to read a little more and figured I might as well do another MD workout. Then I headed out for dinner, trying out a new place — Mint Pizza. Opened 20.12.2020, and they’re empty, and their pizza is actually pretty good. Lamb and BBQ chicken, they did half-and-half.

Twitter was discussing Raspberry Pi’s, and all I could see is a lot of misinformation, which is truly shocking. The irony is that open source has been running the Internet for so long, and progressive web apps have come such a long way…

Back in the day when I did OpenOffice.org or Linux training even, we always did say you should learn concepts and not tools. From the time we ran Linux installfests in the late-90s in Sunway Pyramid (back then, yes, Linux was hard, and you had winmodems), but I had forgotten that I even did stuff for school teachers and NGOs back in 2002… I won’t forget PC Gemilang either…

Anyway, I placed an order again for another Raspberry Pi 400. I am certain that most people talk so much crap, without realising that Malaysia isn’t a developed nation and most people can’t afford a Mac let alone a PC. Laptops aren’t cheap. And there are so many other issues…. Saying Windows is still required in 2021 is the nuttiest thing I’ve heard in a long time. Easy to tweet, much harder to think about TCO, and realise where in the journey Malaysia is.

Maybe the best thing was that Malaysian Twitter learned about technology. I doubt many realised the difference between a Pi board vs the 400, but hey, the fact that they talked about tech is still a win (misinformed, but a win).

The post Life with Rona 2.0 – Day 3 first appeared on Colin Charles Agenda.

,

Colin CharlesLife with Rona 2.0 – Days 1 & 2

Today is the first day that in the state of Pahang, we have to encounter what many Malaysians are referring to as the Movement Control Order 2.0 (MCO 2.0). I think everyone finally agrees with the terminology that this is a lockdown now, because I remember back in the day when I was calling it that, I’d definitely offend a handful of journalists.

This is one interesting change for me compared to when I last wrote Life with Rona: Day 56 of being indoors and not even leaving my household, in Kuala Lumpur. I am now not in the state, I am living in a hotel, and I am obviously moving around a little more since we have access to the beach.

KL/Selangor and several other states have already been under the MCO 2.0 since January 13 2021, and while it was supposed to end on January 26, it seems like they’ve extended and harmonised the dates for Peninsular Malaysia to end on February 4 2021. I guess everyone got the “good news” yesterday. The Prime Minister announced some kind of aid last week, but it is still mostly a joke.

Today was the 2nd day I woke up at around 2.30pm because I went to bed at around 8am. First day I had a 23.5 hour uptime, and the today was less brutal, but working from 1-8am with the PST timezone is pretty brutal. Consequently, I barely got too much done, and had one meal, vegetarian, two packs that included rice. I did get to walk by the beach (between Teluk Cempedak and Teluk Cempedak 2), did quite a bit of exercise there and I think even the monkeys are getting hungry… lots of stray cats and monkeys. Starbucks closes at 7pm, and I rocked up at 7.10pm (this was just like yesterday, when I arrived at 9.55pm and was told they wouldn’t grant me a coffee!).

While writing this entry, I did manage to get into a long video call with some friends and I guess it was good catching up with people in various states. It also is what prevented me from publishing this entry!

Day 2

I did wake up reasonable early today because I had pre-ordered room service to arrive at 9am. There is a fixed menu at the hotel for various cuisines (RM48/pax, thankfully gratis for me) and I told them I prefer not having to waste, so just give me what I want which is off menu items anyway. Roti telur double telur (yes, I know it is a roti jantan) with some banjir dhal and sambal and a bit of fruit on the side with two teh tariks. They delivered as requested. I did forget to ask for a jar of honey but that is OK, there is always tomorrow.

I spent most of the day vacillating, and wouldn’t consider it productive by any measure. Just chit chats and napping. It did rain today after a long time, so the day seemed fairly dreary.

When I finally did awaken from my nap, I went for a run on the beach. I did it barefoot. I have no idea if this is how it is supposed to be done, or if you are to run nearer the water or further up above, but I did move around between the two quite often. The beach is still pretty dead, but it is expected since no one is allowed to go unless you’re a hotel guest.

The hotel has closed 3/4 of their villages (blocks) and moved everyone to the village I’m staying in (for long stay guests…). I’m thankful I have a pretty large suite, it is a little over 980sqft, and the ample space, while smaller than my home, is still welcome.

Post beach run, I did a workout with MD via Instagram. It was strength/HIIT based, and I burnt a tonne, because he gave us one of his signature 1.5h classes. It was longer than the 80 minute class he normally charges RM50 for (I still think this is undervaluing his service, but he really does care and does it for the love of seeing his students grow!).

Post-workout I decided to head downtown to find some dinner. Everything at the Teluk Cemepdak block of shops was closed, so they’re not even bothered with doing takeaway. Sg. Lembing steakhouse seemed to have cars parked, Vanggey was empty (Crocodile Rock was open, can’t say if there was a crowd, because the shared parking lot was empty), there was a modest queue at Sate Zul, and further down, Lena was closed, Pickers was open for takeaway but looked pretty closed, Tjantek was open surprisingly, and then I thought I’d give Nusantara a try again, this time for food, but their chef had just gone home at about 8pm. Oops. So I drove to LAN burger, initially ordering just one chicken double special; however they looked like they could use the business so I added on a beef double special. They now accept Boost payments so have joined the e-wallet era. One less place to use cash, which is also why I really like Kuantan. On the drive back, Classic Pilot Cafe was also open and I guess I’ll be heading there too during this lockdown.

Came back to the room to finish both burgers in probably under 15 minutes. While watching the first episode of Bridgerton on Netflix. I’m not sure what really captivates, but I will continue on (I still haven’t finished the first episode). I need to figure out how to use the 2 TVs that I have in this room — HDMI cable? Apple TV? Not normally using a TV, all this is clearly more complex than I care to admit.

I soaked longer than expected, ended up a prune, but I’m sure it will give me good rest!

One thought to leave with:

“Learn to enjoy every minute of your life. Be happy now. Don’t wait for something outside of yourself to make you happy in the future.” — Earl Nightingale

The post Life with Rona 2.0 – Days 1 & 2 first appeared on Colin Charles Agenda.

,

Sam WatkinsDeveloping CZ, a dialect of C that looks like Python

In my experience, the C programming language is still hard to beat, even 50 years after it was first developed (and I feel the same way about UNIX). When it comes to general-purpose utility, low-level systems programming, performance, and portability (even to tiny embedded systems), I would choose C over most modern or fashionable alternatives. In some cases, it is almost the only choice.

Many developers believe that it is difficult to write secure and reliable software in C, due to its free pointers, the lack of enforced memory integrity, and the lack of automatic memory management; however in my opinion it is possible to overcome these risks with discipline and a more secure system of libraries constructed on top of C and libc. Daniel J. Bernstein and Wietse Venema are two developers who have been able to write highly secure, stable, reliable software in C.

My other favourite language is Python. Although Python has numerous desirable features, my favourite is the light-weight syntax: in Python, block structure is indicated by indentation, and braces and semicolons are not required. Apart from the pleasure and relief of reading and writing such light and clear code, which almost appears to be executable pseudo-code, there are many other benefits. In C or JavaScript, if you omit a trailing brace somewhere in the code, or insert an extra brace somewhere, the compiler may tell you that there is a syntax error at the end of the file. These errors can be annoying to track down, and cannot occur in Python. Python not only looks better, the clear syntax helps to avoid errors.

The obvious disadvantage of Python, and other dynamic interpreted languages, is that most programs run far more slowly than C programs. This limits the scope and generality of Python. No AAA or performance-oriented video game engines are programmed in Python. The language is not suitable for low-level systems programming, such as operating system development, device drivers, filesystems, performance-critical networking servers, or real-time systems.

C is a great all-purpose language, but the code is uglier than Python code. Once upon a time, when I was experimenting with the Plan 9 operating system (which is built on C, but lacks Python), I missed Python’s syntax, so I decided to do something about it and write a little preprocessor for C. This converts from a “Pythonesque” indented syntax to regular C with the braces and semicolons. Having forked a little dialect of my own, I continued from there adding other modules and features (which might have been a mistake, but it has been fun and rewarding).

At first I called this translator Brace, because it added in the braces for me. I now call the language CZ. It sounds like “C-easy”. Ease-of-use for developers (DX) is the primary goal. CZ has all of the features of C, and translates cleanly into C, which is then compiled to machine code as normal (using any C compiler; I didn’t write one); and so CZ has the same features and performance as C, but enjoys a more pleasing syntax.

CZ is now self-hosted, in that the translator is written in the language CZ. I confess that originally I wrote most of it in Perl; I’m proficient at Perl, but I consider it to be a fairly ugly language, and overly complicated.

I intend for CZ’s new syntax to be “optional”, ideally a developer will be able to choose to use the normal C syntax when editing CZ, if they prefer it. For this, I need a tool to convert C back to CZ, which I have not fully implemented yet. I am aware that, in addition to traditionalists, some vision-impaired developers prefer to use braces and semicolons, as screen readers might not clearly indicate indentation. A C to CZ translator would of course also be valuable when porting an existing C program to CZ.

CZ has a number of useful features that are not found in standard C, but I did not go so far as C++, which language has been described as “an octopus made by nailing extra legs onto a dog”. I do not consider C to be a dog, at least not in a negative sense; but I think that C++ is not an improvement over plain C. I am creating CZ because I think that it is possible to improve on C, without losing any of its advantages or making it too complex.

One of the most interesting features I added is a simple syntax for fast, light coroutines. I based this on Simon Tatham’s approach to Coroutines in C, which may seem hacky at first glance, but is very efficient and can work very well in practice. I implemented a very fast web server with very clean code using these coroutines. The cost of switching coroutines with this method is little more than the cost of a function call.
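
For anyone unfamiliar with the trick, here is a minimal sketch in plain C (not CZ syntax, and not the CZ library's implementation): a static state variable records where the coroutine last yielded, and a switch statement jumps back to that point on the next call.

/* Simon Tatham-style coroutines in plain C, minimal sketch. */
#include <stdio.h>

#define crBegin static int cr_state = 0; switch (cr_state) { case 0:
#define crReturn(x) do { cr_state = __LINE__; return x; case __LINE__:; } while (0)
#define crFinish }

/* Yields 1, 2, 3, ... one value per call, resuming where it left off. */
static int counter(void)
{
    static int i;
    crBegin;
    for (i = 1; ; i++)
        crReturn(i);
    crFinish;
    return -1;  /* not reached */
}

int main(void)
{
    for (int n = 0; n < 5; n++)
        printf("%d\n", counter());  /* prints 1 2 3 4 5 */
    return 0;
}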

CZ has hygienic macros. The regular cpp (C preprocessor) macros are not hygienic and many people consider them hacky and unsafe to use. My CZ macros are safe, and somewhat more powerful than standard C macros. They can be used to neatly add new program control structures. I have plans to further develop the macro system in interesting ways.

I added automatic prototype and header generation, as I do not like having to repeat myself when copying prototypes to separate header files. I added support for the UNIX #! scripting syntax, and for cached executables, which means that CZ can be used like a scripting language without having to use a separate compile or make command, but the programs are only recompiled when something has been changed.

For CZ, I invented a neat approach to portability without conditional compilation directives. Platform-specific library fragments are automatically included from directories having the name of that platform or platform-category. This can work very well in practice, and helps to avoid the nightmare of conditional compilation, feature detection, and Autotools. Using this method, I was able easily to implement portable interfaces to features such as asynchronous IO multiplexing (aka select / poll).

The CZ library includes flexible error handling wrappers, inspired by W. Richard Stevens’ wrappers in his books on Unix Network Programming. If these wrappers are used, there is no need to check return values for error codes, and this makes the code much safer, as an error cannot accidentally be ignored.
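
As a rough illustration of the wrapper idea in plain C (hedged: these names are hypothetical, not the actual CZ library), a capitalised wrapper checks the return value and aborts with a message, so a caller can never accidentally ignore the error:

/* Stevens-style error-handling wrapper, minimal sketch. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

static void *Malloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "malloc(%zu) failed: %s\n", size, strerror(errno));
        exit(EXIT_FAILURE);
    }
    return p;
}

int main(void)
{
    char *buf = Malloc(64);   /* no need to check the return value */
    snprintf(buf, 64, "hello");
    puts(buf);
    free(buf);
    return 0;
}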

CZ has several major faults, which I intend to correct at some point. Some of the syntax is poorly thought out, and I need to revisit it. I developed a fairly rich library to go with the language, including safer data structures, IO, networking, graphics, and sound. There are many nice features, but my CZ library is more a prototype than a finished product: there are major omissions, and some features are misconceived or poorly implemented. The misfeatures should be weeded out for the time being, or moved to an experimental section of the library.

I think that a good software library should come in two parts, the essential low-level APIs with the minimum necessary functionality, and a rich set of high-level convenience functions built on top of the minimal API. I need to clearly separate these two parts in order to avoid polluting the namespaces with all sorts of nonsense!

CZ is lacking a good modern system of symbol namespaces. I can look to Python for a great example. I need to maintain compatibility with C, and avoid ugly symbol encodings. I think I can come up with something that will alleviate the need to type anything like gtk_window_set_default_size, and yet maintain compatibility with the library in question. I want all the power of C, but it should be easy to use, even for children. It should be as easy as BASIC or Processing, a child should be able to write short graphical demos and the like, without stumbling over tricky syntax or obscure compile errors.

Here is an example of a simple CZ program which plots the Mandelbrot set fractal. I think that the program is fairly clear and easy to understand, although there is still some potential to improve and clarify the code.

#!/usr/local/bin/cz --
use b
use ccomplex

Main:
	num outside = 16, ox = -0.5, oy = 0, r = 1.5
	long i, max_i = 50, rb_i = 30
	space()
	uint32_t *px = pixel()  # CONFIGURE!
	num d = 2*r/h, x0 = ox-d*w_2, y0 = oy+d*h_2
	for(y, 0, h):
		cmplx c = x0 + (y0-d*y)*I
		repeat(w):
			cmplx w = c
			for i=0; i < max_i && cabs(w) < outside; ++i
				w = w*w + c
			*px++ = i < max_i ? rainbow(i*359 / rb_i % 360) : black
			c += d

I wrote a more elaborate variant of this program, which generates images like the one shown below. There are a few tricks used: continuous colouring, rainbow colours, and plotting the logarithm of the iteration count, which makes the plot appear less busy close to the black fractal proper. I sell some T-shirts and other products with these fractal designs online.

An image from the Mandelbrot set, generated by a fairly simple CZ program.

I am interested in graph programming, and have been for three decades since I was a teenager. By graph programming, I mean programming and modelling based on mathematical graphs or diagrams. I avoid the term visual programming, because there is no necessary reason that vision impaired folks could not use a graph programming language; a graph or diagram may be perceived, understood, and manipulated without having to see it.

Mathematics is something that naturally exists, outside time and independent of our universe. We humans discover mathematics, we do not invent or create it. One of my main ideas for graph programming is to represent a mathematical (or software) model in the simplest and most natural way, using relational operators. Elementary mathematics can be reduced to just a few such operators:

+    add, subtract, disjoint union, zero
×    multiply, divide, cartesian product, one
^    power, root, logarithm
     sin, cos, sin⁻¹, cos⁻¹, hypot, atan2
δ    differential, integral

A set of minimal relational operators for elementary math.

I think that a language and notation based on these few operators (and similar) can be considerably simpler and more expressive than conventional math or programming languages.

CZ is for me a stepping-stone toward this goal of an expressive relational graph language. It is more pleasant for me to develop software tools in CZ than in C or another language.

Thanks for reading. I wrote this article during the process of applying to join Toptal, which appears to be a freelancing portal for top developers; and in response to this article on toptal: After All These Years, the World is Still Powered by C Programming.

My CZ project has been stalled for quite some time. I foolishly became discouraged after receiving some negative feedback. I now know that honest negative feedback should be valued as an opportunity to improve, and I intend to continue the project until it lacks glaring faults, and is useful for other people. If this project or this article interests you, please contact me and let me know. It is much more enjoyable to work on a project when other people are actively interested in it!

Gary PendergastWordPress Importers: Free (as in Speech)

Back at the start of this series, I listed four problems within the scope of the WordPress Importers that we needed to address. Three of them are largely technical problems, which I covered in previous posts. In wrapping up this series, I want to focus exclusively on the fourth problem, which has a philosophical side as well as a technical one — but that does not mean we cannot tackle it!

Problem Number 4

Some services work against their customers, and actively prevent site owners from controlling their own content.

Some services are merely inconvenient: they provide exports, but it often involves downloading a bunch of different files. Your CMS content is in one export, your store products are in another, your orders are in another, and your mailing list is in yet another. It’s not ideal, but they at least let you get a copy of your data.

However, there’s another class of services that actively work against their customers. It’s these services I want to focus on: the services that don’t provide any ability to export your content — effectively locking people in to using their platform. We could offer these folks an escape! The aim isn’t to necessarily make them use WordPress, it’s to give them a way out, if they want it. Whether they choose to use WordPress or not after that is immaterial (though I certainly hope they would, of course). The important part is freedom of choice.

It’s worth acknowledging that this is a different approach to how WordPress has historically operated in relation to other CMSes. We provide importers for many CMSes, but we previously haven’t written exporters. However, I don’t think this is a particularly large step: for CMSes that already provide exports, we’d continue to use those export files. This is focussed on the few services that try to lock their customers in.

Why Should WordPress Take This On?

There are several aspects to why we should focus on this.

First of all, it’s the WordPress mission. Underpinning every part of WordPress is the simplest of statements:

Democratise Publishing

The freedom to build. The freedom to change. The freedom to share.

These freedoms are the pillars of a Free and Open Web, but they’re not invulnerable: at times, they need to be defended, and that needs people with the time and resources to offer a defence.

Which brings me to my second point: WordPress has the people who can offer that defence! The WordPress project has so many individuals working on it, from such a wide variety of backgrounds, we’re able to take on a vast array of projects that a smaller CMS just wouldn’t have the bandwidth for. That’s not to say that we can do everything, but when there’s a need to defend the entire ecosystem, we’re able to devote people to the cause.

Finally, it’s important to remember that WordPress doesn’t exist in a vacuum, we’re part of a broad ecosystem which can only exist through the web remaining open and free. By encouraging all CMSes to provide proper exports, and implementing them for those that don’t, we help keep our ecosystem healthy.

We have the ability to take on these challenges, but we have a responsibility that goes alongside. We can’t do it solely to benefit WordPress, we need to make that benefit available to the entire ecosystem. This is why it’s important to define a WordPress export schema, so that any CMS can make use of the export we produce, not just WordPress. If you’ll excuse the imagery for a moment, we can be the knight in shining armour that frees people — then gives them the choice of what they do with that freedom, without obligation.

How Can We Do It?

Moving on to the technical side of this problem, I can give you some good news: the answer is definitely not screen scraping. 😄 Scraping a site is fragile, impossible to transform into the full content, and provides an incomplete export of the site: anything that’s only available in the site dashboard can’t be obtained through scraping.

I’ve recently been experimenting with an alternative approach to solving this problem. Rather than trying to create something resembling a traditional exporter, it turns out that modern CMSes provide the tools we need, in the form of REST APIs. All we need to do is call the appropriate APIs, and collate the results. The fun part is that we can authenticate with these APIs as the site owner, by calling them from a browser extension! So, that’s what I’ve been experimenting with, and it’s showing a lot of promise.

If you’re interested in playing around with it, the experimental code is living in this repository. It’s a simple proof of concept, capable of exporting the text content of a blog on a Wix site, showing that we can make a smooth, comprehensive, easy-to-use exporter for any Wix site owner.

Screenshot of the "Free (as in Speech)" browser extension UI.

Clicking the export button starts a background script, which calls Wix’s REST APIs as the site owner, to get the original copy of the content. It then packages it up, and presents it as a WXR file to download.

Screenshot of a Firefox download dialog, showing a Wix site packaged up as a WXR file.

I’m really excited about how promising this experiment is. It can ultimately provide a full export of any Wix site, and we can add support for other CMS services that choose to artificially lock their customers in.

Where Can I Help?

If you’re a designer or developer who’s excited about working on something new, head on over to the repository and check out the open issues: if there’s something that isn’t already covered, feel free to open a new issue.

Since this is new ground for a WordPress project, both technically and philosophically, I’d love to hear more points of view. It’s being discussed in the WordPress Core Dev Chat this week, and you can also let me know what you think in the comments!

This post is part of a series, talking about the WordPress Importers, their history, where they are now, and where they could go in the future.

,

Gary PendergastWordPress Importers: Defining a Schema

While schemata are usually implemented using language-specific tools (eg, XML uses XML Schema, JSON uses JSON Schema), they largely use the same concepts when talking about data. This is rather helpful, we don’t need to make a decision on data formats before we can start thinking about how the data should be arranged.

Note: Since these concepts apply equally to all data formats, I’m using “WXR” in this post as shorthand for “the structured data section of whichever file format we ultimately use”, rather than specifically referring to the existing WXR format. 🙂

Why is a Schema Important?

It’s fair to ask: if the WordPress Importers have survived this entire time without a formal schema, why would we need one now?

There are two major reasons why we haven’t needed one in the past:

  • WXR has remained largely unchanged in the last 10 years: there have been small additions or tweaks, but nothing significant. There’s been no need to keep track of changes.
  • WXR is currently very simple, with just a handful of basic elements. In a recent experiment, I was able to implement a JavaScript-based WXR generator in just a few days, entirely by referencing the Core implementation.

These reasons are also why it would help to implement a schema for the future:

  • As work on WXR proceeds, there will likely need to be substantial changes to what data is included: adding new fields, modifying existing fields, and removing redundant fields. Tracking these changes helps ensure any WXR implementations can stay in sync.
  • These changes will result in a more complex schema: relying on the source to re-implement it will become increasingly difficult and error-prone. Following Gutenberg’s lead, it’s likely that we’d want to provide official libraries in both PHP and JavaScript: keeping them in sync is best done from a source schema, rather than having one implementation copy the other.

Taking the time to plan out a schema now gives us a solid base to work from, and it allows for future changes to happen in a reliable fashion.

WXR for all of WordPress

With a well defined schema, we can start to expand what data will be included in a WXR file.

Media

Interestingly, many of the challenges around media files are less to do with WXR, and more to do with importer capabilities. The biggest headache is retrieving the actual files, which the importer currently handles by trying to retrieve the file from the remote server, as defined in the wp:attachment_url node. In context, this behaviour is understandable: 10+ years ago, personal internet connections were too slow to be moving media around, it was better to have the servers talk to each other. It’s a useful mechanism that we should keep as a fallback, but the more reliable solution is to include the media file with the export.

Plugins and Themes

There are two parts to plugins and themes: the code, and the content. Modern WordPress sites require plugins to function, and most are customised to suit their particular theme.

For exporting the code, I wonder if a tiered solution could be applied:

  • Anything from WordPress.org would just need their slug, since they can be re-downloaded during import. Particularly as WordPress continues to move towards an auto-updated future, modified versions of plugins and themes are explicitly not supported.
  • Third party plugins and themes would be given a filter to use, where they can provide a download URL that can be included in the export file.
  • Third party plugins/themes that don’t provide a download URL would either need to be skipped, or zipped up and included in the export file.

For exporting the content, WXR already includes custom post types, but doesn’t include custom settings, or custom tables. The former should be included automatically, and the latter would likely be handled by an appropriate action for the plugin to hook into.

Settings

There are currently a handful of special settings that are exported, but (as I just noted, particularly with plugins and themes being exported) this would likely need to be expanded to include most items in wp_options.

Users

Currently, the bare minimum information about users who’ve authored a post is included in the export. This would need to be expanded to include more user information, as well as users who aren’t post authors.

WXR for parts of WordPress

The modern use case for importers isn’t just to handle a full site, but to handle keeping sites in sync. For example, most news organisations will have a staging site (or even several layers of staging!) which is synchronised to production.

While it’s well outside the scope of this project to directly handle every one of these use cases, we should be able to provide the framework for organisations to build reliable platforms on. Exports should be repeatable, objects in the export should have unique identifiers, and the importer should be able to handle any subset of WXR.

WXR Beyond WordPress

Up until this point, we’ve really been talking about WordPress→WordPress migrations, but I think WXR is a useful format beyond that. Instead of just containing direct exports of the data from particular plugins, we could also allow it to contain “types” of data. This turns WXR into an intermediary language, exports can be created from any source, and imported into WordPress.

Let’s consider an example. Say we create a tool that can export a Shopify, Wix, or GoDaddy site to WXR, how would we represent an online store in the WXR file? We don’t want to export in the format that any particular plugin would use, since a WordPress Core tool shouldn’t be advantaging one plugin over others.

Instead, it would be better if we could format the data in a platform-agnostic way, which plugins could then implement support for. As luck would have it, Schema.org provides exactly the kind of data structure we could use here. It’s been actively maintained for nearly nine years, it supports a wide variety of data types, and is intentionally platform-agnostic.

Gazing into my crystal ball for a moment, I can certainly imagine a future where plugins could implement and declare support for importing certain data types. When handling such an import (assuming one of those plugins wasn’t already installed), the WordPress Importer could offer them as options during the import process. This kind of seamless integration allows WordPress to show that it offers the same kind of fully-featured site building experience that modern CMS services do.

Of course, reality is never quite as simple as crystal balls and magic wands make them out to be. We have to contend with services that provide incomplete or fragmented exports, and there are even services that deliberately don’t provide exports at all. In the next post, I’ll be writing about why we should address this problem, and how we might be able to go about it.

This post is part of a series, talking about the WordPress Importers, their history, where they are now, and where they could go in the future.

,

Gary PendergastWordPress Importers: Getting Our House in Order

The previous post talked about the broad problems we need to tackle to bring our importers up to speed, making them available for everyone to use.

In this post, I’m going to focus on what we could do with the existing technology, in order to give us the best possible framework going forward.

A Reliable Base

Importers are an interesting technical problem. Much like you’d expect from any backup/restore code, importers need to be extremely reliable. They need to comfortably handle all sorts of unusual data, and they need to keep it all safe. Particularly considering their age, the WordPress Importers do a remarkably good job of handling most content you can throw at them.

However, modern development practices have evolved and improved since the importers were first written, and we should certainly be making use of such practices, when they fit with our requirements.

For building reliable software that we expect to largely run by itself, a variety of comprehensive automated testing is critical. This ensures we can confidently take on the broader issues, safe in the knowledge that we have a reliable base to work from.

Testing must be the first item on this list. A variety of automated testing gives us confidence that changes are safe, and that the code can continue to be maintained in the future.

Data formats must be well defined. While this is useful for ensuring data can be handled in a predictable fashion, it’s also a very clear demonstration of our commitment to data freedom.

APIs for creating or extending importers should be straightforward for hooking into.

Performance Isn’t an Optional Extra

With sites constantly growing in size (and with the export files potentially gaining a heap of extra data), we need to care about the performance of the importers.

Luckily, there’s already been some substantial work done on this front:

There are other groups in the WordPress world who’ve made performance improvements in their own tools: gathering all of that experience is a relatively quick way to bring in production-tested improvements.

The WXR Format

It’s worth talking about the WXR format itself, and determining whether it’s the best option for handling exports into the future. XML-based formats are largely viewed as a relic of days gone past, so (if we were to completely ignore backwards compatibility for a moment) is there a modern data format that would work better?

The short answer… kind of. 🙂

XML is actually well suited to this use case, and (particularly when looking at performance improvements) is the only data format for which PHP comes with a built-in streaming parser.

That said, WXR is basically an extension of the RSS format: as we add more data to the file that clearly doesn’t belong in RSS, there is likely an argument for defining an entirely WordPress-focused schema.

Alternative Formats

It’s important to consider what the priorities are for our export format, which will help guide any decision we make. So, I’d like to suggest the following priorities (in approximate priority order):

  • PHP Support: The format should be natively supported in PHP, though it is still workable if we need to ship an additional library.
  • Performant: Particularly when looking at very large exports, it should be processed as quickly as possible, using minimal RAM.
  • Supports Binary Files: The first comments on my previous post asked about media support, we clearly should be treating it as a first-class citizen.
  • Standards Based: Is the format based on a documented standard? (Another way to ask this: are there multiple different implementations of the format? Do those implementations all function the same?)
  • Backward Compatible: Can the format be used by existing tools with no changes, or minimal changes?
  • Self Descriptive: Does the format include information about what data you’re currently looking at, or do you need to refer to a schema?
  • Human Readable: Can the file be opened and read in a text editor?

Given these priorities, what are some options?

WXR (XML-based)

Either the RSS-based schema that we already use, or a custom-defined XML schema, the arguments for this format are pretty well known.

One argument that hasn’t been well covered is how there’s a definite trade-off when it comes to supporting binary files. Currently, the importer tries to scrape the media file from the original source, which is not particularly reliable. So, if we were to look at including media files in the WXR file, the best option for storing them is to base64 encode them. Unfortunately, that would have a serious effect on performance, as well as readability: adding huge base64 strings would make even the smallest exports impossible to read.

Either way, this option would be mostly backwards compatible, though some tools may require a bit of reworking if we were to substantially change the schema.

WXR (ZIP-based)

To address the issues with media files, an alternative option might be to follow the path that Microsoft Word and OpenOffice use: put the text content in an XML file, put the binary content into folders, and compress the whole thing.

This addresses the performance and binary support problems, but is initially worse for readability: if you don’t know that it’s a ZIP file, you can’t read it in a text editor. Once you unzip it, however, it does become quite readable, and has the same level of backwards compatibility as the XML-based format.

JSON

JSON could work as a replacement for XML in both of the above formats, with one additional caveat: there is no streaming JSON parser built in to PHP. There are 3rd party libraries available, but given the documented differences between JSON parsers, I would be wary about using one library to produce the JSON, and another to parse it.

This format largely wouldn’t be backwards compatible, though tools which rely on the export file being plain text (eg, command line tools to do broad search-and-replaces on the file) can be modified relatively easily.

There are additional subjective arguments (both for and against) the readability of JSON vs XML, but I’m not sure there’s anything to them beyond personal preference.

SQLite

The SQLite team wrote an interesting (indirect) argument on this topic: OpenOffice uses a ZIP-based format for storing documents, the SQLite team argued that there would be benefits (particularly around performance and reliability) for OpenOffice to switch to SQLite.

The key issues that I see are:

  • SQLite is included in PHP, but not enabled by default on Windows.
  • While the SQLite team have a strong commitment to providing long-term support, SQLite is not a standard, and the only implementation is the one provided by the SQLite team.
  • This option is not backwards compatible at all.

FlatBuffers

FlatBuffers is an interesting comparison, since it’s a data format focussed entirely on speed. The down side of this focus is that it requires a defined schema to read the data. Much like SQLite, the only standard for FlatBuffers is the implementation. Unlike SQLite, FlatBuffers has made no commitments to providing long-term support.

                        | WXR (XML-based) | WXR (ZIP-based) | JSON | SQLite | FlatBuffers
Works in PHP?           | ✅              | ✅              | ⚠    | ⚠      | ⚠
Performant?             | ⚠               | ✅              | ⚠    | ✅     | ✅
Supports Binary Files?  | ⚠               | ✅              | ⚠    | ✅     | ✅
Standards Based?        | ✅              | ✅              | ✅   | ⚠ / ❌ | ❌
Backwards Compatible?   | ⚠               | ⚠               | ❌   | ❌     | ❌
Self Descriptive?       | ✅              | ✅              | ✅   | ✅     | ❌
Readable?               | ✅              | ⚠ / ❌          | ✅   | ❌     | ❌

As with any decision, this is a matter of trade-offs. I’m certainly interested in hearing additional perspectives on these options, or thoughts on options that I haven’t considered.

Regardless of which particular format we choose for storing WordPress exports, every format should have (or in the case of FlatBuffers, requires) a schema. We can talk about schemata without going into implementation details, so I’ll be writing about that in the next post.

This post is part of a series, talking about the WordPress Importers, their history, where they are now, and where they could go in the future.

Gary PendergastWordPress Importers: Stating the Problem

It’s time to focus on the WordPress Importers.

I’m not talking about tidying them up, improving performance, or fixing some bugs, though these are certainly things that should happen. Instead, we need to consider their purpose, how they fit as a driver of WordPress’ commitment to Open Source, and how they can be a key element in helping to keep the Internet Open and Free.

The History

The WordPress Importers are arguably the key driver to WordPress’ early success. Before the importer plugins existed (before WordPress even supported plugins!) there were a handful of import-*.php scripts in the wp-admin directory that could be used to import blogs from other blogging platforms. When other platforms fell out of favour, WordPress already had an importer ready for people to move their site over. One of the most notable instances was in 2004, when Moveable Type changed their license and prices, suddenly requiring personal blog authors to pay for something that had previously been free. WordPress was fortunate enough to be in the right place at the right time: many of WordPress’ earliest users came from Moveable Type.

As time went on, WordPress became well known in its own right. Growth relied less on people wanting to switch from another provider, and more on people choosing to start their site with WordPress. For practical reasons, the importers were moved out of WordPress Core, and into their own plugins. Since then, they’ve largely been in maintenance mode: bugs are fixed when they come up, but since export formats rarely change, they’ve just continued to work for all these years.

An unfortunate side effect of this, however, is that new importers are rarely written. While a new breed of services have sprung up over the years, the WordPress importers haven’t kept up.

The New Services

There are many new CMS services that have cropped up in recent years, and we don’t have importers for any of them. WordPress.com has a few extra ones written, but they’ve been built on the WordPress.com infrastructure out of necessity.

You see, we’ve always assumed that other CMSes will provide some sort of export file that we can use to import into WordPress. That isn’t always the case, however. Some services (notably, Wix and GoDaddy Website Builder) deliberately don’t allow you to export your own content. Other services provide incomplete or fragmented exports, needlessly forcing stress upon site owners who want to use their own content outside of that service.

To work around this, WordPress.com has implemented importers that effectively scrape the site: while this has worked to some degree, it does require regular maintenance, and the importer has to do a lot of guessing about how the content should be transformed. This is clearly not a solution that would be maintainable as a plugin.

Problem Number 4

Some services work against their customers, and actively prevent site owners from controlling their own content.

This strikes at the heart of the WordPress Bill of Rights. WordPress is built with fundamental freedoms in mind: all of those freedoms point to owning your content, and being able to make use of it in any form you like. When a CMS actively works against providing such freedom to their community, I would argue that we have an obligation to help that community out.

A Variety of Content

It’s worth discussing how, when starting a modern CMS service, the bar for success is very high. You can’t get away with just providing a basic CMS: you need to provide all the options. Blogs, eCommerce, mailing lists, forums, themes, polls, statistics, contact forms, integrations, embeds, the list goes on. The closest comparison to modern CMS services is… the entire WordPress ecosystem: built on WordPress core, but with the myriad of plugins and themes available, along with the variety of services offered by a huge array of companies.

So, when we talk about the importers, we need to consider how they’ll be used.

Problem Number 3

To import from a modern CMS service into WordPress, your importer needs to map from service features to WordPress plugins.

Getting Our Own House In Order

Some of these problems don’t just apply to new services, however.

Out of the box, WordPress exports to WXR (WordPress eXtended RSS) files: an XML file that contains the content of the site. Back when WXR was first created, this was all you really needed, but much like the rest of the WordPress importers, it hasn’t kept up with the times. A modern WordPress site isn’t just the sum of its content: a WordPress site has plugins and themes. It has various options configured, it has huge quantities of media, it has masses of text content, far more than the first WordPress sites ever had.

Problem Number 2

WXR doesn’t contain a full export of a WordPress site.

In my view, WXR is a solid format for handling exports. An XML-based system is quite capable of containing all forms of content, so it’s reasonable that we could expand the WXR format to contain the entire site.

Built for the Future

If there’s one thing we can learn from the history of the WordPress importers, it’s that maintenance will potentially be sporadic. Importers are unlikely to receive the same attention that the broader WordPress Core project does; owners may come and go. An importer will get attention if it breaks, of course, but it otherwise may go months or years without changing.

Problem Number 1

We can’t depend on regular importer maintenance in the future.

It’s quite possible to build code that will be running in 10+ years: we see examples all across the WordPress ecosystem. Doing it in a reliable fashion needs to be a deliberate choice, however.

What’s Next?

Having worked our way down from the larger philosophical reasons for the importers to some of the more technically-oriented implementation problems, I’d like to work our way back out again, focussing on each problem individually. In the following posts, I’ll start laying out how I think we can bring our importers up to speed, prepare them for the future, and make them available for everyone.

This post is part of a series, talking about the WordPress Importers, their history, where they are now, and where they could go in the future.

,

Glen TurnerCompiling and installing software for the uBITX v6 QRP amateur radio transceiver

The uBITX uses an Arduino internally. This article describes how to update its software.

Required hardware

The connector on the back is a Mini-B USB connector, so you'll need a "Mini-B to A" USB cable. This is not the same cable as used with older Android smartphones. The Mini-B connector was used with a lot of cameras a decade ago.

You'll also need a computer. I use a laptop with Fedora Linux installed.

Required software for software development

In Fedora all the required software is installed with sudo dnf install arduino git. Add yourself to the users and lock groups with sudo usermod -a -G users,lock $USER (on Debian-style systems use sudo usermod -a -G dialout,lock $USER). You'll need to log out and log in again for that to have an effect (if you want to see which groups you are already in, then use the id command).

Run arduino as your ordinary non-root user to create the directories used by the Arduino IDE. You can quit the IDE once it starts.

Obtain the uBITX software

$ cd ~/Arduino
$ git clone https://github.com/afarhan/ubitxv6.git ubitx_v6.1_code

Connect the uBITX to your computer

Plug in the USB cable and turn on the radio. Running dmesg will show the Arduino appearing as a "USB serial" device:

usb 1-1: new full-speed USB device number 6 using xhci_hcd
usb 1-1: New USB device found, idVendor=1a86, idProduct=7523, bcdDevice= 2.64
usb 1-1: New USB device strings: Mfr=0, Product=2, SerialNumber=0
usb 1-1: Product: USB Serial
usbcore: registered new interface driver ch341
usbserial: USB Serial support registered for ch341-uart
ch341 1-1:1.0: ch341-uart converter detected
usb 1-1: ch341-uart converter now attached to ttyUSB1

If you want more information about the USB device then use:

$ lsusb -d 1a86:7523
Bus 001 Device 006: ID 1a86:7523 QinHeng Electronics CH340 serial converter
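
Before going further, it can be worth a quick check that the serial port actually opens. Here's a minimal Python sketch using pyserial (not part of the original steps); the port name matches the dmesg output above, and the 38400 baud rate is an assumption (check it against your firmware's CAT settings).

# Sanity check: open the radio's serial port and report success.
# /dev/ttyUSB1 and 38400 baud are assumptions - adjust for your setup.
import serial  # from the pyserial package

with serial.Serial("/dev/ttyUSB1", baudrate=38400, timeout=1) as port:
    print("Opened", port.name)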



,

Jan SchmidtRift CV1 – Adventures in Kalman filtering Part 2

In the last post I had started implementing an Unscented Kalman Filter for position and orientation tracking in OpenHMD. Over the Christmas break, I continued that work.

A Quick Recap

When reading below, keep in mind that the goal of the filtering code I’m writing is to combine 2 sources of information for tracking the headset and controllers.

The first piece of information is acceleration and rotation data from the IMU on each device, and the second is observations of the device position and orientation from 1 or more camera sensors.

The IMU motion data drifts quickly (at least for position tracking) and can’t tell which way the device is facing in yaw (it can detect gravity, so pitch and roll are known).

The camera observations can tell exactly where each device is, but they arrive at a much lower rate (52Hz vs 500/1000Hz) and can take a long time (hundreds of milliseconds) to analyse when acquiring or re-acquiring a lock on the tracked device(s).

The goal is to acquire tracking lock, then use the motion data to predict the motion closely enough that we always hit the ‘fast path’ of vision analysis. The key here is closely enough – the more closely the filter can track and predict the motion of devices between camera frames, the better.
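
To make the shape of that loop concrete, here's a toy sketch of the predict/correct split in Python – a plain linear 1-D Kalman filter, nothing like the full UKF in OpenHMD, with every constant invented purely for illustration.

# Toy illustration of the structure described above: integrate high-rate
# "IMU" acceleration to predict, then correct the drift whenever a low-rate
# "camera" position fix arrives. Plain linear 1-D Kalman filter, not the
# OpenHMD UKF; all the rates and noise levels are made up.
import numpy as np

dt = 1.0 / 500.0                      # 500Hz IMU samples
F = np.array([[1.0, dt], [0.0, 1.0]]) # constant-velocity state transition
B = np.array([[0.5 * dt * dt], [dt]]) # how acceleration enters the state
H = np.array([[1.0, 0.0]])            # the camera observes position only
Q = np.eye(2) * 1e-6                  # process noise (IMU drift)
R = np.array([[1e-3]])                # measurement noise (pose jitter)

x = np.zeros((2, 1))                  # state: [position, velocity]
P = np.eye(2)                         # state covariance

def predict(accel):
    """One IMU sample: advance the state and grow the uncertainty."""
    global x, P
    x = F @ x + B * accel
    P = F @ P @ F.T + Q

def correct(observed_pos):
    """One camera fix: pull the state back towards the observation."""
    global x, P
    y = observed_pos - H @ x               # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P

true_pos, true_vel, accel = 0.0, 0.0, 0.1
for frame in range(52):                    # one second of ~52Hz camera frames
    for _ in range(10):                    # ~10 IMU samples per camera frame
        true_vel += accel * dt
        true_pos += true_vel * dt
        predict(accel)
    correct(np.array([[true_pos + np.random.normal(0.0, 0.03)]]))

print("estimated:", float(x[0, 0]), "true:", true_pos)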

Integration in OpenHMD

When I wrote the last post, I had the filter running as a standalone application, processing motion trace data collected by instrumenting a running OpenHMD app and moving my headset and controllers around. That’s a really good way to work, because it lets me run modifications on the same data set and see what changed.

However, the motion traces were captured using the current fusion/prediction code, which frequently loses tracking lock when the devices move – leading to big gaps in the camera observations and more interpolation for the filter.

By integrating the Kalman filter into OpenHMD, the predictions are improved, leading to generally much better results. Here’s one trace of me moving the headset around reasonably vigorously with no tracking loss at all.

Headset motion capture trace

If it worked this well all the time, I’d be ecstatic! The predicted position matched the observed position closely enough for every frame for the computer vision to match poses and track perfectly. Unfortunately, this doesn’t happen every time yet, and definitely not with the controllers – although I think the latter largely comes down to the current computer vision having more trouble matching controller poses. They have fewer LEDs to match against compared to the headset, and the LEDs are generally more side-on to a front-facing camera.

Taking a closer look at a portion of that trace, the drift between camera frames when the position is interpolated using the IMU readings is clear.

Headset motion capture – zoomed in view

This is really good. Most of the time, the drift between frames is within 1-2mm. The computer vision can only match the pose of the devices to within a pixel or two – so the observed jitter can also come from the pose extraction, not the filtering.

The worst tracking is again on the Z axis – distance from the camera in this case. Again, that makes sense – with a single camera matching LED blobs, distance is the most uncertain part of the extracted pose.

Losing Track

The trace above is good – the computer vision spots the headset and then the filtering + computer vision track it at all times. That isn’t always the case – the prediction goes wrong, or the computer vision fails to match (it’s definitely still far from perfect). When that happens, it needs to do a full pose search to reacquire the device, and there’s a big gap until the next pose report is available.

That looks more like this

Headset motion capture trace with tracking errors

This trace has 2 kinds of errors – gaps in the observed position timeline during full pose searches and erroneous position reports where the computer vision matched things incorrectly.

Fixing the errors in position reports will require improving the computer vision algorithm and would fix most of the plot above. Outlier rejection is one approach to investigate on that front.

Latency Compensation

There is inherent delay involved in processing of the camera observations. Every 19.2ms, the headset emits a radio signal that triggers each camera to capture a frame. At the same time, the headset and controller IR LEDs light up brightly to create the light constellation being tracked. After the frame is captured, it is delivered over USB over the next 18ms or so and then submitted for vision analysis. In the fast case, where we’re already tracking the device, the computer vision is complete in a millisecond or so. In the slow case, it’s much longer.

Overall, that means that there’s at least a 20ms offset between when the devices are observed and when the position information is available for use. In the plot above, this delay is ignored and position reports are fed into the filter when they are available. In the worst case, that means the filter is being told where the headset was hundreds of milliseconds earlier.

To compensate for that delay, I implemented a mechanism in the filter where it keeps extra position and orientation entries in the state that can be used to retroactively apply the position observations.

The way that works is to make a prediction of the position and orientation of the device at the moment the camera frame is captured and copy that prediction into the extra state variable. After that, it continues integrating IMU data as it becomes available while keeping the auxiliary state constant.

When the camera frame analysis is complete, that delayed measurement is matched against the stored position and orientation prediction in the state, and the error is used to correct the overall filter. The cool thing is that in the intervening time, the filter covariance matrix has been building up the right correction terms to adjust the current position and orientation.

Here’s a good example of the difference:

Before: Position filtering with no latency compensation
After: Latency-compensated position reports

Notice how most of the disconnected segments have now slotted back into position in the timeline. The ones that haven’t can either be attributed to incorrect pose extraction in the computer vision, or to not having enough auxiliary state slots for all the concurrent frames.

At any given moment, there can be a camera frame being analysed, one arriving over USB, and one awaiting “long term” analysis. The filter needs to track an auxiliary state variable for each frame that we expect to get pose information from later, so I implemented a slot allocation system and multiple slots.
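
For a feel of what that bookkeeping might look like, here's a minimal sketch in Python covering only the slot allocation side – all names are invented, and the extra per-slot covariance handling described next is omitted entirely.

# Sketch of the slot bookkeeping only: a fixed pool of slots, one claimed per
# camera exposure, released when (or if) the vision result comes back.
# The real filter also carries 6 extra covariance variables per slot.
from dataclasses import dataclass
from typing import Optional

@dataclass
class DelaySlot:
    frame_id: Optional[int] = None          # which camera exposure this slot tracks
    predicted_pose: Optional[tuple] = None  # pose predicted at capture time

class DelaySlotPool:
    def __init__(self, n_slots=3):
        self.slots = [DelaySlot() for _ in range(n_slots)]

    def on_exposure(self, frame_id, predicted_pose):
        """Claim a free slot when the cameras fire; fails if all are busy."""
        for slot in self.slots:
            if slot.frame_id is None:
                slot.frame_id = frame_id
                slot.predicted_pose = predicted_pose
                return True
        return False   # too many outstanding frames; this observation gets dropped

    def on_vision_result(self, frame_id, observed_pose):
        """Match a late vision result against the pose predicted at capture time."""
        for slot in self.slots:
            if slot.frame_id == frame_id:
                error = tuple(o - p for o, p in zip(observed_pose, slot.predicted_pose))
                slot.frame_id = None          # release the slot for the next frame
                slot.predicted_pose = None
                return error                  # fed back into the filter as a correction
        return None                           # no slot was allocated for this frame

pool = DelaySlotPool(n_slots=3)
pool.on_exposure(frame_id=42, predicted_pose=(0.0, 0.0, 1.0))
print(pool.on_vision_result(frame_id=42, observed_pose=(0.01, 0.0, 0.98)))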

The downside is that each slot adds 6 variables (3 position and 3 orientation) to the covariance matrix on top of the 18 base variables. Because the covariance matrix is square, the size grows quadratically with new variables. 5 new slots means 30 new variables – leading to a 48 x 48 covariance matrix instead of 18 x 18. That is a 7-fold increase in the size of the matrix (48 x 48 = 2304 vs 18 x 18 = 324) and unfortunately about a 10x slow-down in the filter run-time.

At that point, even after some optimisation and vectorisation on the matrix operations, the filter can only run about 3x real-time, which is too slow. Using fewer slots is quicker, but allows for fewer outstanding frames. With 3 slots, the slow-down is only about 2x.

There are some other possible approaches to this problem:

  • Running the filtering delayed, only integrating IMU reports once the camera report is available. This has the disadvantage of not reporting the most up-to-date estimate of the user pose, which isn’t great for an interactive VR system.
  • Keeping around IMU reports and rewinding / replaying the filter for late camera observations. This limits the overall increase in filter CPU usage to double (since we at most replay every observation twice), but potentially with large bursts when hundreds of IMU readings need replaying.
  • It might be possible to only keep 2 “full” delayed measurement slots with both position and orientation, and to keep some position-only slots for others. The orientation of the headset tends to drift much more slowly than position does, so when there’s a big gap in the tracking it would be more important to be able to correct the position estimate. Orientation is likely to still be close to correct.
  • Further optimisation in the filter implementation. I was hoping to keep everything dependency-free, so the filter implementation uses my own naive 2D matrix code, which only implements the features needed for the filter. A more sophisticated matrix library might perform better – but it’s hard to say without doing some testing on that front.

Controllers

So far in this post, I’ve only talked about the headset tracking and not mentioned controllers. The controllers are considerably harder to track right now, but most of the blame for that is in the computer vision part. Each controller has fewer LEDs than the headset, fewer are visible at any given moment, and they often aren’t pointing at the camera front-on.

Oculus Camera view of headset and left controller.

This screenshot is a prime example. The controller is the cluster of lights at the top of the image, and the headset is lower left. The computer vision has gotten confused and thinks the controller is the ring of random blue crosses near the headset. It corrected itself a moment later, but those false readings make life very hard for the filtering.

Position tracking of left controller with lots of tracking loss.

Here’s a typical example of the controller tracking right now. There are some very promising portions of good tracking, but they are interspersed with bursts of tracking losses, and wild drifting from the computer vision giving wrong poses – leading to the filter predicting incorrect acceleration and hence cascaded tracking losses. Particularly (again) on the Z axis.

Timing Improvements

One of the problems I was looking at in my last post is variability in the arrival timing of the various USB streams (Headset reports, Controller reports, camera frames). I improved things in OpenHMD on that front, to use timestamps from the devices everywhere (removing USB timing jitter from the inter-sample time).

There are still potential problems in when IMU reports from controllers get updated in the filters vs the camera frames. That can be on the order of 2-4ms jitter. Time will tell how big a problem that will be – after the other bigger tracking problems are resolved.

Sponsorships

All the work that I’m doing implementing this positional tracking is a combination of my free time, hours contributed by my employer Centricular and contributions from people via Github Sponsorships. If you’d like to help me spend more hours on this and fewer on other paying work, I appreciate any contributions immensely!

Next Steps

The next things on my todo list are:

  • Integrate the delayed-observation processing into OpenHMD (at the moment it is only in my standalone simulator).
  • Improve the filter code structure – this is my first Kalman filter and there are some implementation decisions I’d like to revisit.
  • Publish the UKF branch for other people to try.
  • Circle back to the computer vision and look at ways to improve the pose extraction and better reject outlying / erroneous poses, especially for the controllers.
  • Think more about how to best handle / schedule analysis of frames from multiple cameras. At the moment each camera operates as a separate entity, capturing frames and analysing them in threads without considering what is happening in other cameras. That means any camera that can’t see a particular device starts doing full pose searches – which might be unnecessary if another camera still has a good view of the device. Coordinating those analyses across cameras could yield better CPU consumption, and let the filter retain fewer delayed observation slots.

,

Tim SerongScope Creep

On December 22, I decided to brew an oatmeal stout (5kg Gladfield ale malt, 250g dark chocolate malt, 250g light chocolate malt, 250g dark crystal malt, 500g rolled oats, 150g rice hulls to stop the mash sticking, 25g Pride of Ringwood hops, Safale US-05 yeast). This all takes a good few hours to do the mash and the boil and everything, so while that was underway I thought it’d be a good opportunity to remove a crappy old cupboard from the laundry, so I could put our nice Miele upright freezer in there, where it’d be closer to the kitchen (the freezer is presently in a room at the other end of the house).

The cupboard was reasonably easy to rip out, but behind it was a mouldy and unexpectedly bright yellow wall with an ugly gap at the top where whoever installed it had removed the existing cornice.

Underneath the bottom half of the cupboard, I discovered not the cork tiles which cover the rest of the floor, but a layer of horrific faux-tile linoleum. Plus, more mould. No way was I going to put the freezer on top of that.

So, up came the floor covering, back to nice hardwood boards.

Of course, the sink had to come out too, to remove the flooring from under its cabinet, and that meant pulling the splashback tiles (they had ugly screw holes in them anyway from a shelf that had been bracketed up on top of them previously).

Removing the tiles meant replacing a couple of sections of wall.

Also, we still needed to be able to use the washing machine through all this, so I knocked up a temporary sink support.

New cornice went in.

The rest of the plastering was completed and a ceiling fan installed.

Waterproofing membrane was applied where new tiles will go around a new sink.

I removed the hideous old aluminium backed weather stripping from around the exterior door and plastered up the exposed groove.

We still need to paint everything, get the new sink installed, do the tiling work and install new taps.

As for the oatmeal stout, I bottled that on January 2. From a sample taken at the time, it should be excellent, but right now still needs to carbonate and mature.

Stewart SmithPhotos from Taiwan

A few years ago we went to Taiwan. I managed to capture some random bits of the city on film (and also some shots on my then phone, a Google Pixel). I find the different style of art on the streets around the world to be fascinating, and Taiwan had some good examples.

I’ve really enjoyed shooting Kodak E100VS film over the years, and some of my last rolls were shot in Taiwan. It’s a film that unfortunately is not made anymore, but at least we have a new Ektachrome to have fun with now.

Words for our time: “Where there is democracy, equality and freedom can exist; without democracy, equality and freedom are merely empty words”.

This is, of course, only a small number of the total photos I took there. I’d really recommend a trip to Taiwan, and I look forward to going back there some day.

,

Colin CharlesCiao, 2020

Another year comes to a close, and this is the 4th year running I’m in Kuala Lumpur — 2017, 2018, 2019, and 2020… Wow. Maybe the biggest difference is that I’ve been in Malaysia for 306 days, thanks to the novel coronavirus. I have never spent this much time in Malaysia, in my entire life… I want to say KL, but I’ve managed to zip my way around to Kuantan (a lot), Penang, and Malacca. I can’t believe I flew back on February 29 2020 from Tokyo, and never got on a plane again! What a grounded globalist I’ve become.

My travel stats are of course, pretty dismal. 39 days out of the country. Apparently I did a total of 13 trips, 92 days of travel (I don’t know if all my local trips are counted frankly), 60,766km, 17 cities, and still 7 countries :) I don’t even want to compare to what it was like in 2019.

I ended that by saying, “I welcome 2020 with arms wide open.”. I’m not so sure how I feel about 2020. There is life beyond travel. COVID and our reaction to it, really worries me.

KL has some pretty good food. Kuantan has some pretty good people. While in KL, I visited a spin studio at least once per day. I did a total of 272 spin classes over 366 days! Not to forget there was 56 days of complete lockdown, and studios didn’t open till about maybe mid-June… Sure I did do some spin in London and Paris too, but the bulk of all this happened while I was here in KL.

I became reasonably friendlier, I became vulnerable, and like every time you do that, your chances of happiness and getting hurt probably straddle 50:50. Madonna – The Power of Good-bye can be apt.

This is not to say I didn’t enjoy 2020. Glass half full. I really did. Carpe diem. Simplicity is best. If you can follow KISS principles in engineering, why would you pour your entire thought process out and overwhelm the other party?

Anyway, I still look forward to 2021, with wide open arms, and while I really do think the COVID mess isn’t going away and things are going to be worse for many, I will still be focused on the most positive aspects of 2021. And I’ll work on being my old self again ;-)

I also ended the year with a haircut (number 1/0.5 on the sides) on Monday 28 December 2020. Somewhat of an experiment (does CoQ10 help speed up hair growth?) but also somewhat of a reaction to saying goodbye to December 2020.

The post Ciao, 2020 first appeared on Colin Charles Agenda.

,

Tim SerongI Have No Idea How To Debug This

On my desktop system, I’m running XFCE on openSUSE Tumbleweed. When I leave my desk, I hit the “lock screen” button, the screen goes black, and the monitors go into standby. So far so good. When I come back and mash the keyboard, everything lights up again, the screens go white, and it says:

blank: Shows nothing but a black screen
Name: tserong@HOSTNAME
Password:
Enter password to unlock; select icon to lock

So I type my password, hit ENTER, and I’m back in action. So far so good again. Except… Several times recently, when I’ve come back and mashed the keyboard, the white overlay is gone. I can see all my open windows, my mail client, web browser, terminals, everything, but the screen is still locked. If I type my password and hit ENTER, it unlocks and I can interact again, but this is where it gets really weird. All the windows have moved down a bit on the screen. For example, a terminal that was previously neatly positioned towards the bottom of the screen is now partially off the screen. So “something” crashed – whatever overlay the lock thingy put there is gone? And somehow this affected the position of all my application windows? What in the name of all that is good and holy is going on here?

Update 2020-12-21: I’ve opened boo#1180241 to track this.

,

Stewart SmithTwo Photos from Healesville Sanctuary

If you’re near Melbourne, you should go to Healesville Sanctuary and enjoy the Australian native animals. I’ve been a number of times over the years, and here’s a couple of photos from a (relatively, as in, the last couple of years) trip.

Leah trying to photograph a much too close bird
Koalas seem to always look like they’ve just woken up. I’m pretty convinced this one just had.

Stewart SmithPhotos from Adelaide

Some shots on Kodak Portra 400 from Adelaide. These would have been shot with my Nikon F80 35mm body, I think all with the 50mm lens. These are all pre-pandemic, and I haven’t gone and looked up when exactly. I’m just catching up on scanning some negatives.

,

Glen TurnerBlocking a USB device

udev can be used to block a USB device (or even an entire class of devices, such as USB storage). Add a file /etc/udev/rules.d/99-local-blacklist.rules containing:

SUBSYSTEM=="usb", ATTRS{idVendor}=="0123", ATTRS{idProduct}=="4567", ATTR{authorized}="0"



,

Hamish TaylorWattlebird feeding

While I hope to update this site again soon, here’s a photo I captured over the weekend in my back yard. The red flowering plant is attracting wattlebirds and honey-eaters. This wattlebird stayed still long enough for me to take this shot. After a little bit of editing, I think it has turned out rather well.

Photo taken with: Canon 7D Mark II & Canon 55-250mm lens.

Edited in Lightroom and Photoshop (to remove a sun glare spot off the eye).

Wattlebird feeding

Gary PendergastMore than 280 characters

It’s hard to be nuanced in 280 characters.

The Twitter character limit is a major factor of what can make it so much fun to use: you can read, publish, and interact, in extremely short, digestible chunks. But it doesn’t fit every topic, every time. Sometimes you want to talk about complex topics, having honest, thoughtful discussions. In an environment that encourages hot takes, however, it’s often easier to just avoid having those discussions. I can’t blame people for doing that, either: I find myself taking extended breaks from Twitter, as it can easily become overwhelming.

For me, the exception is Twitter threads.

Twitter threads encourage nuance and creativity.

Creative masterpieces like this Choose Your Own Adventure are not just possible, they rely on Twitter threads being the way they are.

Publishing a short essay about your experiences in your job can bring attention to inequality.

And Tumblr screenshot threads are always fun to read, even when they take a turn for the epic (over 4000 tweets in this thread, and it isn’t slowing down!)

Everyone can think of threads that they’ve loved reading.

My point is, threads are wildly underused on Twitter. I think a big part of that is the UI for writing threads: while it’s suited to writing a thread as a series of related tweet-sized chunks, it doesn’t lend itself to writing, revising, and editing anything more complex.

To help make this easier, I’ve been working on a tool that will help you publish an entire post to Twitter from your WordPress site, as a thread. It takes care of transforming your post into Twitter-friendly content, you can just… write. 🙂

It doesn’t just handle the tweet embeds from earlier in the thread: it also handles uploading and attaching any images and videos you’ve included in your post.

All sorts of embeds work, too. 😉

It’ll be coming in Jetpack 9.0 (due out October 6), but you can try it now in the latest Jetpack Beta! Check it out and tell me what you think. 🙂

This might not fix all of Twitter’s problems, but I hope it’ll help you enjoy reading and writing on Twitter a little more. 💖

,

Glen TurnerConverting MPEG-TS to, well, MPEG

Digital TV uses MPEG Transport Stream, which is a container for video designed for lossy transmission, such as radio. To save CPU cycles, Personal Video Recorders often save the MPEG-TS stream directly to disk. The more usual MPEG is technically MPEG Program Stream, which is designed for lossless transmission, such as storage on a disk.

Since these are both container formats, it should be possible to losslessly and quickly re-code from MPEG-TS to MPEG-PS.

ffmpeg -ss "${STARTTIME}" -to "${DURATION}" -i "${FILENAME}" -ignore_unknown -map 0 -map -0:2 -c copy "${FILENAME}.mpeg"



,

Chris NeugebauerTalk Notes: Practicality Beats Purity: The Zen Of Python’s Escape Hatch?

I gave the talk Practicality Beats Purity: The Zen of Python’s Escape Hatch as part of PyConline AU 2020, the very online replacement for PyCon AU this year. In that talk, I included a few interesting links and code samples which you may be interested in:

@apply

def apply(transform):

    def __decorator__(using_this):
        return transform(using_this)

    return __decorator__


numbers = [1, 2, 3, 4, 5]

@apply(lambda f: list(map(f, numbers)))
def squares(i):
  return i * i

print(list(squares))

# prints: [1, 4, 9, 16, 25]

Init.java

public class Init {
  public static void main(String[] args) {
    System.out.println("Hello, World!");
  }
}

@switch and @case

__NOT_A_MATCHER__ = object()
__MATCHER_SORT_KEY__ = 0

def switch(cls):

    inst = cls()
    methods = []

    for attr in dir(inst):
        method = getattr(inst, attr)
        matcher = getattr(method, "__matcher__", __NOT_A_MATCHER__)

        if matcher == __NOT_A_MATCHER__:
            continue

        methods.append(method)

    methods.sort(key = lambda i: i.__matcher_sort_key__)

    for method in methods:
        matches = method.__matcher__()
        if matches:
            return method()

    raise ValueError("No matcher matched")

def case(matcher):

    def __decorator__(f):
        global __MATCHER_SORT_KEY__

        f.__matcher__ = matcher
        f.__matcher_sort_key__ = __MATCHER_SORT_KEY__
        __MATCHER_SORT_KEY__ += 1
        return f

    return __decorator__



if __name__ == "__main__":
    for i in range(100):

        @switch
        class FizzBuzz:

            @case(lambda: i % 15 == 0)
            def fizzbuzz(self):
                return "fizzbuzz"

            @case(lambda: i % 3 == 0)
            def fizz(self):
                return "fizz"

            @case(lambda: i % 5 == 0)
            def buzz(self):
                return "buzz"

            @case(lambda: True)
            def default(self):
                return "-"

        print(f"{i} {FizzBuzz}")

,

Colin CharlesLinks on Rona #2

This was easily a late April 2020 roundup, stuck in BBEdit, which may still be vaguely relevant.

The post Links on Rona #2 first appeared on Colin Charles Agenda.

,

Craig SandersFuck Grey Text

fuck grey text on white backgrounds
fuck grey text on black backgrounds
fuck thin, spindly fonts
fuck 10px text
fuck any size of anything in px
fuck font-weight 300
fuck unreadable web pages
fuck themes that implement this unreadable idiocy
fuck sites that don’t work without javascript
fuck reactjs and everything like it

thank fuck for Stylus. and uBlock Origin. and uMatrix.

Fuck Grey Text is a post from: Errata

,

Hamish TaylorBlog: A new beginning

Earlier today I launched this site. It is the result of a lot of work over the past few weeks. It began as an idea to publicise some of my photos, and morphed into the site you see now, including a store and blog that I’ve named “Photekgraddft”.

In the weirdly named blog, I want to talk about photography, the stories behind some of my more interesting shots, the gear and software I use, my technology career, my recent ADHD diagnosis and many other things.

This scares me quite a lot. I’ve never really put myself out onto the internet before. If you Google me, you’re not going to find anything much. Google Images has no photos of me. I’ve always liked it that way. Until now.

ADHD’ers are sometimes known for “oversharing”, one of the side-effects of the inability to regulate emotions well. I’ve always been the opposite, hiding, because I knew I was different, but didn’t understand why.

The combination of the COVID-19 pandemic and my recent ADHD diagnosis have given me a different perspective. I now know why I hid. And now I want to engage, and be engaged, in the world.

If I can be a force for positive change, around people’s knowledge and opinion of ADHD, then I will.

If talking about Business Analysis (my day job), and sharing my ideas for optimising organisations helps anyone at all, then I will.

If I can show my photos and brighten someone’s day by allowing them to enjoy a sunset, or a flying bird, then I will.

And if anyone buys any of my photos, then I will be shocked!

So welcome to my little vanity project. I hope it can be something positive, for me, if for no one else, in this new, odd world in which we now find ourselves living together.

,

Matt PalmerPrivate Key Redaction: UR DOIN IT RONG

Because posting private keys on the Internet is a bad idea, some people like to “redact” their private keys, so that it looks kinda-sorta like a private key, but it isn’t actually giving away anything secret. Unfortunately, due to the way that private keys are represented, it is easy to “redact” a key in such a way that it doesn’t actually redact anything at all. RSA private keys are particularly bad at this, but the problem can (potentially) apply to other keys as well.

I’ll show you a bit of “Inside Baseball” with key formats, and then demonstrate the practical implications. Finally, we’ll go through a practical worked example from an actual not-really-redacted key I recently stumbled across in my travels.

The Private Lives of Private Keys

Here is what a typical private key looks like, when you come across it:

-----BEGIN RSA PRIVATE KEY-----
MGICAQACEQCxjdTmecltJEz2PLMpS4BXAgMBAAECEDKtuwD17gpagnASq1zQTYEC
CQDVTYVsjjF7IQIJANUYZsIjRsR3AgkAkahDUXL0RSECCB78r2SnsJC9AghaOK3F
sKoELg==
-----END RSA PRIVATE KEY-----

Obviously, there’s some hidden meaning in there – computers don’t encrypt things by shouting “BEGIN RSA PRIVATE KEY!”, after all. What is between the BEGIN/END lines above is, in fact, a base64-encoded DER format ASN.1 structure representing a PKCS#1 private key.

In simple terms, it’s a list of numbers – very important numbers. The list of numbers is, in order:

  • A version number (0);
  • The “public modulus”, commonly referred to as “n”;
  • The “public exponent”, or “e” (which is almost always 65,537, for various unimportant reasons);
  • The “private exponent”, or “d”;
  • The two “private primes”, or “p” and “q”;
  • Two exponents, which are known as “dmp1” and “dmq1”; and
  • A coefficient, known as “iqmp”.

Why Is This a Problem?

The thing is, only three of those numbers are actually required in a private key. The rest, whilst useful to allow the RSA encryption and decryption to be more efficient, aren’t necessary. The three absolutely required values are e, p, and q.

Of the other numbers, most of them are at least about the same size as each of p and q. So of the total data in an RSA key, less than a quarter of the data is required. Let me show you with the above “toy” key, by breaking it down piece by piece1:

  • MGI – DER for “this is a sequence”
  • CAQ – version (0)
  • CxjdTmecltJEz2PLMpS4BX – n
  • AgMBAA – e
  • ECEDKtuwD17gpagnASq1zQTY – d
  • ECCQDVTYVsjjF7IQ – p
  • IJANUYZsIjRsR3 – q
  • AgkAkahDUXL0RS – dmp1
  • ECCB78r2SnsJC9 – dmq1
  • AghaOK3FsKoELg== – iqmp

Remember that in order to reconstruct all of these values, all I need are e, p, and q – and e is pretty much always 65,537. So I could “redact” almost all of this key, and still give all the important, private bits of this key. Let me show you:

-----BEGIN RSA PRIVATE KEY-----
..............................................................EC
CQDVTYVsjjF7IQIJANUYZsIjRsR3....................................
........
-----END RSA PRIVATE KEY-----

Now, I doubt that anyone is going to redact a key precisely like this… but then again, this isn’t a “typical” RSA key. They usually look a lot more like this:

-----BEGIN RSA PRIVATE KEY-----
MIIEogIBAAKCAQEAu6Inch7+mWtKn+leB9uCG3MaJIxRyvC/5KTz2fR+h+GOhqj4
SZJobiVB4FrE5FgC7AnlH6qeRi9MI0s6dt5UWZ5oNIeWSaOOeNO+EJDUkSVf67wj
SNGXlSjGAkPZ0nRJiDjhuPvQmdW53hOaBLk5udxPEQbenpXAzbLJ7wH5ouLQ3nQw
HwpwDNQhF6zRO8WoscpDVThOAM+s4PS7EiK8ZR4hu2toon8Ynadlm95V45wR0VlW
zywgbkZCKa1IMrDCscB6CglQ10M3Xzya3iTzDtQxYMVqhDrA7uBYRxA0y1sER+Rb
yhEh03xz3AWemJVLCQuU06r+FABXJuY/QuAVvQIDAQABAoIBAFqwWVhzWqNUlFEO
PoCVvCEAVRZtK+tmyZj9kU87ORz8DCNR8A+/T/JM17ZUqO2lDGSBs9jGYpGRsr8s
USm69BIM2ljpX95fyzDjRu5C0jsFUYNi/7rmctmJR4s4uENcKV5J/++k5oI0Jw4L
c1ntHNWUgjK8m0UTJIlHbQq0bbAoFEcfdZxd3W+SzRG3jND3gifqKxBG04YDwloy
tu+bPV2jEih6p8tykew5OJwtJ3XsSZnqJMwcvDciVbwYNiJ6pUvGq6Z9kumOavm9
XU26m4cWipuK0URWbHWQA7SjbktqEpxsFrn5bYhJ9qXgLUh/I1+WhB2GEf3hQF5A
pDTN4oECgYEA7Kp6lE7ugFBDC09sKAhoQWrVSiFpZG4Z1gsL9z5YmZU/vZf0Su0n
9J2/k5B1GghvSwkTqpDZLXgNz8eIX0WCsS1xpzOuORSNvS1DWuzyATIG2cExuRiB
jYWIJUeCpa5p2PdlZmBrnD/hJ4oNk4oAVpf+HisfDSN7HBpN+TJfcAUCgYEAyvY7
Y4hQfHIdcfF3A9eeCGazIYbwVyfoGu70S/BZb2NoNEPymqsz7NOfwZQkL4O7R3Wl
Rm0vrWT8T5ykEUgT+2ruZVXYSQCKUOl18acbAy0eZ81wGBljZc9VWBrP1rHviVWd
OVDRZNjz6nd6ZMrJvxRa24TvxZbJMmO1cgSW1FkCgYAoWBd1WM9HiGclcnCZknVT
UYbykCeLO0mkN1Xe2/32kH7BLzox26PIC2wxF5seyPlP7Ugw92hOW/zewsD4nLze
v0R0oFa+3EYdTa4BvgqzMXgBfvGfABJ1saG32SzoWYcpuWLLxPwTMsCLIPmXgRr1
qAtl0SwF7Vp7O/C23mNukQKBgB89DOEB7xloWv3Zo27U9f7nB7UmVsGjY8cZdkJl
6O4LB9PbjXCe3ywZWmJqEbO6e83A3sJbNdZjT65VNq9uP50X1T+FmfeKfL99X2jl
RnQTsrVZWmJrLfBSnBkmb0zlMDAcHEnhFYmHFuvEnfL7f1fIoz9cU6c+0RLPY/L7
n9dpAoGAXih17mcmtnV+Ce+lBWzGWw9P4kVDSIxzGxd8gprrGKLa3Q9VuOrLdt58
++UzNUaBN6VYAe4jgxGfZfh+IaSlMouwOjDgE/qzgY8QsjBubzmABR/KWCYiRqkj
qpWCgo1FC1Gn94gh/+dW2Q8+NjYtXWNqQcjRP4AKTBnPktEvdMA=
-----END RSA PRIVATE KEY-----

People typically redact keys by deleting whole lines, and usually replacing them with [...] and the like. But only about 345 of those 1588 characters (excluding the header and footer) are required to construct the entire key. You can redact about 4/5ths of that giant blob of stuff, and your private parts (or at least, those of your key) are still left uncomfortably exposed.

But Wait! There’s More!

Remember how I said that everything in the key other than e, p, and q could be derived from those three numbers? Let’s talk about one of those numbers: n.

This is known as the “public modulus” (because, along with e, it is also present in the public key). It is very easy to calculate: n = p * q. It is also very early in the key (the second number, in fact).

Since n = p * q, it follows that q = n / p. Thus, as long as the key is intact up to p, you can derive q by simple division.
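
To see that arithmetic in one place, here's a small Python sketch (the worked example below uses Ruby; this is just an illustration, with toy-sized primes standing in for the real thing):

# Rebuild the remaining PKCS#1 fields from n, p, and e alone.
# Toy-sized numbers only; real keys use primes hundreds of digits long.
def rebuild_rsa(n, p, e=65537):
    assert n % p == 0, "p must divide n"
    q = n // p                        # "q by simple division"
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)               # private exponent (Python 3.8+)
    return {
        "q": q,
        "d": d,
        "dmp1": d % (p - 1),
        "dmq1": d % (q - 1),
        "iqmp": pow(q, -1, p),        # CRT coefficient
    }

print(rebuild_rsa(n=61 * 53, p=61, e=17))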

Real World Redaction

At this point, I’d like to introduce an acquaintance of mine: Mr. Johan Finn. He is the proud owner of the GitHub repo johanfinn/scripts. For a while, his repo contained a script that contained a poorly-redacted private key. He since deleted it, by making a new commit, but of course because git never really deletes anything, it’s still available.

Of course, Mr. Finn may delete the repo, or force-push a new history without that commit, so here is the redacted private key, with a bit of the surrounding shell script, for our illustrative pleasure:

#Add private key to .ssh folder
cd /home/johan/.ssh/
echo  "-----BEGIN RSA PRIVATE KEY-----
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
ÄÄÄÄÄÄÄÄÄÄÄÄÄÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅ
MIIJKgIBAAKCAgEAxEVih1JGb8gu/Fm4AZh+ZwJw/pjzzliWrg4mICFt1g7SmIE2
TCQMKABdwd11wOFKCPc/UzRH/fHuQcvWrpbOSdqev/zKff9iedKw/YygkMeIRaXB
fYELqvUAOJ8PPfDm70st9GJRhjGgo5+L3cJB2gfgeiDNHzaFvapRSU0oMGQX+kI9
ezsjDAn+0Pp+r3h/u1QpLSH4moRFGF4omNydI+3iTGB98/EzuNhRBHRNq4oBV5SG
Pq/A1bem2ninnoEaQ+OPESxYzDz3Jy9jV0W/6LvtJ844m+XX69H5fqq5dy55z6DW
sGKn78ULPVZPsYH5Y7C+CM6GAn4nYCpau0t52sqsY5epXdeYx4Dc+Wm0CjXrUDEe
Egl4loPKDxJkQqQ/MQiz6Le/UK9vEmnWn1TRXK3ekzNV4NgDfJANBQobOpwt8WVB
rbsC0ON7n680RQnl7PltK9P1AQW5vHsahkoixk/BhcwhkrkZGyDIl9g8Q/Euyoq3
eivKPLz7/rhDE7C1BzFy7v8AjC3w7i9QeHcWOZFAXo5hiDasIAkljDOsdfD4tP5/
wSO6E6pjL3kJ+RH2FCHd7ciQb+IcuXbku64ln8gab4p8jLa/mcMI+V3eWYnZ82Yu
axsa85hAe4wb60cp/rCJo7ihhDTTvGooqtTisOv2nSvCYpcW9qbL6cGjAXECAwEA
AQKCAgEAjz6wnWDP5Y9ts2FrqUZ5ooamnzpUXlpLhrbu3m5ncl4ZF5LfH+QDN0Kl
KvONmHsUhJynC/vROybSJBU4Fu4bms1DJY3C39h/L7g00qhLG7901pgWMpn3QQtU
4P49qpBii20MGhuTsmQQALtV4kB/vTgYfinoawpo67cdYmk8lqzGzzB/HKxZdNTq
s+zOfxRr7PWMo9LyVRuKLjGyYXZJ/coFaobWBi8Y96Rw5NZZRYQQXLIalC/Dhndm
AHckpstEtx2i8f6yxEUOgPvV/gD7Akn92RpqOGW0g/kYpXjGqZQy9PVHGy61sInY
HSkcOspIkJiS6WyJY9JcvJPM6ns4b84GE9qoUlWVF3RWJk1dqYCw5hz4U8LFyxsF
R6WhYiImvjxBLpab55rSqbGkzjI2z+ucDZyl1gqIv9U6qceVsgRyuqdfVN4deU22
LzO5IEDhnGdFqg9KQY7u8zm686Ejs64T1sh0y4GOmGsSg+P6nsqkdlXH8C+Cf03F
lqPFg8WQC7ojl/S8dPmkT5tcJh3BPwIWuvbtVjFOGQc8x0lb+NwK8h2Nsn6LNazS
0H90adh/IyYX4sBMokrpxAi+gMAWiyJHIHLeH2itNKtAQd3qQowbrWNswJSgJzsT
JuJ7uqRKAFkE6nCeAkuj/6KHHMPsfCAffVdyGaWqhoxmPOrnVgECggEBAOrCCwiC
XxwUgjOfOKx68siFJLfHf4vPo42LZOkAQq5aUmcWHbJVXmoxLYSczyAROopY0wd6
Dx8rqnpO7OtZsdJMeBSHbMVKoBZ77hiCQlrljcj12moFaEAButLCdZFsZW4zF/sx
kWIAaPH9vc4MvHHyvyNoB3yQRdevu57X7xGf9UxWuPil/jvdbt9toaraUT6rUBWU
GYPNKaLFsQzKsFWAzp5RGpASkhuiBJ0Qx3cfLyirjrKqTipe3o3gh/5RSHQ6VAhz
gdUG7WszNWk8FDCL6RTWzPOrbUyJo/wz1kblsL3vhV7ldEKFHeEjsDGroW2VUFlS
asAHNvM4/uYcOSECggEBANYH0427qZtLVuL97htXW9kCAT75xbMwgRskAH4nJDlZ
IggDErmzBhtrHgR+9X09iL47jr7dUcrVNPHzK/WXALFSKzXhkG/yAgmt3r14WgJ6
5y7010LlPFrzaNEyO/S4ISuBLt4cinjJsrFpoo0WI8jXeM5ddG6ncxdurKXMymY7
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::.::
:::::::::::::::::::::::::::.::::::::::::::::::::::::::::::::::::
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLlL
ÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖ
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
ÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅ
YYYYYYYYYYYYYYYYYYYYYyYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
gff0GJCOMZ65pMSy3A3cSAtjlKnb4fWzuHD5CFbusN4WhCT/tNxGNSpzvxd8GIDs
nY7exs9L230oCCpedVgcbayHCbkChEfoPzL1e1jXjgCwCTgt8GjeEFqc1gXNEaUn
O8AJ4VlR8fRszHm6yR0ZUBdY7UJddxQiYOzt0S1RLlECggEAbdcs4mZdqf3OjejJ
06oTPs9NRtAJVZlppSi7pmmAyaNpOuKWMoLPElDAQ3Q7VX26LlExLCZoPOVpdqDH
KbdmBEfTR4e11Pn9vYdu9/i6o10U4hpmf4TYKlqk10g1Sj21l8JATj/7Diey8scO
sAI1iftSg3aBSj8W7rxCxSezrENzuqw5D95a/he1cMUTB6XuravqZK5O4eR0vrxR
AvMzXk5OXrUEALUvt84u6m6XZZ0pq5XZxq74s8p/x1JvTwcpJ3jDKNEixlHfdHEZ
ZIu/xpcwD5gRfVGQamdcWvzGHZYLBFO1y5kAtL8kI9tW7WaouWVLmv99AyxdAaCB
Y5mBAQKCAQEAzU7AnorPzYndlOzkxRFtp6MGsvRBsvvqPLCyUFEXrHNV872O7tdO
GmsMZl+q+TJXw7O54FjJJvqSSS1sk68AGRirHop7VQce8U36BmI2ZX6j2SVAgIkI
9m3btCCt5rfiCatn2+Qg6HECmrCsHw6H0RbwaXS4RZUXD/k4X+sslBitOb7K+Y+N
Bacq6QxxjlIqQdKKPs4P2PNHEAey+kEJJGEQ7bTkNxCZ21kgi1Sc5L8U/IGy0BMC
PvJxssLdaWILyp3Ws8Q4RAoC5c0ZP0W2j+5NSbi3jsDFi0Y6/2GRdY1HAZX4twem
Q0NCedq1JNatP1gsb6bcnVHFDEGsj/35oQKCAQEAgmWMuSrojR/fjJzvke6Wvbox
FRnPk+6YRzuYhAP/YPxSRYyB5at++5Q1qr7QWn7NFozFIVFFT8CBU36ktWQ39MGm
cJ5SGyN9nAbbuWA6e+/u059R7QL+6f64xHRAGyLT3gOb1G0N6h7VqFT25q5Tq0rc
Lf/CvLKoudjv+sQ5GKBPT18+zxmwJ8YUWAsXUyrqoFWY/Tvo5yLxaC0W2gh3+Ppi
EDqe4RRJ3VKuKfZxHn5VLxgtBFN96Gy0+Htm5tiMKOZMYAkHiL+vrVZAX0hIEuRZ
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
-----END RSA PRIVATE KEY-----" >> id_rsa

Now, if you try to reconstruct this key by removing the “obvious” garbage lines (the ones that are all repeated characters, some of which aren’t even valid base64 characters), it still isn’t a key – at least, openssl pkey doesn’t want anything to do with it. The key is very much still in there, though, as we shall soon see.

Using a gem I wrote and a quick bit of Ruby, we can extract a complete private key. The irb session looks something like this:

>> require "derparse"
>> b64 = <<EOF
MIIJKgIBAAKCAgEAxEVih1JGb8gu/Fm4AZh+ZwJw/pjzzliWrg4mICFt1g7SmIE2
TCQMKABdwd11wOFKCPc/UzRH/fHuQcvWrpbOSdqev/zKff9iedKw/YygkMeIRaXB
fYELqvUAOJ8PPfDm70st9GJRhjGgo5+L3cJB2gfgeiDNHzaFvapRSU0oMGQX+kI9
ezsjDAn+0Pp+r3h/u1QpLSH4moRFGF4omNydI+3iTGB98/EzuNhRBHRNq4oBV5SG
Pq/A1bem2ninnoEaQ+OPESxYzDz3Jy9jV0W/6LvtJ844m+XX69H5fqq5dy55z6DW
sGKn78ULPVZPsYH5Y7C+CM6GAn4nYCpau0t52sqsY5epXdeYx4Dc+Wm0CjXrUDEe
Egl4loPKDxJkQqQ/MQiz6Le/UK9vEmnWn1TRXK3ekzNV4NgDfJANBQobOpwt8WVB
rbsC0ON7n680RQnl7PltK9P1AQW5vHsahkoixk/BhcwhkrkZGyDIl9g8Q/Euyoq3
eivKPLz7/rhDE7C1BzFy7v8AjC3w7i9QeHcWOZFAXo5hiDasIAkljDOsdfD4tP5/
wSO6E6pjL3kJ+RH2FCHd7ciQb+IcuXbku64ln8gab4p8jLa/mcMI+V3eWYnZ82Yu
axsa85hAe4wb60cp/rCJo7ihhDTTvGooqtTisOv2nSvCYpcW9qbL6cGjAXECAwEA
AQKCAgEAjz6wnWDP5Y9ts2FrqUZ5ooamnzpUXlpLhrbu3m5ncl4ZF5LfH+QDN0Kl
KvONmHsUhJynC/vROybSJBU4Fu4bms1DJY3C39h/L7g00qhLG7901pgWMpn3QQtU
4P49qpBii20MGhuTsmQQALtV4kB/vTgYfinoawpo67cdYmk8lqzGzzB/HKxZdNTq
s+zOfxRr7PWMo9LyVRuKLjGyYXZJ/coFaobWBi8Y96Rw5NZZRYQQXLIalC/Dhndm
AHckpstEtx2i8f6yxEUOgPvV/gD7Akn92RpqOGW0g/kYpXjGqZQy9PVHGy61sInY
HSkcOspIkJiS6WyJY9JcvJPM6ns4b84GE9qoUlWVF3RWJk1dqYCw5hz4U8LFyxsF
R6WhYiImvjxBLpab55rSqbGkzjI2z+ucDZyl1gqIv9U6qceVsgRyuqdfVN4deU22
LzO5IEDhnGdFqg9KQY7u8zm686Ejs64T1sh0y4GOmGsSg+P6nsqkdlXH8C+Cf03F
lqPFg8WQC7ojl/S8dPmkT5tcJh3BPwIWuvbtVjFOGQc8x0lb+NwK8h2Nsn6LNazS
0H90adh/IyYX4sBMokrpxAi+gMAWiyJHIHLeH2itNKtAQd3qQowbrWNswJSgJzsT
JuJ7uqRKAFkE6nCeAkuj/6KHHMPsfCAffVdyGaWqhoxmPOrnVgECggEBAOrCCwiC
XxwUgjOfOKx68siFJLfHf4vPo42LZOkAQq5aUmcWHbJVXmoxLYSczyAROopY0wd6
Dx8rqnpO7OtZsdJMeBSHbMVKoBZ77hiCQlrljcj12moFaEAButLCdZFsZW4zF/sx
kWIAaPH9vc4MvHHyvyNoB3yQRdevu57X7xGf9UxWuPil/jvdbt9toaraUT6rUBWU
GYPNKaLFsQzKsFWAzp5RGpASkhuiBJ0Qx3cfLyirjrKqTipe3o3gh/5RSHQ6VAhz
gdUG7WszNWk8FDCL6RTWzPOrbUyJo/wz1kblsL3vhV7ldEKFHeEjsDGroW2VUFlS
asAHNvM4/uYcOSECggEBANYH0427qZtLVuL97htXW9kCAT75xbMwgRskAH4nJDlZ
IggDErmzBhtrHgR+9X09iL47jr7dUcrVNPHzK/WXALFSKzXhkG/yAgmt3r14WgJ6
5y7010LlPFrzaNEyO/S4ISuBLt4cinjJsrFpoo0WI8jXeM5ddG6ncxdurKXMymY7
EOF
>> b64 += <<EOF
gff0GJCOMZ65pMSy3A3cSAtjlKnb4fWzuHD5CFbusN4WhCT/tNxGNSpzvxd8GIDs
nY7exs9L230oCCpedVgcbayHCbkChEfoPzL1e1jXjgCwCTgt8GjeEFqc1gXNEaUn
O8AJ4VlR8fRszHm6yR0ZUBdY7UJddxQiYOzt0S1RLlECggEAbdcs4mZdqf3OjejJ
06oTPs9NRtAJVZlppSi7pmmAyaNpOuKWMoLPElDAQ3Q7VX26LlExLCZoPOVpdqDH
KbdmBEfTR4e11Pn9vYdu9/i6o10U4hpmf4TYKlqk10g1Sj21l8JATj/7Diey8scO
sAI1iftSg3aBSj8W7rxCxSezrENzuqw5D95a/he1cMUTB6XuravqZK5O4eR0vrxR
AvMzXk5OXrUEALUvt84u6m6XZZ0pq5XZxq74s8p/x1JvTwcpJ3jDKNEixlHfdHEZ
ZIu/xpcwD5gRfVGQamdcWvzGHZYLBFO1y5kAtL8kI9tW7WaouWVLmv99AyxdAaCB
Y5mBAQKCAQEAzU7AnorPzYndlOzkxRFtp6MGsvRBsvvqPLCyUFEXrHNV872O7tdO
GmsMZl+q+TJXw7O54FjJJvqSSS1sk68AGRirHop7VQce8U36BmI2ZX6j2SVAgIkI
9m3btCCt5rfiCatn2+Qg6HECmrCsHw6H0RbwaXS4RZUXD/k4X+sslBitOb7K+Y+N
Bacq6QxxjlIqQdKKPs4P2PNHEAey+kEJJGEQ7bTkNxCZ21kgi1Sc5L8U/IGy0BMC
PvJxssLdaWILyp3Ws8Q4RAoC5c0ZP0W2j+5NSbi3jsDFi0Y6/2GRdY1HAZX4twem
Q0NCedq1JNatP1gsb6bcnVHFDEGsj/35oQKCAQEAgmWMuSrojR/fjJzvke6Wvbox
FRnPk+6YRzuYhAP/YPxSRYyB5at++5Q1qr7QWn7NFozFIVFFT8CBU36ktWQ39MGm
cJ5SGyN9nAbbuWA6e+/u059R7QL+6f64xHRAGyLT3gOb1G0N6h7VqFT25q5Tq0rc
Lf/CvLKoudjv+sQ5GKBPT18+zxmwJ8YUWAsXUyrqoFWY/Tvo5yLxaC0W2gh3+Ppi
EDqe4RRJ3VKuKfZxHn5VLxgtBFN96Gy0+Htm5tiMKOZMYAkHiL+vrVZAX0hIEuRZ
EOF
>> der = b64.unpack("m").first
>> c = DerParse.new(der).first_node.first_child
>> version = c.value
=> 0
>> c = c.next_node
>> n = c.value
=> 80071596234464993385068908004931... # (etc)
>> c = c.next_node
>> e = c.value
=> 65537
>> c = c.next_node
>> d = c.value
=> 58438813486895877116761996105770... # (etc)
>> c = c.next_node
>> p = c.value
=> 29635449580247160226960937109864... # (etc)
>> c = c.next_node
>> q = c.value
=> 27018856595256414771163410576410... # (etc)

What I’ve done, in case you don’t speak Ruby, is take the two “chunks” of plausible-looking base64 data, chuck them together into a variable named b64, unbase64 it into a variable named der, pass that into a new DerParse instance, and then walk the DER value tree until I got all the values I need.

Interestingly, the q value actually traverses the “split” in the two chunks, which means that there’s always the possibility that there are lines missing from the key. However, since p and q are supposed to be prime, we can “sanity check” them to see if corruption is likely to have occurred:

>> require "openssl"
>> OpenSSL::BN.new(p).prime?
=> true
>> OpenSSL::BN.new(q).prime?
=> true

Excellent! The chances of a corrupted file producing valid-but-incorrect prime numbers isn’t huge, so we can be fairly confident that we’ve got the “real” p and q. Now, with the help of another one of my creations we can use e, p, and q to create a fully-operational battle key:

>> require "openssl/pkey/rsa"
>> k = OpenSSL::PKey::RSA.from_factors(p, q, e)
=> #<OpenSSL::PKey::RSA:0x0000559d5903cd38>
>> k.valid?
=> true
>> k.verify(OpenSSL::Digest::SHA256.new, k.sign(OpenSSL::Digest::SHA256.new, "bob"), "bob")
=> true

… and there you have it. One fairly redacted-looking private key brought back to life by maths and far too much free time.

Sorry Mr. Finn, I hope you’re not still using that key on anything Internet-facing.

What About Other Key Types?

EC keys are very different beasts, but they have much the same problems as RSA keys. A typical EC key contains both private and public data, and the public portion is twice the size – so only about 1/3 of the data in the key is private material. It is quite plausible that you can “redact” an EC key and leave all the actually private bits exposed.

What Do We Do About It?

In short: don’t ever try and redact real private keys. For documentation purposes, just put “KEY GOES HERE” in the appropriate spot, or something like that. Store your secrets somewhere that isn’t a public (or even private!) git repo.

Generating a “dummy” private key and sticking it in there isn’t a great idea, for different reasons: people have this odd habit of reusing “demo” keys in real life. There’s no need to encourage that sort of thing.


  1. Technically the pieces aren’t 100% aligned with the underlying DER, because of how base64 works. I felt it was easier to understand if I stuck to chopping up the base64, rather than decoding into DER and then chopping up the DER. 

,

Jonathan Adamczewskif32, u32, and const

Some time ago, I wrote “floats, bits, and constant expressions” about converting floating point number into its representative ones and zeros as a C++ constant expression – constructing the IEEE 754 representation without being able to examine the bits directly.

I’ve been playing around with Rust recently, and rewrote that conversion code as a bit of a learning exercise for myself, with a thoroughly contrived set of constraints: using integer and single-precision floating point math, at compile time, without unsafe blocks, while using as few unstable features as possible.

I’ve included the listing below, for your bemusement and/or head-shaking, and you can play with the code in the Rust Playground and rust.godbolt.org

// Jonathan Adamczewski 2020-05-12
//
// Constructing the bit-representation of an IEEE 754 single precision floating 
// point number, using integer and single-precision floating point math, at 
// compile time, in rust, without unsafe blocks, while using as few unstable 
// features as I can.
//
// or "What if this silly C++ thing https://brnz.org/hbr/?p=1518 but in Rust?"


// Q. Why? What is this good for?
// A. To the best of my knowledge, this code serves no useful purpose. 
//    But I did learn a thing or two while writing it :)


// This is needed to be able to perform floating point operations in a const 
// function:
#![feature(const_fn)]


// bits_transmute(): Returns the bits representing a floating point value, by
//                   way of std::mem::transmute()
//
// For completeness (and validation), and to make it clear the fundamentally 
// unnecessary nature of the exercise :D - here's a short, straightforward, 
// library-based version. But it needs the const_transmute flag and an unsafe 
// block.
#![feature(const_transmute)]
const fn bits_transmute(f: f32) -> u32 {
  unsafe { std::mem::transmute::<f32, u32>(f) }
}



// get_if_u32(predicate:bool, if_true: u32, if_false: u32):
//   Returns if_true if predicate is true, else if_false
//
// If and match are not able to be used in const functions (at least, not 
// without #![feature(const_if_match)] - so here's a branch-free select function
// for u32s
const fn get_if_u32(predicate: bool, if_true: u32, if_false: u32) -> u32 {
  let pred_mask = (-1 * (predicate as i32)) as u32;
  let true_val = if_true & pred_mask;
  let false_val = if_false & !pred_mask;
  true_val | false_val
}

// get_if_f32(predicate, if_true, if_false):
//   Returns if_true if predicate is true, else if_false
//
// A branch-free select function for f32s.
// 
// If either is_true or is_false is NaN or an infinity, the result will be NaN,
// which is not ideal. I don't know of a better way to implement this function
// within the arbitrary limitations of this silly little side quest.
const fn get_if_f32(predicate: bool, if_true: f32, if_false: f32) -> f32 {
  // can't convert bool to f32 - but can convert bool to i32 to f32
  let pred_sel = (predicate as i32) as f32;
  let pred_not_sel = ((!predicate) as i32) as f32;
  let true_val = if_true * pred_sel;
  let false_val = if_false * pred_not_sel;
  true_val + false_val
}


// bits(): Returns the bits representing a floating point value.
const fn bits(f: f32) -> u32 {
  // the result value, initialized to a NaN value that will otherwise not be
  // produced by this function.
  let mut r = 0xffff_ffff;

  // These floating point operations (and others) cause the following error:
  //     only int, `bool` and `char` operations are stable in const fn
  // hence #![feature(const_fn)] at the top of the file
  
  // Identify special cases
  let is_zero    = f == 0_f32;
  let is_inf     = f == f32::INFINITY;
  let is_neg_inf = f == f32::NEG_INFINITY;
  let is_nan     = f != f;

  // Writing this as !(is_zero || is_inf || ...) cause the following error:
  //     Loops and conditional expressions are not stable in const fn
  // so instead write this as type conversions, and bitwise operations
  //
  // "normalish" here means that f is a normal or subnormal value
  let is_normalish = 0 == ((is_zero as u32) | (is_inf as u32) | 
                        (is_neg_inf as u32) | (is_nan as u32));

  // set the result value for each of the special cases
  r = get_if_u32(is_zero,    0,           r); // if (iz_zero)    { r = 0; }
  r = get_if_u32(is_inf,     0x7f80_0000, r); // if (is_inf)     { r = 0x7f80_0000; }
  r = get_if_u32(is_neg_inf, 0xff80_0000, r); // if (is_neg_inf) { r = 0xff80_0000; }
  r = get_if_u32(is_nan,     0x7fc0_0000, r); // if (is_nan)     { r = 0x7fc0_0000; }
 
  // It was tempting at this point to try setting f to a "normalish" placeholder 
  // value so that special cases do not have to be handled in the code that 
  // follows, like so:
  // f = get_if_f32(is_normal, f, 1_f32);
  //
  // Unfortunately, get_if_f32() returns NaN if either input is NaN or infinite.
  // Instead of switching the value, we work around the non-normalish cases 
  // later.
  //
  // (This whole function is branch-free, so all of it is executed regardless of 
  // the input value)

  // extract the sign bit
  let sign_bit  = get_if_u32(f < 0_f32,  1, 0);

  // compute the absolute value of f
  let mut abs_f = get_if_f32(f < 0_f32, -f, f);

  
  // This part is a little complicated. The algorithm is functionally the same 
  // as the C++ version linked from the top of the file.
  // 
  // Because of the various contrived constraints on this problem, we compute 
  // the exponent and significand, rather than extract the bits directly.
  //
  // The idea is this:
  // Every finite single precision float point number can be represented as a
  // series of (at most) 24 significant digits as a 128.149 fixed point number 
  // (128: 126 exponent values >= 0, plus one for the implicit leading 1, plus 
  // one more so that the decimal point falls on a power-of-two boundary :)
  // 149: 126 negative exponent values, plus 23 for the bits of precision in the 
  // significand.)
  //
  // If we are able to scale the number such that all of the precision bits fall 
  // in the upper-most 64 bits of that fixed-point representation (while 
  // tracking our effective manipulation of the exponent), we can then 
  // predictably and simply scale that computed value back to a range than can 
  // be converted safely to a u64, count the leading zeros to determine the 
  // exact exponent, and then shift the result into position for the final u32 
  // representation.
  
  // Start with the largest possible exponent - subsequent steps will reduce 
  // this number as appropriate
  let mut exponent: u32 = 254;
  {
    // Hex float literals are really nice. I miss them.

    // The threshold is 2^87 (think: 64+23 bits) to ensure that the number will 
    // be large enough that, when scaled down by 2^64, all the precision will 
    // fit nicely in a u64
    const THRESHOLD: f32 = 154742504910672534362390528_f32; // 0x1p87f == 2^87

    // The scaling factor is 2^41 (think: 64-23 bits) to ensure that a number 
    // between 2^87 and 2^64 will not overflow in a single scaling step.
    const SCALE_UP: f32 = 2199023255552_f32; // 0x1p41f == 2^41

    // Because loops are not available (no #![feature(const_loops)]), and 'if' 
    // is not available (no #![feature(const_if_match)]), perform repeated 
    // branch-free conditional multiplication of abs_f.

    // use a macro, because why not :D It's the most compact, simplest option I 
    // could find.
    macro_rules! maybe_scale {
      () => {{
        // care is needed: if abs_f is above the threshold, multiplying by 2^41 
        // will cause it to overflow (INFINITY) which will cause get_if_f32() to
        // return NaN, which will destroy the value in abs_f. So compute a safe 
        // scaling factor for each iteration.
        //
        // Roughly equivalent to :
        // if (abs_f < THRESHOLD) {
        //   exponent -= 41;
        //   abs_f *= SCALE_UP;
        // }
        let scale = get_if_f32(abs_f < THRESHOLD, SCALE_UP,      1_f32);    
        exponent  = get_if_u32(abs_f < THRESHOLD, exponent - 41, exponent); 
        abs_f     = get_if_f32(abs_f < THRESHOLD, abs_f * scale, abs_f);
      }}
    }
    // 41 bits per iteration means up to 246 bits shifted.
    // Even the smallest subnormal value will end up in the desired range.
    maybe_scale!();  maybe_scale!();  maybe_scale!();
    maybe_scale!();  maybe_scale!();  maybe_scale!();
  }

  // Now that we know that abs_f is in the desired range (2^87 <= abs_f < 2^128)
  // scale it down to be in the range (2^23 <= _ < 2^64), and convert without 
  // loss of precision to u64.
  const INV_2_64: f32 = 5.42101086242752217003726400434970855712890625e-20_f32; // 0x1p-64f == 2^-64
  let a = (abs_f * INV_2_64) as u64;

  // Count the leading zeros.
  // (C++ doesn't provide a compile-time constant function for this. It's nice 
  // that rust does :)
  let mut lz = a.leading_zeros();

  // if the number isn't normalish, lz is meaningless: we stomp it with 
  // something that will not cause problems in the computation that follows - 
  // the result of which is meaningless, and will be ignored in the end for 
  // non-normalish values.
  lz = get_if_u32(!is_normalish, 0, lz); // if (!is_normalish) { lz = 0; }

  {
    // This step accounts for subnormal numbers, where there are more leading 
    // zeros than can be accounted for in a valid exponent value, and leading 
    // zeros that must remain in the final significand.
    //
    // If lz < exponent, reduce exponent to its final correct value - lz will be
    // used to remove all of the leading zeros.
    //
    // Otherwise, clamp exponent to zero, and adjust lz to ensure that the 
    // correct number of bits will remain (after multiplying by 2^41 six times - 
    // 2^246 - there are 7 leading zeros ahead of the original subnormal's
    // computed significand of 0.sss...)
    // 
    // The following is roughly equivalent to:
    // if (lz < exponent) {
    //   exponent = exponent - lz;
    // } else {
    //   exponent = 0;
    //   lz = 7;
    // }

    // we're about to mess with lz and exponent - compute and store the relative 
    // value of the two
    let lz_is_less_than_exponent = lz < exponent;

    lz       = get_if_u32(!lz_is_less_than_exponent, 7,             lz);
    exponent = get_if_u32( lz_is_less_than_exponent, exponent - lz, 0);
  }

  // compute the final significand.
  // + 1 shifts away a leading 1-bit for normal, and 0-bit for subnormal values
  // Shifts are done in u64 (that leading bit is shifted into the void), then
  // the resulting bits are shifted back to their final resting place.
  let significand = ((a << (lz + 1)) >> (64 - 23)) as u32;

  // combine the bits
  let computed_bits = (sign_bit << 31) | (exponent << 23) | significand;

  // return the normalish result, or the non-normalish result, as appropriate
  get_if_u32(is_normalish, computed_bits, r)
}


// Compile-time validation - able to be examined in rust.godbolt.org output
pub static BITS_BIGNUM: u32 = bits(std::f32::MAX);
pub static TBITS_BIGNUM: u32 = bits_transmute(std::f32::MAX);
pub static BITS_LOWER_THAN_MIN: u32 = bits(7.0064923217e-46_f32);
pub static TBITS_LOWER_THAN_MIN: u32 = bits_transmute(7.0064923217e-46_f32);
pub static BITS_ZERO: u32 = bits(0.0f32);
pub static TBITS_ZERO: u32 = bits_transmute(0.0f32);
pub static BITS_ONE: u32 = bits(1.0f32);
pub static TBITS_ONE: u32 = bits_transmute(1.0f32);
pub static BITS_NEG_ONE: u32 = bits(-1.0f32);
pub static TBITS_NEG_ONE: u32 = bits_transmute(-1.0f32);
pub static BITS_INF: u32 = bits(std::f32::INFINITY);
pub static TBITS_INF: u32 = bits_transmute(std::f32::INFINITY);
pub static BITS_NEG_INF: u32 = bits(std::f32::NEG_INFINITY);
pub static TBITS_NEG_INF: u32 = bits_transmute(std::f32::NEG_INFINITY);
pub static BITS_NAN: u32 = bits(std::f32::NAN);
pub static TBITS_NAN: u32 = bits_transmute(std::f32::NAN);
pub static BITS_COMPUTED_NAN: u32 = bits(std::f32::INFINITY/std::f32::INFINITY);
pub static TBITS_COMPUTED_NAN: u32 = bits_transmute(std::f32::INFINITY/std::f32::INFINITY);


// Run-time validation of many more values
fn main() {
  let end: usize = 0xffff_ffff;
  let count = 9_876_543; // number of values to test
  let step = end / count;
  for u in (0..=end).step_by(step) {
      let v = u as u32;
      
      // reference
      let f = unsafe { std::mem::transmute::<u32, f32>(v) };
      
      // compute
      let c = bits(f);

      // validation
      if c != v && 
         !(f.is_nan() && c == 0x7fc0_0000) && // nans
         !(v == 0x8000_0000 && c == 0) { // negative 0
          println!("{:x?} {:x?}", v, c); 
      }
  }
}

,

Chris NeugebauerReflecting on 10 years of not having to update WordPress

Over the weekend, the boredom of COVID-19 isolation motivated me to move my personal website from WordPress on a self-managed 10-year-old virtual private server to a generated static site on a static site hosting platform with a content delivery network.

This decision was overdue. WordPress never fit my brain particularly well, and it was definitely getting to a point where I wasn’t updating my website at all (my last post was two weeks before I moved from Hobart; I’ve been living in Petaluma for more than three years now).

Settling on which website framework to use wasn’t a terribly difficult choice (I chose Jekyll, everyone else seems to be using it), and I’ve had friends who’ve had success moving their blogs over. The difficulty I ended up facing was that the standard exporter that everyone uses to move from WordPress to Jekyll does not expect Debian’s package layout.

Backing up a bit: I made a choice, 10 years ago, to deploy WordPress on a machine that I ran myself, using the Debian system wordpress package, a simple aptitude install wordpress away. That decision was not particularly consequential then, but it chewed up 3 hours of my time on Saturday.

Why? The exporter plugin assumes that it will be able to find all of the standard WordPress files in the usual WordPress places, and when it didn’t find them, it broke in unexpected ways. And why couldn’t it find them?

Debian makes packaging choices that prioritise all the software on a system living side-by-side with minimal difficulty. It sets strict permissions. It separates application code from configuration from user data (which, in the case of WordPress, includes plugins), in a way that is consistent between applications. This choice makes it easy for Debian admins to understand how to find bits of an application. It also minimises the chance of one PHP application clobbering another.

10 years later, the install that I had set up was still working, having survived 3-4 Debian versions, and so 3-4 new WordPress versions. I don’t recall the last time I had to think about keeping my WordPress instance secure and updated. That’s quite a good run. I’ve had a working website despite not caring about keeping it updated for at least three years.

The same decisions that meant I spent 3 hours on Saturday doing a simple WordPress export saved me a bunch of time that I didn’t incrementally spend over the course of a decade. Am I even? I have no idea.

Anyway, the least I can do is provide some help to people who might run into this same problem, so here’s a 5-step howto.

How to migrate a Debian WordPress site to Jekyll

Should you find the Jekyll exporter not working on your Debian WordPress install:

  1. Use the standard WordPress export to export an XML feed of your site.
  2. Spin up a new instance of WordPress (using WordPress.com, or on a new Virtual Private Server, whatever, really).
  3. Import the exported XML feed.
  4. Install the Jekyll exporter plugin.
  5. Follow the documentation and receive a Jekyll export of your site.

Basically, the plugin works with a stock WordPress install. If you don’t have one of those, it’s easy to move it over.

,

Gary PendergastInstall the COVIDSafe app

I can’t think of a more unequivocal title than that. 🙂

The Australian government doesn’t have a good track record of either launching publicly visible software projects, or respecting privacy, so I’ve naturally been sceptical of the contact tracing app since it was announced. The good news is, while it has some relatively minor problems, it appears to be a solid first version.

Privacy

While the source code is yet to be released, the Android version has already been decompiled, and public analysis is showing that it only collects necessary information, and only uploads contact information to the government servers when you press the button to upload (you should only press that button if you actually get COVID-19, and are asked to upload it by your doctor).

The legislation around the app is also clear that the data you upload can only be accessed by state health officials. Commonwealth departments have no access, and neither do non-health departments (eg, law enforcement, intelligence).

Technical

It does what it’s supposed to do, and hasn’t been found to open you up to risks by installing it. There are a lot of people digging into it, so I would expect any significant issues to be found, reported, and fixed quite quickly.

Some parts of it are a bit rushed, and the way it scans for contacts could be more battery efficient (that should hopefully be fixed in the coming weeks when Google and Apple release updates that these contact tracing apps can use).

If it produces useful data, however, I’m willing to put up with some quirks. 🙂

Usefulness

I’m obviously not an epidemiologist, but those I’ve seen talk about it say that yes, the data this app produces will be useful for augmenting the existing contact tracing efforts. There were some concerns that it could produce a lot of junk data that wastes time, but I trust the expert contact tracing teams to filter and prioritise the data they get from it.

Install it!

The COVIDSafe site has links to the app in Apple’s App Store, as well as Google’s Play Store. Setting it up takes a few minutes, and then you’re done!

,

Andrew RuthvenInstall Fedora CoreOS using FAI

I've spent the last couple of days trying to deploy Fedora CoreOS to some physical hardware/bare metal for a colleague using the official PXE installer from Fedora CoreOS. It wasn't very pleasant, and just wouldn't work reliably.

Maybe my expectations were too high, in that I thought I could use Ignition to prepare more of the system for me, as my colleague has been able to do with bare metal installs. I just tried to use Ignition as documented.

A few interesting aspects I encountered:

  1. The PXE installer for it has a 618MB initrd file. This takes quite a while to transfer via tftp!
  2. It can't build software RAID for the main install device (and the developers have no intention of adding this), and it seems very finicky to build other RAID sets for other partitions.
  3. And, well, I just kept having problems where the built systems would hang during boot for no obvious reason.
  4. The time to do an installation was incredibly long.
  5. The initrd image is really just running coreos-installer against the nominated device.

During the night I got fed up with that process and wrote a Fully Automatic Installer (FAI) profile that'd install CoreOS instead. I can now use setup-storage from FAI with its standard disk_config files. This allows me to build complicated disk configurations with software RAID and LVM easily.

A big bonus is that a rebuild is a lot faster: timed from typing reboot to a fresh login prompt, it's 10 minutes - and this is on physical hardware, so it includes BIOS POST and RAID controller set up, twice each.

I thought this might be of interest to other people, so the FAI profile I developed for this is located here: https://github.com/catalyst-cloud/fai-profile-fedora-coreos

FAI was initially developed to deploy Debian systems, but it has since been extended to install a number of other operating systems. I think this is a good example of how easy it is to deploy non-Debian derived operating systems using FAI without having to modify FAI itself.

,

Gary PendergastBebo, Betty, and Jaco

Wait, wasn’t WordPress 5.4 just released?

It absolutely was, and congratulations to everyone involved! Inspired by the fine work done to get another release out, I finally completed the last step of co-leading WordPress 5.0, 5.1, and 5.2 (Bebo, Betty, and Jaco, respectively).

My study now has a bit more jazz in it. 🙂

,

Robert CollinsStrength training from home

For the last year I’ve been incrementally moving away from lifting static weights and towards body weight based exercises, or callisthenics. I’ve been doing this for a number of reasons, including better avoidance of injury (if I collapse, the entire stack is dynamic, if a bar held above my head drops on me, most of the weight is just dead weight – ouch), accessibility during travel – most hotel gyms are very poor, and functional relevance – I literally never need to put 100 kg on my back, but I do climb stairs, for instance.

Covid-19 shutting down the gym where I train is a mild inconvenience for me as a result, because even though I don’t normally train at home, I am able to do nearly all my workouts entirely from home. And I thought a post about this approach might be of interest to other folk newly separated from their training facilities.

I’ve gotten most of my information from a few different youtube channels:

There are many more channels out there, and I encourage you to go and look and read and find out what works for you. Those 5 are my greatest hits, if you will. I’ve bought the FitnessFAQs exercise programs to help me with my training, and they are indeed very effective.

While you don’t need a gymnasium, you do need some equipment, particularly if you can’t go and use a local park. Exactly what you need will depend on what you choose to do – for instance, doing dips on the edge of a chair can avoid needing any equipment, but doing them with some portable parallel bars can be much easier. Similarly, doing pull ups on the edge of a door frame is doable, but doing them with a pull-up bar is much nicer on your fingers.

Depending on your existing strength you may not need bands, but I certainly did. Buying rings is optional – I love them, but they aren’t needed to have a good solid workout.

I bought parallettes for working on the planche. Parallel bars for dips and rows. A pull-up bar for pull-ups and chin-ups, though with the rings you can add flys, rows, face-pulls, unstable push-ups and more. The rings. And a set of 3 bands that combine for 7 different support amounts.

In terms of routine, I do an upper/lower split, with 3 days on upper body, one day off, one day on lower, and the weekends off entirely. I was doing 2 days on lower body, but found I was over-training with Aikido later that same day.

On upper body days I’ll do (roughly) chin ups or pull ups, push ups, rows, dips, hollow body and arch body holds, handstands and some grip work. Today, as I write this on Sunday evening, 2 days after my last training day on Friday, I can still feel my lats and biceps from training Friday afternoon. Zero issue keeping the intensity up.

For lower body, I’ll do pistol squats, nordic drops, quad extensions, wall sits, single leg calf raises, bent leg calf raises. Again, zero issues hitting enough intensity to achieve growth / strength increases. The only issue at home is having a stable enough step to get a good heel drop for the calf raises.

If you haven’t done bodyweight training at all before, when starting, don’t assume it will be easy – even if you’re a gym junkie, our bodies are surprisingly heavy, and there’s a lot of resistance just moving them around.

Good luck, train well!

Brendan ScottCovid 19 Numbers – lag

Recording some thoughts about Covid 19 numbers.

Today’s figures

The Government says:

“As at 6.30am on 22 March 2020, there have been 1,098 confirmed cases of COVID-19 in Australia”.

The reference is https://www.health.gov.au/news/health-alerts/novel-coronavirus-2019-ncov-health-alert/coronavirus-covid-19-current-situation-and-case-numbers. However, that page is updated daily (ish), so don’t expect it to be the same if you check the reference.

Estimating Lag

If a person tests positive to the virus today, that means they were infected at some time in the past. So, what is the lag between infection and a positive test result?

Incubation Lag – about 5 days

When you are infected you don’t show symptoms immediately. Rather, there’s an incubation period before symptoms become apparent. The time between being infected and developing symptoms varies from person to person, but most of the time a person shows symptoms after about 5 days (I recall seeing somewhere that 1 in 1,000 cases will develop symptoms after 14 days).

Presentation Lag – about 2 days

I think it’s fair to also assume that people are not presenting for testing as soon as they become ill. It is probably taking them a couple of days from developing symptoms to actually get to the doctor – I read a story somewhere (have since lost the reference) about a young man who went to a party, then felt bad for days but didn’t go for a test until someone else from the party had returned a positive test. Let’s assume there’s a mix of worried well and stoic types and call it 2 days from becoming symptomatic to seeking a test.

Referral Lag – about a day

Assuming that a GP is available straight away and recommends a test immediately, logistically there will still be most of a day taken up between deciding to see a doctor and having a test carried out.

Testing lag – about 2 days

The graph of infections “epi graph” today looks like this:

[Graph: new and cumulative COVID-19 cases in Australia by notification date, as at 22 March 2020]

One thing you notice about the graph is that the new cases bars seem to increase for a couple of days, then decrease – so about 100 new cases in the last 24 hours, but almost 200 in the 24 hours before that. From the graph, the last 3 “dips” have been today (Sunday), last Thursday and last Sunday.  This seems to be happening every 3 to 4 days. I initially thought that the dips might mean fewer (or more) people presenting over weekends, but the period is inconsistent with that. I suspect, instead, that this actually means that testing is being batched.

That would mean that neither the peaks nor the troughs are representative of infection surges/retreats; they simply reflect when tests are being processed. This seems to be a 4 day cycle, so, on average, it would be about 2 days between having the test conducted and receiving a result. So a confirmed case count published today is actually showing confirmed cases as at about 2 days earlier.

Total lag

From the date someone is infected to the time that they receive a positive confirmation is about:

lag = time for symptoms to show + time to seek a test + referral time + time for the test to return a result

So, the published figures on confirmed infections are probably lagging actual infections in the community by about 10 days (5+2+1+2).

If there’s about a 10 day lag between infection and confirmation, then what a figure published today says is that about a week and a half ago there were about this many cases in the community.  So, the 22 March figure of 1098 infections is actually really a 12 March figure.

What the lag means for Physical (ie Social) Distancing

The main thing that the lag means is that if we were able to wave a magic wand today and stop all further infections, we would continue to record new infections for about 10 days (and the tail for longer). In practical terms, implementing physical distancing measures will not show any effect on new cases for about a week and a half. That’s because today there are infected people who are yet to be tested.

The silver lining to that is that the physical distancing measures that have been gaining prominence since 15 March should start to show up in the daily case numbers from the middle of the coming week, possibly offset by overseas entrants rushing to make the 20 March entry deadline.

Estimating Actual Infections as at Today

How many people are infected, but unconfirmed as at today? To estimate actual infections you’d need to have some idea of the rate at which infections are increasing. For example, if infections increased by 10% per day for 10 days, then you’d multiply the most recent figure by 1.1 raised to the power of 10 (ie about 2.5).  Unfortunately, the daily rate of increase (see table on the wiki page) has varied a fair bit (from 20% to 27%) over the most recent 10 days of data (that is, over the 10 days prior to 12 March, since the 22 March figures roughly correspond to 12 March infections) and there’s no guarantee that since that time the daily increase in infections will have remained stable, particularly in light of the implementation of physical distancing measures. At 23.5% per day, the factor is about 8.
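
To make that arithmetic concrete, here’s a minimal Rust sketch (not from the original post; the lag components and daily growth rates are simply the estimates above, and the factor closure is illustrative) that reproduces the rough scale factors:

fn main() {
  // Assumed lag components, in days (taken from the estimates above)
  let incubation   = 5.0_f64;
  let presentation = 2.0_f64;
  let referral     = 1.0_f64;
  let testing      = 2.0_f64;
  let total_lag = incubation + presentation + referral + testing; // about 10 days

  // Scale factor from today's confirmed count to estimated current infections:
  // (1 + daily growth rate) ^ total_lag
  let factor = |daily_rate: f64| (1.0 + daily_rate).powf(total_lag);

  println!("total lag     ~ {} days", total_lag);
  println!("10% per day   -> factor ~ {:.1}", factor(0.10));  // about 2.6
  println!("23.5% per day -> factor ~ {:.1}", factor(0.235)); // about 8.2
}

Multiplying the published confirmed count by such a factor is the rough estimate of actual (confirmed plus unconfirmed) infections described above.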

There aren’t any reliable figures we can use to estimate the rate of infection during the current lag period (ie from 12 March to 22 March). This is because the vast majority of cases have not been from unexplained community transmission. Most of the cases are from people who have been overseas in the previous fortnight and they’re the cohort that has been most significantly impacted by recent physical distancing measures. From 15 March, they have been required to self isolate and from 20 March most of their entry into the country has stopped.  So I’d expect a surge in numbers up to about 30 March – ie reflecting infections in the cohort of people rushing to get into the country before the borders closed followed by a flattening. With the lag factor above, you’ll need to wait until 1 April or thereabouts to know for sure.

Note:

This post is just about accounting for the time lag between becoming infected and receiving a positive test result. It assumes, for example, that everyone who is infected seeks a test, and that everyone who is infected and seeks a test is, in fact, tested. As at today, neither of these things is true.

,

Ian BrownKubernetes Secrets Security

I was always of the belief that secrets within Kubernetes (k8s) are secure. How wrong I was! After a recent meetup featuring a Google security expert, I discovered that the secrets I have in our k8s cluster are no more secure than writing them down on paper and leaving it on a park bench. Google Cloud Platform is a fantastic offering, and yes, I am biased. I have used every other cloud platform from every major vendor over the last 10 odd years of my career.

,

Clinton Roylca2020 ReWatch 2020-02-02

As I was an organiser of the conference this year, I didn’t get to see many talks. Fortunately many of the talks were recorded, so I get to watch the conference well after the fact.

Conference Opening

That white balance on the lectern slides is indeed bad; I really should get around to adding this as a suggestion on the logos documentation. (With some help, I put up all the lectern covers; it was therapeutic and rush free.)

I actually think there was a lot of information in this introduction. Perhaps too much?

OpenZFS and Linux

A nice update on where zfs is these days.

Dev/Ops relationships, status: It’s Complicated

A bit of a war story about production systems, leading to a moment of empathy.

Samba 2020: Why are we still in the 1980s for authentication?

There are a lot of old security standards that are showing their age, and a lot of modern security standards, but which to choose?

Tyranny of the Clock

A very interesting problem solving adventure, with a few nuggets of interesting information about tools and techniques.

Configuration Is (riskier than?) Code

Because configuration files are parsed by a program, and the program changes how it runs depending on the contents of that configuration file, every program that parses configuration files is basically an interpreter, and thus every configuration file is basically a program. So, configuration is code, and we should be treating configuration like we do code, e.g. revision control, commenting, testing, review.

Easy Geo-Redundant Handover + Failover with MARS + systemd

Using a local process organiser to handle a cluster, interesting, not something I’d really promote. Not the best video cutting in this video, lots of time with the speaker pointing to his slides offscreen.

 

,

sthbrx - a POWER technical bloglinux.conf.au 2020 recap

It's that time of year again. Most of OzLabs headed up to the Gold Coast for linux.conf.au 2020.

linux.conf.au is one of the longest-running community-led Linux and Free Software events in the world, and attracts a crowd from Australia, New Zealand and much further afield. OzLabbers have been involved in LCA since the very beginning and this year was no exception with myself running the Kernel Miniconf and several others speaking.

The list below contains some of our highlights that we think you should check out. This is just a few of the talks that we managed to make it to - there's plenty more worthwhile stuff on the linux.conf.au YouTube channel.

We'll see you all at LCA2021 right here in Canberra...

Keynotes

A couple of the keynotes really stood out:

Sean is a forensic structural engineer who shows us a variety of examples, from structural collapses and firefighting disasters, where trained professionals were blinded by their expertise and couldn't bring themselves to do things that were obvious.

There's nothing quite like cryptography proofs presented to a keynote audience at 9:30 in the morning. Vanessa goes over the issues with electronic voting systems in Australia, and especially internet voting as used in NSW, including flaws in their implementation of cryptographic algorithms. There continues to be no good way to do internet voting, but with developments in methodologies like risk-limiting audits there may be reasonably safe ways to do in-person electronic voting.

OpenPOWER

There was an OpenISA miniconf, co-organised by none other than Hugh Blemings of the OpenPOWER Foundation.

Anton (on Mikey's behalf) introduces the Power OpenISA and the Microwatt FPGA core which has been released to go with it.

Anton live demos Microwatt in simulation, and also tries to synthesise it for his FPGA but runs out of time...

Paul presents an in-depth overview of the design of the Microwatt core.

Kernel

There were quite a few kernel talks, both in the Kernel Miniconf and throughout the main conference. These are just some of them:

There's been many cases where we've introduced a syscall only to find out later on that we need to add some new parameters - how do we make our syscalls extensible so we can add new parameters later on without needing to define a whole new syscall, while maintaining both forward and backward compatibility? It turns out it's pretty simple but needs a few more kernel helpers.

There are a bunch of tools out there which you can use to make your kernel hacking experience much more pleasant. You should use them.

Among other security issues with container runtimes, using procfs to setup security controls during the startup of a container is fraught with hilarious problems, because procfs and the Linux filesystem API aren't really designed to do this safely, and also have a bunch of amusing bugs.

Control Flow Integrity is a technique for restricting exploit techniques that hijack a program's control flow (e.g. by overwriting a return address on the stack (ROP), or overwriting a function pointer that's used in an indirect jump). Kees goes through the current state of CFI supporting features in hardware and what is currently available to enable CFI in the kernel.

Linux has supported huge pages for many years, which has significantly improved CPU performance. However, the huge page mechanism was driven by hardware advancements and is somewhat inflexible, and it's just as important to consider software overhead. Matthew has been working on supporting more flexible "large pages" in the page cache to do just that.

Spoiler: the magical fantasy land is a trap.

Community

Lots of community and ethics discussion this year - one talk which stood out to me:

Bradley and Karen argue that while open source has "won", software freedom has regressed in recent years, and present their vision for what modern, pragmatic Free Software activism should look like.

Other

Among the variety of other technical talks at LCA...

Quantum compilers are not really like regular classical compilers (indeed, they're really closer to FPGA synthesis tools). Matthew talks through how quantum compilers map a program on to IBM's quantum hardware and the types of optimisations they apply.

Clevis and Tang provide an implementation of "network bound encryption", allowing you to magically decrypt your secrets when you are on a secure network with access to the appropriate Tang servers. This talk outlines use cases and provides a demonstration.

Christoph discusses how to deal with the hardware and software limitations that make it difficult to capture traffic at wire speed on fast fibre networks.