I got married. I moved into the flat. Companies have gone up and down, like life.
170 days on the road, 308,832 km travelled, 38 cities, and 16 countries. I have never travelled this little in recent life, but maybe the whole getting married thing (planning a wedding is no mean feat) and sorting the flat out (dealing with incompetent interior designers, sorting things there, and so on) caused this?
It is 2025, and I’m actually planted in Kuala Lumpur, not having done an end-of-year trip to usher in the New Year somewhere else. I started the year in Paris, and I ended it in Kuala Lumpur, tired, and maybe a bit burnt out.
Working hard to get back into the grind; don’t get me wrong, I’ve been doing nothing but grinding, but c’est la vie.
A SWAY session by Joanne of Royal Far West School (http://sway.org.au/), via https://coviu.com/. SWAY is an oral language and literacy program based on Aboriginal knowledge, culture and stories. It has been developed by Educators, Aboriginal Education Officers and Speech Pathologists at the Royal Far West School in Manly, NSW.
Since about 2017, a group at Cisco has been working on an “OCI native operating system” under the title “project machine”, which is a terrible project name. I note that most of the people publicly involved in the project according to GitHub commits no longer work at Cisco, so I cannot vouch for the health of the overall project. That said, they did come up with some interesting ideas along the way, and given it’s a quiet time of year I figured I could do some reading.
Firstly, Docker / OCI images store their layers as tar files. This is quite inefficient: the tar format is really intended for ancient tape drives, doesn’t support concepts such as random seeks, isn’t particularly well defined (there are a few competing implementations), and generally wasn’t designed for this use. So instead, that team wrote atomfs, which stores the layers as squashfs filesystems. It should be noted that the only container runtime which appears to actually support atomfs is project machine itself, so it’s not a super useful format in the real world.
Secondly, the team appears to have fairly rapidly moved on to puzzlefs instead of pursuing full support for atomfs in upstream runtimes. Puzzlefs uses the FastCDC content defined chunking (CDC) scheme to attempt to de-duplicate layers, which sounds like an idea I’d be interested in based on previous adventures in this space and an academic paper I read about 18 months ago.
Josh Leeb has written a nice set of introductory blog posts about CDC schemes including FastCDC, which is across four posts: content defined chunking; Gear Hashing explained; FastCDC; and RapidCDC / QuickCDC. I definitely had to read the section of the second post covering the hash judgment function a couple of times for it to click, so I’d describe these posts as very helpful. One of the nice things about the hash judgment functions described in Josh’s posts is that you don’t need a corpus of common chunks to train on — the definition of a chunk is entirely probabilistic which is nice.
Now, there is a disclaimer to insert here — FastCDC, RapidCDC, and QuickCDC are not deduplication systems by themselves. What they are instead is a way of determining what chunks of a file you should consider as possible duplicates. That is, we still need to do some form of stronger (cryptographic?) hash on each of these chunks to determine if it is a duplicate. That actually works out quite well though, given that the OCI image registry expects to address content by cryptographic hashes anyway. RapidCDC / QuickCDC are performance improvements over FastCDC, but they gain those improvements by requiring some amount of pre-processing of the input files to provide “chunking hints”, which I think is probably cheating in most cases. That is, I think FastCDC is a pretty reasonable choice in the real world.
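To make that concrete, here is a small Python sketch of the overall pattern, under the assumption that a Gear-style rolling hash picks the boundaries and SHA-256 provides the chunk identity used for de-duplication. It is an illustration of the general technique, not puzzlefs’ actual implementation, and the gear table seed, mask, and size limits are arbitrary choices.

import hashlib
import random

random.seed(42)                      # arbitrary seed, just to make the gear table reproducible
GEAR = [random.getrandbits(64) for _ in range(256)]
MASK = (1 << 13) - 1                 # roughly 8KiB average chunks, purely illustrative
MIN_CHUNK, MAX_CHUNK = 2048, 65536

def chunks(data):
    # Yield (offset, length) pairs using a simple Gear rolling hash.
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        length = i - start + 1
        # The "hash judgment": declare a boundary when the low bits are all
        # zero, subject to minimum and maximum chunk sizes.
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            yield start, length
            start, h = i + 1, 0
    if start < len(data):
        yield start, len(data) - start

def dedup(blobs):
    # Store each unique chunk once, keyed by its SHA-256 digest.
    store = {}
    for blob in blobs:
        for offset, length in chunks(blob):
            piece = blob[offset:offset + length]
            store.setdefault(hashlib.sha256(piece).hexdigest(), piece)
    return store

Because a boundary depends only on the bytes immediately before it, inserting data early in a file only disturbs a chunk or two rather than shifting every boundary after it, which is exactly the boundary-shift problem mentioned in the next paragraph.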
Some important points from Josh’s posts — I appear to have independently rediscovered possibly the worst implementation of Gear Hashing with the initial implementation of blockstash. Firstly I don’t solve the boundary-shift problem, although I note it exists. Secondly, I failed to realize that a cryptographically secure hash is actually not a great solution for the hashing of the chunks. Honestly, I’m ok with that as long as I learn something along the way. It seems like these algorithms have seen a bit of academic study since around 2016 when the FastCDC paper was published, so I am also only 9 years behind the state of the art!
You can learn more about puzzlefs in this video, if you’re interested:
One interesting observation from the video which I didn’t see in Josh’s blog posts is that ensuring that the file system inside the image (be that a tarball, qcow2, or something else) is written in a deterministic manner will improve the performance of these chunking schemes quite a lot.
Overall, I don’t think that puzzlefs is particularly useful to me for various reasons, but reading these blog posts and watching the video has been a fun little tangent.
While most podcasts are available on multiple platforms and either offer an RSS feed or have one that can be discovered, some are only available in the form of a YouTube channel. Thankfully, it's possible to both monitor them for new episodes (i.e. new videos), and time-shift the audio for later offline listening.
When it comes to downloading the audio, the most reliable tool I have found is yt-dlp. Since the exact arguments needed to download just the audio as an MP3 are a bit of a mouthful, I wrote a wrapper script which also does a few extra things:
cleans up the filename so that it can be stored on any filesystem
adds ID3 tags so that MP3 players can have the metadata they need to display and group related podcast episodes together
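The wrapper itself is the script linked above; purely as an illustration of the same idea, here is a rough Python sketch using yt-dlp's Python API to fetch the audio as an MP3 and mutagen for the ID3 tags. The filename sanitisation and the choice of tags here are assumptions, not the wrapper's actual behaviour.

import os
import re

import yt_dlp
from mutagen.easyid3 import EasyID3
from mutagen.mp3 import MP3

def safe_name(title):
    # Keep only characters that are safe on basically any filesystem.
    return re.sub(r"[^A-Za-z0-9._-]+", "_", title).strip("_")

def fetch_episode(url, podcast_name):
    opts = {
        "format": "bestaudio/best",
        "outtmpl": "%(id)s.%(ext)s",
        "postprocessors": [{"key": "FFmpegExtractAudio", "preferredcodec": "mp3"}],
    }
    with yt_dlp.YoutubeDL(opts) as ydl:
        info = ydl.extract_info(url, download=True)
    mp3_path = f"{info['id']}.mp3"

    # Tag the file so players can group related episodes under one "album".
    audio = MP3(mp3_path, ID3=EasyID3)
    if audio.tags is None:
        audio.add_tags()
    audio["album"] = podcast_name
    audio["title"] = info.get("title", "")
    audio["artist"] = info.get("uploader", "")
    audio.save()

    final_path = safe_name(info.get("title") or info["id"]) + ".mp3"
    os.rename(mp3_path, final_path)
    return final_path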
Another year in the books! Here are a few of my 2024 highlights.
Hanami
It was a very big year for my open source work:
In February, we released Hanami 2.1, introducing our view layer and frontend assets support.
In April, we announced Luca’s retirement from open source, and I took on leadership of the Hanami project.
In November, we released Hanami 2.2, introducing our database layer as well as operations, and completing our refreshed vision for full stack Hanami apps!
In December, we announced Peter’s retirement from the core teams behind Hanami, dry-rb and ROM.
And that’s just scratching the surface. It was a very big year indeed. Big enough that I even wrote a special State of Hanami post to make it easy for you to catch up and learn about what’s coming next.
I’m immensely proud of what we managed to achieve with Hanami this year. After years of effort, we finally have the full expression of our vision out there and ready for people to use.
I said a thankful and heartfelt goodbye to two very special collaborators in Luca and Peter this year, but we still have a great contributor team, a group that was crucial in making the Hanami 2.2 release happen.
I’m as optimistic as ever for the future of our projects and I’m honoured to lead them into the future. I’ve worked on dry-rb and now Hanami for a decade of my life. I’m committed to making sure I can continue to do this for decades more, and am currently in the process of arranging my life to ensure these projects can remain a focus for me, and a sustainable one at that.
Let me close this section by saying this too—these are my year in review notes, after all—I’m also very proud of me for my persistence in getting to this point! I worked so hard, and never gave up! And above all of that, I’m very thankful for my family’s enduring support of this grand endeavour.
Conferences
This year I ran my first conference! Ruby in Common took place the day before RubyConf AU in Sydney. ~30 people (thank you to all who came!), unconf style, lots of good discussions. I’m glad I did it and I learnt a lot through the process.
Aside from this, I got to return to Singapore and reprise Livin’ La Vida Hanami for the triumphant return of RedDotRubyConf! I have a long history with this conference: my first major speaking opportunity, first repeat speaking opportunity, and I’ve attended 3 times overall. I’m so glad Ted and the crew brought it back, and I hope it has a good run into the future.
I finished the year with a trip to Chicago for RubyConf. This was my second time at RubyConf, and this event reaffirmed how valuable it is as a way to stay connected with the broader Ruby community. I ran a hack day table for Hanami, and Sean ran a Hanami workshop. Met so many new friends and old. I’m looking forward to going again once the new RubyConf schedule settles in.
Work at Buildkite
This was my third year at Buildkite, and I spent the entirety of the year as an engineering manager (which at Buildkite, tends to mean a hybrid people/technical/product leader). Firstly in our Pipelines group, and then from March in our Foundation team, the team tending to the core of our Rails monolith, plus a range of cross-cutting customer concerns.
I had a great time with this team. Everyone is kind, smart, and hard-working. We even got to welcome two more fantastic humans to the team over the year.
Most of the year we spent on one area: billing. This was in service of Buildkite relaunching itself—say hello to the Scale-Out Delivery Platform—which required all new pricing and packaging. We first made the way for paid hosted agents, then a range of package registries plans, and then finally a single, cohesive, usage-based plan for our whole platform. At the same time we also overhauled our billing usage system to receive all new kinds of events with minimal effort.
A year almost entirely on billing was more time than expected. This put us behind in a few other areas, but we’ve been catching up as we round out the year: Rails upgrades, database scaling work, and some preparation towards an innovative new API interaction model.
Family highlights
The big family highlight this year was our trip to Japan. It was the kids’ first time in Japan, and the first time for both of us in a number of places within. Our trip took us from Tokyo, to the Kurobe Alpine Route, to Toyama, Takayama, Nagoya, and then back to Tokyo again. We had a wonderful time.
Once again we made the most of long weekends and family celebrations to explore more of Australia. This time we made trips to Mittagong (and Fitzroy Falls), Merimbula (for my 40th!), Bathurst (for a hands-on-with-animals farmstay), Tumut (with a visit to Yarrangobilly caves and thermal pool), and Sydney (for a family wedding).
In May we experienced our first family live music performance: Jessica Mauboy at the Canberra Theatre! She was a fun, energetic performer and an engaging storyteller. Garmisch and I also made a little trip of our own to see Laufey in Sydney, which was sublime.
We also finally took the kids for their first-ever cinema experience, to see Inside Out 2.
A nice way to cap the year of family events was the kids performing in their school’s inaugural performing arts showcase, both in dance and music performances.
Books
A little less reading this year than last. See the “Hanami” section above: for months on end I was writing code until sleep’s sweet embrace. Still, I enjoyed the following, in order:
Iron Flame, Rebecca Yarros
On The Steel Breeze, Alastair Reynolds
Blue Remembered Earth, Alastair Reynolds
Poseidon’s Wake, Alastair Reynolds
Tidy First?, Kent Beck
Machine Vendetta, Alastair Reynolds
Floating Hotel, Grace Curtis
The Mars House, Natasha Pulley
Alien Clay, Adrian Tchaikovsky
Plutoshine, Lucy Kissick
Assorted things
A few other notables, to wrap up:
I did a whole bunch of Parkruns this year! I started with the kids, then as the weather got cooler, kept going on my own. I hit the 25-run milestone and managed to crack the 26-minute mark for a couple of my faster runs.
Unfortunately, my overall exercise regime (Apple Fitness+ HIIT and strength workouts, plus as much walking as I can fit in) took a downward turn in the second half of the year. Once again, see the “Hanami” section above: for too long I had to choose between progressing Hanami or time on myself. I’d like to find more balance in 2025.
I enjoyed getting back into camera-based photography, with the Japan trip as my ostensible motivation (but who doesn’t like the chance to nerd out on gear). I’ve settled on and am enjoying the Lumix G9ii (with the Panasonic Leica 12-60mm f2.8-4) and GX85 (usually with the Panasonic Leica 15mm f1.7), plus the diminutive GX850 and its 12-32mm kit lens, just for good measure.
I switched my daily note taking to Reflect, moving away from Logseq. Reflect is a simpler and more polished experience, and I found it much more approachable. I found myself pouring notes into it much more confidently.
I switched to Zed as my full-time editor. I’d been curious for a while, and after their open source switch in January, I gave it a go in earnest… and never stopped! It’s lightning fast and never gets in my way. I find the terminal integration much more natural. And after years of using Alabaster as my theme on VS Code, I’m now a Catppuccin Latte lover.
After the Castro app had a few wobbles earlier this year, I had a minor dalliance with Overcast, but now that Castro is under new management and fully revitalised, I am very glad to be back.
38 films in the year! The big highlight was going through the entire Ghibli catalog with the kids before our Japan trip and visit to Ghibli Park. A few of those even I hadn’t seen; Whisper of the Heart is now one of my all-time faves. Other highlights from the year: Young Woman and the Sea (watched it on a plane and enjoyed it so much I got the whole family to watch it a couple days later), The Unbearable Weight of Massive Talent, and About Time.
I switched to Pika for my blog! This whole post was written in Pika’s editor! More on this in a future post :)
And in a couple of summer holiday software switcharoos, I’m now on Ghostty for my terminal and I’ve overcome my reluctance about Raycast (VC funded, AI blah blah blah) to finally give it a try in place of my beloved LaunchBar, which honestly feels like it’s stagnating.
Regardless, attempts to pigeonhole people into psychological buckets have always made me uncomfortable — be they Myers Briggs, Strengths Finder, or now the Four Tendencies. Ironically under the Four Tendencies framework I think that would make me a Questioner, but the (very short) analysis quiz declared me to be an Obliger. I am very sure my management chain at work would agree that if I am an Obliger I’m definitely hiding it pretty well. I’m not really sure what that means to be honest.
On the other hand, if I think about the Four Tendencies as being simply a description of the permutations of weighting between intrinsic and extrinsic motivation, that works better for me. I don’t know what that means for the quiz thing though. For example, I’ve always felt more intrinsically motivated than extrinsically — I’ll do things because I think they’re important, not because there’s a prize at the end.
(Please note, I still think performance bonuses are important!)
This book is very readable, although it took me way too long to finish because of other commitments in my life. I think even if the underlying theory is debunked, it is interesting to have a bit of a framework to think about how those around you respond to motivation. For example, at the work Christmas party I was talking to a friend who said he’ll only consistently exercise if missing a boot camp session would let the other people in the group down, not if he just wants to. That’s telling, because it’s heavily aligned with this book, and also the complete opposite of my own lived experience with such things.
The Four Tendencies, by Gretchen Rubin. Psychology. Two Roads, May 3, 2018. 257 pages.
The end of this year marks my seventeenth year working in high performance computing and my ninth at the University of Melbourne in this role. When I compare this to previous years there have been some notable changes in the technology and the system I am primarily involved with (Spartan), but also in my own employment activities. Late last year, there was a structural review of our operations at Research Computing Services, as the existing organisational chart was becoming unwieldy and increasingly untenable. I ended up as the team leader for HPC Services and have stepped back somewhat from technical to management of a small but awesome team, along with organisational activities between other service groups (data, cloud) and our very close relationship with the infrastructure group.
Compared to last year, Spartan has increased to 7,121 accounts and 2,361 projects, mainly in engineering, bioinformatics (especially health), economics, mathematics, and more, and has been cited in at least 55 new papers. Machine learning has been a particularly popular area of interest on the system for several years now, and it has especially benefited from Spartan's significant investment in GPUs, whose excellent vector computational performance is evident in the system receiving certification as a global supercomputer in November last year, jumping from a position of 453 (for the GPU partitions alone) in November 2023 to 262 in November 2024. Directly related to Spartan work, I attended two major conferences in person this year, "Supercomputing Asia" and "eResearch Australasia". For the former, I gave a presentation on the International HPC Certification Forum and a poster on usage outcomes from training. For the latter, I gave a presentation on the development of Spartan from a small but innovative system to its current supercomputer status.
Training various postgraduate and postdoctoral researchers on how to use the system has been part of my work for more than a decade now, and it took some acceptance on my part several years ago when I realised that I was the most prolific supercomputer educator in the country. This year, several hundred researchers attended the twenty-two workshops that I conducted on Linux knowledge, regular expressions, HPC job submission, high performance and parallel Python, parallel programming (MPI, OpenMP, CUDA), mathematical and statistical programming (R, Octave/MATLAB, etc.), and more. In addition, each year I am brought in for lectures and assessments for the University's Cluster and Cloud Computing course, which also has several hundred students. On top of that, this year I took leave from the University to travel to the Australian Institute for Marine Science in Townsville to run a week-long HPC training course for around fifty of the most switched-on (mostly) young researchers I have ever had the pleasure of meeting.
All of this has resulted in an extremely good review by my manager, who really appreciated the initiatives that I have taken within the new structure. These activities will continue, as I am increasingly emphasising the importance of organisational and technical quality assurance to RCS as a whole. A good portion of next year is already organised: I know I will be attending eResearch New Zealand to deliver a paper on HPC Training for Bioinformatics, eResearch Australasia in Brisbane, and I'll be doing lectures for UniMelb's COMP90024 course. In addition, I'll be doing my best to reduce the number of Spartan workshops I run in favour of more online videos and documentation (we do a lot of the latter already, but it's never enough to satiate demand).
In many ways, I am deeply blessed to have the sort of job that I do. Even if I get a bit grumpy about bureaucracy at times, I love my work. I get to provide supercomputer support to researchers whose discoveries and inventions make real changes to the world we live in with a stunning return on investment of 7:1 over two years, nearly entirely in the form of positive social externalities. It is computing for medicine, for climatology, for materials, for agriculture, for the environment, rather than computing for social media or games (both of which I use, by the way). It is the sort of computing I was inspired by as a youngster, research-focused and close to the metal. I may add that it is good and secure employment, especially given that technical and knowledge skills are increasingly valuable where the ratio of capital to labour increases. In a nutshell, I am more than happy with how supercomputing is progressing, and I am very happy with this career choice.
Every new Ruby comes with its headline features, but I always appreciate the small improvements that go alongside. Here are my favourites from Ruby 3.4.
String literals in files without a frozen_string_literal comment now emit a deprecation warning when they are mutated.
Having to place # frozen_string_literal: true at the top of every file never felt truly like Ruby, and I was quite sad when (6 years ago) Matz declared that we wouldn’t move to freezing them by default.
Then came byroot, the hero of the hour, with a fresh plan and renewed energy, and here we are, the first step having arrived! We shouldn’t delete our frozen_string_literal pragmas just yet (see this great explainer from byroot and fxn), but we’re now on the way to saying bye bye to them by Ruby 4.
Keyword splatting nil when calling methods is now supported. **nil is treated similarly to **{}, passing no keywords, and not calling any conversion methods. [Bug #20064]
This feels much more natural. I’ve had to do various dances before splatting hashes in the past, and none have felt great.
Passing a block to a method which doesn’t use the passed block will show a warning on verbose mode (-w). [Feature #15554]
That every Ruby method accepts a block has surprised folks at work on a number of occasions, even recently. I don’t think we’re in the position to run our whole app with warnings enabled, but this will surely help raise awareness of this in general.
it is added to reference a block parameter with no variable name. [Feature #18980]
I almost didn’t include this! I was honestly pretty fine with _1, but I like to keep up with Ruby features, so I’ll switch to it for any appropriate single-variable blocks, and I think that’ll feel nice.
All in all, this feels like a fairly modest release for language-level features. Huge advances in other areas though: new parser, modular GC, and a whole buncha YJIT improvements. Ruby keeps moving forward. Thank you to the Ruby team and all Ruby contributors!
Now that we’ve defeated QNAP’s slightly broken udev, we can run a Docker container with rtl_433 in it to wire up our Vevor 7in1 weather station to Home Assistant via MQTT. First off, we need a Docker container running rtl_433, which assumes you’ve already set up the udev rule mentioned in the previous post, even if you’re not using a QNAP!
I like to write little shell scripts to run Docker containers. In this case this one:
What this script does is remove any previous version of the container that might be running. It then uses our reliable symlink from the previous post to look up the real device file. That real device file is then passed through to the Docker container. I am not entirely sure of the subtleties here, but rtl_433 refused to use the device if I passed it through as the symlink, and Docker doesn’t appear to be able to remap device files like it does for ports or mounts. Regardless, this worked, at least.
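As a rough sketch of that flow in Python (the container and image names here are placeholders, and this is an illustration rather than the actual script):

import os
import subprocess

SYMLINK = "/dev/rtl433"                  # the stable symlink created by the udev rule
real_dev = os.path.realpath(SYMLINK)     # rtl_433 wants the real device node, not the symlink

# Remove any previous instance of the container, ignoring "no such container" errors.
subprocess.run(["docker", "rm", "-f", "rtl433"], check=False)

# Pass the resolved device through to the container.
subprocess.run([
    "docker", "run", "-d", "--name", "rtl433",
    "--device", real_dev,
    "example/rtl_433-image",             # placeholder image name
], check=True)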
Then finally, we have the correct command line for rtl_433 for this weather station. Note that “mqttserver” is probably not the name of your MQTT server. This command line names the MQTT topic for the model and ID of the weather station, so if you had more than one of these you’d get separate topics for them. My weather station appears at “rtl_433/Vevor-7in1/48399”, for example.
For completeness, the logs from rtl_433 look like this (reformatted to not be ugly; it’s all a single line in the actual user interface):
Finally, we just need to wire it up to Home Assistant. I am going to assume you already have MQTT configured, and won’t talk about that bit. However, I have these in my configuration.yaml file:
So, this was a lot harder than it really should have been, especially with rtl_433 being a bit picky about where the device appears in the /dev/ file system as an added sting in the tail…
In my specific scenario, I was given a Vevor 7-in-1 wireless weather station for Christmas. They seem fairly solid and full featured for a $130 AUD device, so no complaints there. The device is also natively supported by rtl_433, which is an RTL SDR package, although it’s not supported in the version shipped by Debian 12. That’s awesome, although it would have been nice if the command line to use was documented better. I’ll talk more about those bits in a later post though. In this one I want to focus on the fun I had getting a USB device reliably passed through to a Docker container on my QNAP NAS.
First off, I am using a Realtek USB TV tuner for rtl_433, which appears like this on the NAS:
$ lsusb
Bus 001 Device 002: ID 0bda:2838 Realtek Semiconductor Corp.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Specifically, that first device is the lucky winner here. First off, I want it to appear reliably in /dev/ so that I can find it and pass it through to Docker. By reliably, I mean at a path that doesn’t change, even though its bus and device ID might change if you plugged it into a different USB port or added a USB hub to the NAS. The way I’d usually do that is with a udev rule, which didn’t work at all how I expected on the QNAP Linux distribution.
This is because of two things — the udev version on the QNAP is a bit broken, and the configuration of the NAS in /etc/ and /lib/udev is clobbered on boot from a known good version. So first off we need to tweak that known good version to provide the configuration we want. This is in fact documented on the QNAP site, but took a while to find. Specifically in my case, I needed to mount the configuration in /tmp/config…
$ sudo bash
# mount $(/sbin/hal_app --get_boot_pd port_id=0)6 /tmp/config
And then I could create a file called /tmp/config/autorun.sh which contained this:
#!/bin/bash
cat <<EOF > /lib/udev/rules.d/60-rtlsdr.rules
ATTR{product}=="RTL2838UHIDIR",ATTR{serial}=="00000001",SYMLINK+="rtl433"
EOF
udevadm control --reload-rules
# This is horrid, but the udev in QNAP land is a bit broken...
usb_device=$(lsusb | grep Realtek | sed -e 's|^Bus |/dev/bus/usb/|' -e 's| Device |/|' -e 's|: .*$||')
pci_device=$(udevadm info -q path -n ${usb_device})
udevadm test --action="change" ${pci_device}
What this does is create a udev rule which matches the product and serial number of my RTL TV tuner, creating a symlink at /dev/rtl433 to the correct device file. You then need to have udev reload its rules, and then trick udev into running the new rule by running a test action against the correct PCI device (which you find by looking up the correct USB device). Lovely.
We then need to make that script executable, and unmount the config file system so that our changes are flushed to disk:
How various parts of the US constitution thwart the will of an expanding multicultural majority in favor of a shrinking rural white minority. Interesting. 3/5
Explains the demographic transition, how it has flowed from the UK to Europe to the rest of the world, and how this has influenced and will influence history. 3/5
$895 to Givewell Top Charities fund. I’ve been donating to Givewell as my main “help the poor” charity since they have fairly low overheads and try to get the most impact from their donations. They also get good reviews for living up to these goals.
My employer matched this donation so total given to Givewell was $1790.
Software and Internet Infrastructure Projects
Software in the Public Interest, The Software Freedom Conservancy and LibreOffice all use PayPal, which is blocking charity donations from Asia/Pacific, so I was unable to donate to them.
€20 to Syncthing which I use to Sync files between my computers and phone since I stopped using Dropbox.
$NZ 200 to The Spinoff, which is a New Zealand news website. I usually donate $100/year but I did an extra $100 one-off since they are having funding problems.
I just bought a Hisense 65U80G 65″ 8K ULED Android TV (2021 model) for $1,568 including delivery. I got that deal by googling refurbished 8K TVs and finding the cheapest one I could buy. Amazon and eBay didn’t have any good prices on second hand 8K TVs, and new ones start at $3,000 on special. I didn’t assess how Hisense compares to other TVs; as far as I could determine there was only one model of 8K TV on sale in Australia in the price range I was prepared to pay. So I won’t review how this TV compares to other models, but how refurbished TVs compare to other display options.
I bought this because the highest resolution monitor in my price range is 5120*2160 [1]. While I could get a 5120*2880 monitor for around $1,500, paying 3* the money for 33% more pixels is bad value for money. Getting 4* the pixels for under 3* the price is good value, even when it’s a TV with the lower display quality that involves.
I don’t plan to make it a main monitor. While 5120*2160 isn’t as good as I’d like on my desk, it’s bearable, and the quality of the display is high. High resolution isn’t needed for all tasks; for example I’m writing this blog post on my laptop while watching a movie on the 8K TV.
One thing I’d like to do with the 8K TV when I get it working as a monitor is to share the screen for team programming projects. I don’t have any specific plans other than team coding projects at the moment. But it will be interesting to experiment with it when I get it working.
Technical Issues with High Resolution Monitors
Hardware Needed
A lot of the graphics hardware out there doesn’t support resolutions higher than 5120*2880. It seems that most laptops don’t support resolutions higher than that, and anything above 4K is difficult. Only quite recent and high end video cards will do 8K. Apparently the RTX 2080 is one of the oldest ones that does, and that’s $400 on eBay. Strangely, the GPU chipset spec pages don’t list the maximum resolution, and there’s the additional complication that the other chips might not support the resolutions that the GPU itself can support.
As an aside I don’t use NVidia cards for regular workstations due to reliability problems. But they are good for ML work and for special purpose systems.
Interface Versions
To do 8K video it seems that you need HDMI 2.1 (or maybe 2.0 with 4:2:0 chroma subsampling), or DisplayPort 1.3 for 30Hz with 24bit color and 2.0 for higher refresh rates. But using a particular version of the interface doesn’t require supporting all the resolutions that it might support. This TV has HDMI 2.1 inputs, and I’ve bought an adaptor cable that does DisplayPort 1.4 to HDMI 2.1 at 8K resolution. So I need a video card that does DisplayPort 1.4 or HDMI 2.1 output. That doesn’t mean that the card will work, but it could work.
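As a back-of-envelope check on those interface numbers, the uncompressed data rate is just pixels * refresh rate * bits per pixel; the payload figures in the comments below are approximate and ignore blanking and protocol overhead.

def gbits_per_second(width, height, refresh_hz, bits_per_pixel):
    # Raw pixel data rate, ignoring blanking intervals and encoding overhead.
    return width * height * refresh_hz * bits_per_pixel / 1e9

print(gbits_per_second(7680, 4320, 30, 24))  # ~23.9, within DisplayPort 1.3/1.4's ~25.9 Gbit/s payload
print(gbits_per_second(7680, 4320, 60, 24))  # ~47.8, which is why 60Hz needs HDMI 2.1, DisplayPort 2.0 or compression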
It’s a pity that no-one has made a USB-C video controller that has a basic frame-buffer supporting 8K and the minimal GPU capabilities. The consensus of opinion is that no games will run well at 8K at this time so anyone using 8K resolution doesn’t need GPU power unless it’s for ML stuff.
I’m thinking of making a system that can be used as a ML server and X/Wayland server so a GPU with a decent amount of RAM and compute power would be good. I’m not particularly interested in spending $1,500+ to get a GPU that can drive a $1,568 TV. I’m looking into getting a RTX A2000 with 12G of RAM which should be adequate for ML experiments and can handle 8K@60Hz output.
I’ve ordered a DisplayPort to HDMI converter cable so if I get a DisplayPort card it will work.
Software Support
When I first got started with 4K monitors I had significant problems in adjusting the UI to be usable. The support for scaling software is much better now than it was then and 8K 65″ has a lower DPI than 4K 32″. So I hope this won’t be an issue.
Progress So Far
My first Hisense 8K TV stopped working properly. It would change to a mostly white screen after being used for some time. The screen would change in ways that correlated with changes in what should appear, but not in a way that was usable; it was just a different pattern of white blobs when I changed to a menu view, not anything that allowed using it. I presume that this was the problem that drove the need for refurbishment, as when I first got the TV it was still signed in to Google accounts for YouTube and to Netflix.
Best Buy Electrical was good about providing a quick replacement, they took away the old TV and delivered a new one on the same visit and it’s now working well.
I’ve obtained a NVidia card that can allegedly do 8K output and a combination of cables that might be able to carry an 8K signal. Now I just need to get the NVidia drivers to not cause a kernel panic to get things to work.
I recently got a OnePlus 6 for the purpose of running Debian; here’s the Debian wiki page about it [1]. It runs Debian nicely and the basic functions all work, but the problem I’m having now is that AldiMobile (Telstra) and KoganMobile (Vodafone) don’t enable VoLTE for it, and all the Australian telcos have turned off 3G. The OnePlus 6 does VoLTE with Chinese SIMs, so the phone itself can do it.
The OnePlus 6 was never sold in Australia by the telcos, so they are all gray-market imports which aren’t designed by OnePlus to work in Australia. Until recently that wasn’t a problem, but now that the 3G network has been turned off we need VoLTE and OnePlus didn’t include that in the OS. Reddit has documentation on how to fix this but it has to be done on Android [2]. So I had to go back from Mobian to Android to get VoLTE (and VoWifi) working and then install Mobian again.
For people with similar issues Telstra has a page for checking which phones are supported [3], it’s the only way to determine if it’s the phone or the network that makes VoLTE not work – Android isn’t informative about such things. Telstra lists the OP6 as a suitable phone.
Now after doing this I still can’t get the OP6 working for phone calls on Phosh or PlasmaMobile and I’m not sure why. I’m going to give the PinePhone Pro another go and see if it now works better. In the past I had problems with the PinePhonePro battery discharging too fast, charging too slowly, and having poor call quality [4]. The battery discharge issue should be at least alleviated by some of the changes in the Plasma 6 code that’s now in Debian/Unstable.
I’ve also been lent a PinePhone (non-pro) and been told that it will have better battery life in many situations. I’ll do some tests of that. The PinePhonePro isn’t capable of doing the convergence things I was hoping to do so the greater RAM and CPU power that it has aren’t as relevant as they otherwise would be.
I have a vision for how phones should work. I am not discouraged by the Librem 5, PinePhonePro, Note 9, and OnePlus 6 failing in various ways to do what I hoped for. I will eventually find a phone that I can get working well enough.
More work in Overleaf – an online editor for LaTeX
I’m still finishing my PhD at the Australian National University’s School of Cybernetics, and my primary tasks are writing up my thesis – which I’m doing in Overleaf in LaTeX. ANU now has a subscription to Overleaf, which makes logging on with single sign on (SSO) easier. Overleaf grew from two open source projects called ShareLaTeX and writeLaTeX, which both aimed to provide a collaborative environment for co-authoring papers in LaTeX. In 2017 they merged to become Overleaf.
I’ve been impressed with how the interface to Overleaf has been incrementally developed over the last year – I can backup my thesis to GitHub at the click of a button, and the ability to tag projects for easy retrieval has been great. A key feature I’m still missing in Overleaf is the ability to organise projects (papers) into folders, and the ability to share whole folders with collaborators. For example, I want to be able to put all my thesis writing in a folder and share that with my doctoral committee.
However, although Overleaf is billed as a social enterprise, it is wholly owned by Digital Science, an academic publishing services company, which in turn is a subsidiary of Holtzbrinck, a German publishing company – which also owns 53% of Springer Nature. I’ve noticed that there are several “experiments” being run (I’m in the Beta / Labs program), all of which encourage the user to “upgrade” to premium versions of plugins. This week Overleaf was pushing Writefull, a Grammarly-esque tool to help edit academic work – which it acquired in 2022. It was prominently displayed in the interface, and works well – except that you only get about 10 corrections before being prompted to upgrade to the Premium version, which is $USD 7.33 per month, with a current special being offered, bringing it down to $USD 4.40 per month. I figured out how to disable it, but it wasn’t intuitive, which I am sure is by design.
The play here, I speculate, is to get grad students hooked on Writefull, who in turn will pressure their institutions to buy the upgrade at an institutional level. I mean, kudos, it’s a smart GTM strategy – but it’s yet another instance of universities being beholden to publishing or software companies for increases in productivity and research outputs. Imagine instead if a group like AARNet or Universities Australia had invested in self-hosting Overleaf (which is possible) – but then, that still costs money, and OpEx at that, which is always harder to justify than CapEx because of the prevailing view, in my experience, that software should look after itself.
The second-order piece here is that Overleaf / Digital Science / Holtzbrinck is collecting user experience data on how researchers use the Overleaf platform – rather than that being something that universities themselves can use to gain insights that feed into research development activities. From what I can tell just from the requests that Overleaf is sending, it’s using Google Analytics, Stripe and Growthbook (an open source feature flag and A/B testing suite, which collects UX research data) and BugSnag, which to be fair, is not that many and feels very boot-strappy (e.g. there’s no Adobe type CX cruft). Another GTM play I see here for Overleaf is then selling these “insights” back to universities (or not – and using them to create more premium plugins to upsell).
Even with these downsides, Overleaf is a game-changer for collaborative LaTeX document editing, so I suspect I’ll be using it for the foreseeable future.
Dark Visitors – protecting my blogs from web scrapers
Dark Visitors is a tool that fits into the latter category – and I discovered it when it was featured on The Sizzle newsletter, then featured in a Mozilla Data Futures Lab talk. Its key features are identifying, then blocking (if you have the premium version) scrapers and harvesters.
It’s been a revelation looking at the visuals from Dark Visitors to realise just how much non-human agent traffic there is to my blog. For example, here’s an overview of November – and you can see that one scraping agent – in this case a Scrapy instance – hit my blog over 1200 times in one day. Keep in mind this is a tiny, cpanel-hosted blog – it’s not CDN’d, and it’s not intended to serve high volumes of traffic. Scrapers and bots are now capable of basically DDOSing entire sites when they try to harvest data.
The other key insight Dark Visitors has given me is just which agents are disrespecting robots.txt, and therefore deserve to be banned. Dark Visitors has a WordPress plugin which makes this easier – essentially if a user agent makes a request, and it’s blocked, then it gets served a 403 Forbidden HTTP response (instead of 200 OK and the content). For example, AppleBot is a key culprit:
I do know of folks who are less forgiving and instead use nginx to serve a 10Tb binary.
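Short of serving scrapers a 10Tb binary, the underlying mechanism is simple enough to sketch: compare the request’s User-Agent header against a deny list and return 403 Forbidden instead of the content. Dark Visitors does this inside WordPress; the tiny WSGI example below is just an illustration of the idea, with the agent names taken from the examples above.

BLOCKED_AGENTS = ("scrapy", "applebot")   # example entries only

def application(environ, start_response):
    # Serve 403 Forbidden to listed scrapers, and the normal content to everyone else.
    agent = environ.get("HTTP_USER_AGENT", "").lower()
    if any(bad in agent for bad in BLOCKED_AGENTS):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Forbidden"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, probably-human visitor"]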
The key hardware change this year has been adopting an Onyx Boox Tab Pro X Android e-paper tablet. I wanted a tablet because I’m still reading a lot of papers, which quickly gets tiring on an LCD screen, and I’ve also had to elevate my dodgy leg a lot more this year because $REASONS, and a tablet is much more comfortable.
I settled on the Onyx Boox Tab Pro for several reasons:
It uses stock Android (AOSP), which means that you can use the Google Play Store to install software
It is black and white – I wanted this, so that colour was less distracting
The A4 size was a key feature for me for reading primarily A4 or Letter sized documents
The handwriting recognition notes feature was also attractive
Bluetooth means that you can connect a keyboard and mouse and use it to type documents
So far, I’ve been super happy with the Boox Tab Pro X – it’s fantastic for note-taking in meetings, and for marking up PDFs. The key downside I see is that the markup of PDFs is not converted from handwriting to text – so the saved image size is a lot larger. I have been taking it to meetings where previously I would take my laptop – and I find that handwriting on the tablet has a different cognitive “feel” – my brain works in a different way when using a tablet – a phenomenon that is also backed up by recent research.
Still trying to find an alternative to the paper-based Passion Planner
This has been a “just OK” choice. The paper is thinner, the hardback binding makes it less flexible, and the layout is not as good as the Passion Planner. But I still refuse to pay the exorbitant postage Passion Planner wants.
So, for 2025, I bought the digital edition, which is a set of PDFs. They are promoted as working well with most PDF readers, but tested best with GoodNotes. I installed GoodNotes on the Onyx Boox, intending that the tablet would become my new planner – and it wouldn’t even load! So, fail on that front. I am still trying to get the Passion Planner PDF to render well in NeoReader – the native PDF reader for Onyx Boox, and you’ll have to stay tuned for how I go.
Making friends with Claude, but still enemies with generative AI more broadly
While I am still very sceptical of generative AI, and aghast at the amount of natural resources such as water and electricity it consumes – to the point where people want to re-commission nuclear power stations – I am also aware of their productivity benefit.
I’ve settled on Claude from Anthropic – mostly because I think the values on which Anthropic are founded are vastly different to OpenAI, if still problematic (the whole existential risk, effective altruism, open philanthropy nexus is problematic, although I find the TESCREAL naming unhelpful in trying to tackle it).
I have several detailed, specific prompts – such as the philosophy professor who helps me to refine methodological arguments, and the positive psychology coach who’s helping me maintain motivation and discipline so I can finish this ^*&&!! PhD.
What’s currently in my toolchain?
Hardware, wearables and accessories
My main laptop is an ASUS ROG Zephyrus G15 (no change since last report)
Google Pixel 4a 5G running stock Android (no change since last report)
Mobvoi Ticwatch Pro 2020 model (no change since last report)
I’m finding some of Claude’s capabilities useful from a mentoring perspective, and I think I would genuinely pay for a daily coach and mentoring service – something more sophisticated than a task or action tracker, or emotion tracker; something which has a long-term context and is able to help guide and motivate me toward my long-term objectives. I wonder if this is where services like ChatGPT or Claude are heading – essentially replacing middle management and team leaders.
Impact tracker
How do I know that the actions I am taking are having an impact? RescueTime tracks where I spend my time, and helps me see if I’ve gone down a Sapiens rabbit hole or binged a bit too much Expanse or For All Mankind (but they’re soooo good!). But how do I link my work and tasks to broader objectives and progress toward those objectives? Super Productivity has some of this capability, but is dependent on your decomposing broader goals into much finer-grained ones. And how might I link the work I’ve done to increase my physical activity and capability with longer-term fitness goals? There’s a space here amongst the Get Shit Done, One Minute Manager ecosystem for something that is better aligned to tracking impact.
No, I’m not accepting paid sponsorships or links
My State of the Toolchain posts get a bit of traction, and with that, I’ve started attracting people wanting me to link to their product or website, for a fee. That is, sponsored content – but of course they don’t want it to look like sponsored content. I usually respond harshly, dismissively and swearily to these requests.
If you want me to write about your product or service, build something awesome.
Disclaimer: despite what people seem to assume when you buy a mechanical keyboard, I am not a keyboard fetishist. I’ve been using Microsoft Sculpts for over a decade because of historical repetitive strain injury issues, and with Microsoft discontinuing the Sculpt and the new manufacturer taking their time taking over, all I want is a keyboard which is like a Sculpt, except where I haven’t worn out the space bar. I will then go back to thinking approximately never about keyboards.
So, the keyboard I could find which was closest to the Sculpt after a lot of Googling was a Keychron K15 Max, which is a QMK keyboard. For those who don’t know, which included me until yesterday, QMK is the open source firmware that many of these mechanical keyboards run, and yes, you really can customize the firmware on your keyboard now. The K15 is a 75% “Alice” layout, which means split with no numpad. It does have media, function, and macro keys, which is nice. I’ve never had a keyboard with macro keys before. It also has low profile switches, which is nice because the Sculpt basically has laptop style switches, and I chose the Gateron Brown low profile switches because I didn’t think I wanted the tactile “clicky” thing. Then again, I have now tried exactly one mechanical keyboard, so it’s entirely possible the K15 isn’t “perfect” for me — it’s simply good enough to make me stop thinking about such things.
So this leads to the Michael Problem Of The Day — I listen to music in my office using a Sonos system. While there is a Sonos app for the mac, it doesn’t appear that MacOS recognizes it as a music playback app like it does Spotify, and it gets super confused about the volume controls not controlling the volume on the mac itself, but this other device on my network.
Now — I could solve these two problems if I could cause the media keys on the K15 to run command line programs, as I already have a command line app which does things like skip tracks, volume control, etc. But like I said, the K15 only emits keypresses and the MacOS built in keyboard shortcut thing doesn’t appear to support running commands.
In the end, I spent an hour or so and built a Rube Goldberg machine to solve my problems:
Keychron K15 75% Alice keyboard with media keys remapped to otherwise unused macro keys.
For those macro keys, each is assigned a particularly unlikely unique keypress combination — for example ctrl-splat-alt-Q is pause / play. One of the key learnings for me having never used a QMK keyboard before is that you can only send keypress events to the machine — I was expecting mac side software and to be able to more directly trigger things, but that doesn’t seem to be a thing.
KeyboardMaestro on the mac with those macro keypress combinations mapped to hotkeys that run shell scripts.
The shell scripts use https://pypi.org/project/soco-cli/ to send commands to the Sonos over the network. This is surprisingly reliable now that I’ve built it all out, and makes me happy.
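For the curious, the scripts just call soco-cli from the command line; underneath it sits the soco Python library, and a rough sketch of the same operations using it directly looks something like this (the room name is a placeholder, and this is an illustration rather than the actual scripts):

import soco

def find_speaker(name="Office"):
    # Discover Sonos devices on the local network and pick one by room name.
    for device in soco.discover() or []:
        if device.player_name == name:
            return device
    raise RuntimeError(f"No Sonos speaker called {name!r} found")

speaker = find_speaker()

def play_pause():
    # Toggle playback, mirroring what the pause / play media key should do.
    state = speaker.get_current_transport_info()["current_transport_state"]
    if state == "PLAYING":
        speaker.pause()
    else:
        speaker.play()

def volume_up(step=4):
    speaker.volume = min(100, speaker.volume + step)

def next_track():
    speaker.next()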
Now, one sad bit is I have two desks in my office — a work one and a personal one. There is strict separation — the work machine is not on my home network, it is on its own VLAN which only sees the ISP NTD. This is partially because of recent high profile compromises where an employee’s personal use of a computer was used to breach their employer (LastPass and Medibank I’m looking at you), but also because there have been allegations that my employer’s cyber team will scan the networks their endpoints are on and I have zero enthusiasm for a corporate cyber team pentesting my home network. All that said, this means that such an arrangement wouldn’t work for a keyboard on my work desk.
Well, except… KeyboardMaestro can call webhooks. So I could do something with Home Assistant providing a webhook to call soco-cli, but I haven’t really thought about it in detail yet.
Meeting opened at 20:05 AEDT by Joel and quorum was achieved.
Minutes taken by Neill.
2. Log of correspondence
Oracle Linux & offerings | Increase Security and Reduce Cost – Neill has sent the EO2025 sponsorship prospectus
Re: US Tax Paperwork Questions – Query from Joe Chin re sponsorship of DrupalCon Singapore
Request for approval of >$5k invoice – request for approval of payment for the PyConAu sprints venue
Steering Council Update / PyconAU 2025 Update – update from Clinton about PyConAu 2025 and the PyCon Steering Committee – Joel has responded
WordCamp Brisbane 2025 subcommittee – discussions about WC Brisbane
PythonWA funding opportunities – email from Ben Fitzhardinge about possible support of PythonWA
Fwd: Welcome to Write the Docs Australia 2024 – update about the Write The Docs conference that was run in Melbourne on Nov 28 and 29.
Approval for final venue balance invoice – the final payment for the PyconAu main conference venue has been approved.
Introducing DrupalSouth Committee Chair – Julia Topliss – Russell has done the onboarding for Julia and Margery Tongway (Drupal SC secretary)
Re: Please set me up with DrupalSouth stuff please – more onboarding discussion between Russell, Julia and Margery
Should Linux Australia have a BlueSky account? – after some discussion Kathy Reid has setup a bluesky account for Linux Australia
[LACTTE] Grant Application from Kylie Willison for Open Laptop Project – grant application from Makerspace Adelaide
Grant Application – budget for the Makerspace Adelaide grant application
Request for approval on >$5k invoice – payment of the Next Day Video invoice for PyconAu has been approved
Amazon Web Services Billing Statement Available [Account: 103334912252] – invoice for $7.44
Grant Report from Write the Docs Australia – report from Felicity Brand on the conference. Jonathan has responded.
Driving Open Source Forward: Make Your Impact in 2025 – Newsletter from OSI
3. Items for discussion
Drupal South Subcommittee
Dallas is an apology
Canberra Community Day
Went well. Had a great community vibe. Even the bad feedback was good.
Overall a fun day – sold the last 30 tickets on the last two days.
Had a bigger presence at the GovCMS event this year.
GovCMS ran a BoF centered around contributing back to Drupal. Gained a new contributor from that.
70% of attendees indicated they would attend again.
Keeping attendance free for APS attendees was a good decision.
Ready to start planning for next year. Will be hard to make it even better next year.
Melbourne 2025
Currently looking at a loss of $59k. Early days for ticket sales. Expect to turn a profit if currently expected sponsors sign up and ticket sales match expectations.
In a fairly good position thanks to the work of David and Michael.
Ticket prices have been raised, but there has not been any negative feedback.
Currently a little focused on Singapore. When that wraps up the organisers will be able to focus on planning for Melbourne.
Want to get new people into the Drupal ecosystem. Will be reaching out to educational institutions with discounted tickets if sponsorships allow.
Some minor issues with transactions. Russell is assisting with sorting out the spreadsheet. Confusion seems to be caused by changes to accounts.
Russell has rarely seen such a clear financial statement from a conference.
DrupalCon Singapore Subcommittee
Starts on Monday.
It has been quite a ride.
Huge amount of community buzz.
212 people have bought tickets. 49.8% are at their first Drupal event. This is much higher than other conferences.
Sold out of exhibition space. Hit the target of 100 people weeks ago.
Have tried very hard to bring in new sponsors. Created new sponsorship opportunities. Particular outreach to Japan and India, which has brought in another $25k. Still looking at an $18k loss.
One final maybe, the contract has not yet been signed but an invoice has been raised.
Without the LA “tax” it’s only a $3k loss.
Apart from the budget everything else is looking really good.
Russell has asked for confirmation of the final large invoice. Mike will check and confirm that it is correct.
There is a spot in the calendar for DrupalCon Japan – will LA back that given this year’s experience? LA cannot guarantee anything at this point, but is willing to consider it.
Kiwi PyCon / Python NZ
Now looking forward to 2025. Chelsea will likely be the chair for 2025.
There was one missing transaction – a refund from Germany in Euros. Has now arrived.
Venue – waiting on confirmation for mid November dates
Preparing first meeting to plan budget
2024 – the budget was on point except for sponsorship which was overestimated by $30,000. The goal is to make sure the budget will work even with the same reduced sponsorship.
Saving money by changing AV providers. Setting the lowest expectation to what was received this year.
Want to keep the option of striking out lunch catering. Will still cater morning and afternoon tea.
May need to raise ticket prices
Approach sponsors in January.
Open tickets sales in April – this will allow a decision about changes to catering and ticket prices once there is an indication of what sponsorship will be like.
The changes to AV will save $10k, Catering $10k, and raising ticket sales will hopefully bring in the last $10k.
LA is happy overall with 2024. It was obviously a challenging year but LA is positive about running the conference again in 2025.
Joomla
Have decided to put the next event off until the second half of 2025.
LA has not made any progress on the subcommittee yet. Joomla have provided the signatories to LA.
PyCon AU
Peter is an apology due to a cold
The event was a success.
Tracking to return a $6,000 profit.
Some issues with leaks in the sprint venue.
Appreciated LA’s flexibility on the budget, especially for catering.
Starting to look at 2025.
Peter will chair 2025. Neill will take over as treasurer.
Discussions beginning about 2026.
Also looking at how to maintain state between conferences.
Flounder
Have continued to have monthly meetings
Attendance is relatively low. Would like ideas about how to attract more participants.
Attendees have a good time and learn useful things
Not just Australian attendees.
LUV
Nominal membership of thousands.
Actual attendance is much lower.
Annual BBQ is coming up which will be used to try and motivate people to attend.
Elections are coming up.
Suffers a bit from the invisibility of Linux nowadays – gone from “cool tech” to “background radiation”
Major expense is meetup. Maybe it’s time to consider dropping that.
WordPress
WC Sydney was good. Attendees enjoyed it, particularly the networking.
248 attendees; 181 actual ticket sales. Sponsorship was 130% higher than expected.
Drama around WP Engine and refunding their sponsorship.
Currently around $3,000 profit after the LA “tax”
The spreadsheet says there should be about $5,000 so that needs to be reconciled.
Definitely made a profit though.
Speaker videos are published.
Mostly 4 or 5 star reviews. A single one star review.
Managed to sign on about $68,000 of sponsorship which is $7,000 more than expected.
Ticket sales are a bit lower than expected.
The organising team don’t have as many contacts in Adelaide as expected.
Sae Ra is reaching out to an old LCA contact.
Still trying to contact a missing speaker.
Purplecon
Conference is complete
Financial reconciliation is complete
There will probably be a purplecon next year if enough organisers can be found.
Made a loss of around $864? Russell will confirm that the transactions have been entered correctly because he sees a profit.
Admin Team
Mailing list migration is underway (again). Want to get that done before elections are called.
What’s happening with the election? Sae Ra is finalising dates and will let Steve know either tonight or tomorrow.
EO2025 should be an opportunity to meet the new admin team candidate.
Still chasing the registrar about opensource.au which is still marked as contested.
GoDaddy has a service to broker purchase of domains – costs $130. Unfortunately drop.com.au are a domain squatter and will probably ask for an excessive amount for opensource.com.au
Admin team reimbursements have been received.
4. Items for noting
Grant application from Makerspace Adelaide
A decision will need to be made at the next meeting.
It is a significant amount of money.
Write The Docs
Well run
Good catering
There was live illustration, but not shown at the conference.
Attendees had a good time.
Well worth sponsoring.
Repeated acknowledgement of Linux Australia as a sponsor.
5. Other business
Joomla Subcommittee
A budget has been provided. It’s fairly basic, but that’s because they don’t have much in the way of income or expenses.
Motion: That Linux Australia accepts Joomla Australia as a Steering Subcommittee and accepts their budget for 2025.
Moved: Joel
Seconded: Neill
Result: Passed unanimously
PythonWA
While at PyCon AU, Neill met Ben Fitzhardinge and spoke to him about possible ways Linux Australia might be able to support the PythonWA user group. Neill thought he had suggested infrastructure – banking, insurance, hosting – rather than direct financial support, but may have mentioned the grants program.
Draft minutes for the following meetings are online for review. Neill will publish them in the next few days:
2024-07-03
2024-07-17
2024-07-31
2024-08-14
2024-08-28
2024-09-11
2024-09-25
2024-10-09
2024-10-23
2024-11-06
2024-11-20
6. In camera
No items were discussed in camera
7. Action items
7.1 Completed Items
7.2 Carried Forward
Meeting closed at 21:46 AEDT
Next meeting is scheduled for 2024-12-18 at 08:00 AEDT (UT+1100) on jitsi
Meeting opened at 20:03 AEDT by Joel and quorum was achieved.
Minutes taken by Neill Cox
2. Log of correspondence
Get ready for Write the Docs Australia 2024 – update on conference progress, Andrew will attend.
Missing info for NZ GST – Russell following up in preparation for LA’s audit
Oracle Linux & offerings | Increase Security and Reduce Cost – Offer from an Oracle Linux salesperson to speak to members. Neill will send them the EO2025 sponsorship prospectus.
The Open Source AI Definition v.1.0 is here, what now? – OSI Press release
Re: WCSyd Payment sent to wrong bank account. – Russell has responded
Activity Statement for Kiwi PyCon XIII (K000086) ? – Russell has responded
Lodge Linux Australia Activity Statement July..September 2024 – has been lodged
LinkedIn comment about LA website being out of date – Kathy Reid notified us of a LinkedIn comment about LA. Sae Ra has contacted Kathy.
Linux Australia’s 2023/2024 audit – Russell has submitted financial statements to the auditor
Steering Council Update / PyconAU 2025 Update – update from Clinton about PyConAu 2025 and some changes to the PyConAU steering committee
Time sensitive: Final venue payment for PyCon AU – request for approval of a payment for PyConAU. Payment has been approved.
Russell is on holidays
Re: DrupalCon Asia Payment Approval Request – Russell has processed the request
Approval for >$5k invoice – Request to approve payment for a PyConAU invoice. Russell has approved the payment.
Introducing DrupalSouth Committee Chair – Julia Topliss – Email from Dave Sparks introducing the new DrupalSouth committee members.
3. Items for discussion
Constitutional updates
We are working on identifying the needed changes. It looks like the easiest approach will be to adopt a new constitution rather than amending our current one to match.
Significant changes:
We need to clearly specify our financial year dates
We should make sure we make the changes necessary for us to apply for charity status.
We will also have to pay careful attention to the requirements about record keeping for membership lists. We will likely have to keep a list of names and addresses for all members to meet the legal requirements of being an incorporated association. Ideally we will still be able to allow members to opt out of having their addresses shared, but this will require legal advice.
Community update email
Joel will send an email to the community to:
Update the community on upcoming events
Inform the members that we need to update the constitution due to external requirements.
Provide a link to a working area where discussions will be held.
The actual changes to the constitution will likely need to be done at a special general meeting. Doing it at the AGM will be impossible due to the notice that needs to be given.
LA Election
We will aim to hold the AGM at Everything Open. We will work backwards from that to determine when the election needs to be held.
4. Items for noting
Grant applications will be closed at the end of November.
Schedule for Everything Open is up. It’s a corker!
Meeting opened at 19:34 AEST by Joel and quorum was achieved.
Minutes taken by Neill.
2. Log of correspondence
Re: DrupalCon Singapore Bill Payments – Wise verification done and payments sent
Get ready for Write the Docs Australia 2024 – Update on Progress from Felicity Brand
Missing info for NZ GST – questions about Kiwi PyCon and NZ GST – Russell has responded
admin team reimbursements – Russell has responded
Fwd: Catering at PyCon AU 2024 – see Other Business below
Manufacturing Statement – 71183992 from RedBubble
Fwd: An Osko Payment has been rejected – Russell has responded
Re: Misc Issues with NZ Accounts
Amazon Web Services Billing Statement Available
Exec minutes for the previous financial year – request from Russell for minutes to be published. This has now been done.
Activity statements are available online – ATO notification
WordCamp Sydney Incident Report
Incorrect GST on transactions
Lodge Linux Australia Activity Statement July..September 2024
Re: Following up on Linux Australia sponsorship
Important: Upcoming changes on your existing product from 6th December 2024.
Singapore Budget Update
Unpaid sales invoices
Joomla sub-committee – draft budget supplied
Unpaid invoice dated 1/Apr/2024
3. Items for discussion
Drupal South Subcommittee
Canberra Community Day
Next week. Speaker line up looks good. Content looks good. Just over 100 attendees booked in. May vary slightly on the day. Budget is looking good. Tracking for a profit of around $4,000.
Shows the strength of running events in Canberra with the support of the government community.
Elections
Elections for the committee concluded this week. One new Australian and one new NZ member. Two members leaving – David and Michael did not seek re-election. This will be Dave’s last meeting with the LA Council.
There were a good number of nominations. The charter requires two Australian and two New Zealand members.
The community seems a little smaller, but the number of votes was similar.
Official announcement to be made soon, including declaration of who will be the Chair.
Melbourne 2025
Looking forward to using the A/V support from the venue.
Some of the traditional sponsors have already returned. Yet to actually chase sponsorship.
Julia has started organising the volunteer committee.
In previous years they needed to cap numbers and turn people away. Hoping that this time if numbers look good then there is an option to expand the number of seats available.
The sprint has been moved to avoid running at the same time as the GP. New venue is attached to a TAFE close to the city. This provides an opportunity to market to students and attract new participants.
Need to add Julia to the future subcommittee meetings.
DrupalCon Singapore Subcommittee
Have been working very hard on trying to cut costs and raise more money.
Have now removed shirts from the budget.
Now forecasting an $8,000 loss.
Advocating for sponsorship from Japan and India to help ensure that the event can run in future years.
Networking drinks have been renamed to “Drupal CMS Release Candidate Launch Party”, which allows offering a new low cost sponsorship option to the participants in that work.
Automattic have now confirmed sponsorship.
A few other problems were discovered.
Currently asking the Drupal Association whether recording the second day of videos is required, as dropping it would help reduce costs further.
The operating budget is still showing a $60,000 loss, but more money is expected to come in soon.
Ticket sales are at 148 currently with about another 20 coming in each week. Reasonably confident of getting to 250.
Kiwi PyCon / Python NZ
Xero has been fully reconciled now and shows a loss of a bit over $34,000
Venue options for the next conference are limited. Shed 6 looks like it may be the best option. Venue and A/V costs look like being $40,000. Looking to cut back on other fixed costs.
Looking at November next year to give a longer on ramp and time to make new decisions
Looking to form the next Kiwi PyCon committee in about two weeks after the new Python NZ committee meets.
Will also try and make sure there is no clash with PyCon AU. November has an advantage because of KawaiiCon running at the beginning of November which may make things easier for international visitors.
One issue with the accounting was difficulties paying invoices in euros, via NZ dollars converted from AU dollars. This caused significant confusion which is not yet fully resolved.
Joomla
A draft budget has been shared with the LA Council
The account has been closed and the money moved due to a limited number of bank accounts being available to LA. The money is still tracked separately in Xero and will be available to the Joomla subcommittee in the future.
Memberships have been at $50/year. Expecting at least 20 members.
There are few expenses as many operating costs are being donated (e.g. hosting).
Have decided to use $400 from the last event to seed the next one in Brisbane next year.
Will provide more detailed budgets in future years.
LA is also able to provide web hosting if required.
The 2025 event will probably be in either March or April. The date will be confirmed at the Joomla subcommittee next meeting.
Hoping for a slightly larger event – perhaps 30-40 people as opposed to this year’s 20.
PyCon AU
Happening in just over two weeks.
Sponsorship: Now at $73600 from 10 sponsors, up $42500 from last month. We have 6 sponsors with booths at the main conference (table + TV setup where their staff can chat with attendees) and 2 running workshops at the sprints.
Ticket sales: We have 420 attendees now and are projecting 430-450 attendees.
Current Challenge: We need to commit to catering (or not) by Friday. Minimal lunch + all day coffee catering will cost ~$77k. Based on the latest budget projections we must choose between making ~$66k profit (no catering) or ~$11k loss (minimal catering). There is a small chance of running catering with a neutral budget if we get some last minute sponsors or professional ticket bundle sales, though we can’t count on either. We would like some guidance from LA on if this level of loss would be acceptable so that we can run a conference that leaves attendees and sponsors happy to attend again next year.
LA has voted on the catering option and approved the minimal catering option.
Flounder
No meeting this week. Postponed a week.
LUV
No meeting this week. Postponed a week.
WordPress
No update
Everything Open 2025
Session selection confirmations have been sent out
13 tickets sold already
Two keynotes are locked in. Talking to a third.
Schedule is almost complete.
Have a Kubernetes CTF session set up on Wednesday
Talking to SUSE about sponsorship
Donna Benjamin is also chasing sponsorship.
Purplecon
Completed all arrangements – catering, A/V, volunteers etc.
Almost everything has been paid for except for the remaining A/V payment and a few other small items.
Purplecon now has a new sponsor. So sponsorship stands at $10,010 (ex GST)
194 tickets. Less than hoped for but more than expected.
Total expenditure has dropped by about $8,000
Projected loss is now about $4,000
These figures do not properly account for GST so the projected result is pessimistic.
The conference is in two days.
Admin Team
Apologies for last month. Unavoidable scheduling conflict.
Test version of an online docs system (Nextcloud) is running. Many lessons learned. Will discuss its suitability and setting up on an appropriate domain.
Draft budget for next year has been prepared.
Would like the admin face-to-face to happen twice this year. Partly because there is a new potential admin team member.
The HDD component of the storage upgrade has been deferred to next year because of difficulty finding the most appropriate drives at a reasonable price.
Everything Open email has been moved to Fastmail. Seems to be working.
Looking at cutting the LA lists over this coming weekend.
Stephen would like to schedule a meeting to discuss backups with Joel.
Need a key for video uploads.
Sae Ra will be organising an EO A/V meeting in November.
LA Council would like to thank Steve for all his work sorting out email.
4. Items for noting
Ongoing RSE sponsorship (email received 21 Jun 2024) – to be discussed by Sae Ra with Rowland in the first instance.
Write the Docs tickets need to be used soon.
We should close the Health Hacks bank account. Russell will notify Craig Askings
5. Other business
MOTION: That LA approves PyCon AU 2024 following Option 1 from their proposal, to provide catering to their attendees.
Moved via email by Joel
Seconded by Sae Ra
Passed unanimously
6. In camera
Three items were discussed in camera
7. Action items
7.1 Completed Items
7.2 Carried Forward
Meeting closed at 21:15
Next meeting is scheduled for 2024-11-20 at 08:00 AEDT (UT+1100) on jitsi
Meeting opened at 20:06 AEDT by Joel and quorum was achieved.
Minutes taken by Neill Cox
2. Log of correspondence
Fwd: Request for ATO advice – 1-1474HXZI [SEC=OFFICIAL] – ATO response to an LA request – forwarded to council by Russell
Fwd: Your Stripe transactions have exceeded a tax threshold in France – email from Russell to LA treasurers about implications of accepting overseas payments
[LACTTE] Joomla Meetup Group – query from Sae Ra as to whether the Joomla SC intends to keep using Meetup. The answer is no.
Did an upcoming LA events post – FYI if you would like to share from your own accounts – Kathy Reid prepared an announcement email for LA. Jonathan has responded.
WordCamp Sydney – discussion about the implications of the WP Engine/WordPress.org controversy. Jonathan and Joel have responded
Refund to be issued to WP Engine – discussion about sponsorship for WordCamp Sydney – Joel and Wil have responded
DrupalCon Singapore bill payments – Russell has responded
[LACTTE] BAS Time – email from Russell to LA SC treasurers
Tax Invoice #1762896 – Please Do Not Reply – linux.org.au has been renewed
WordCamp Brisbane 2025 subcommittee – WC Brisbane have provided a proposed budget
Missing info for NZ GST – query from Andrew Ruthven about GST reconciliation
DrupalCon Singapore Bill Payments – Discussion between Russell and Joe Chin about DrupalCon Singapore and Wise
[LACTTE] Automattic Sponsorship – discussion about accepting sponsorship from Automattic for DrupalCon Singapore in light of recent controversy
3. Items for discussion
WordCamp Sydney
Is going ahead. Not quite sure of the current status of the WP Engine refund issue. Conference is this weekend. Doing well in trying times.
DrupalCon Singapore
Met with Michael and Joe Chin last week. Remarkable turnaround in the budget situation. Some questions about the Automattic sponsorship. The Drupal Association has approved the sponsorship but some further questions have arisen.
Budget has gone from a $100k loss to a projected loss of $3K. Removing the contingency funds would lead to a $2k profit. Not much short of miraculous. This does assume that the Automattic sponsorship goes through.
WordCamp Brisbane
Discussions are ongoing.
EO2025
Session selection committee met on the weekend. Went well. Plenty of talks. Very high quality submissions. Hardest task may be selecting from the best talks.
Now going through requirements for travel assistance. Choosing between small amounts of financial assistance or perhaps presenting remotely.
Like other conferences the budget is tight due to a lack of sponsorship.
In pursuit of our goal of using OSS wherever possible EO2025 is using a NextCloud instance. This is a proof of concept. If it goes well LA will set up a more general use NextCloud instance.
There are a few bugs, but working reasonably well overall. Formulas from Google Sheets do not always transfer correctly. Mostly just works.
Constitutional Updates
Work is underway to work out what the differences are between our current constitution and the model constitution. There are significant differences that will need to be worked through. It is not going to be a small task to align our current constitution to the model constitution. We may instead need to work out the minimum we need to add or change in our constitution.
Alternatively we may need to just put forward an entirely new constitution to avoid having to vote on a large number of changes.
Russell has a very ugly diff that could perhaps be shown to the members to illustrate the difficulties we face in getting the constitution updated to match the new model constitution.
All council members should read the new model constitution.
Meeting opened at 19:35 AEST by Joel and quorum was achieved.
Minutes taken by Neill.
2. Log of correspondence
Announcing conference speakers, talks and the full schedule – WriteTheDocs grant update
Re: RSEAA24 Accessibility Fellowship grant invoice – Russell has paid the RSEAA24 sponsorship
Written notes from tonight’s Council Update – PurpleCon Update
Re: Insufficient funds – Python NZ – Russell has responded
Re: PyCon AU Money that went to the wrong account – all sorted now
Fwd: Kiwi Pycon – Would you authorize this overlimit payment please ? – Details of IRD loan payments in Kiwi PyCon budget
KangarooLLM – I think it’s a case of open washing LA should keep an eye on … – From Kathy Reid. Jonathan has responded
Spam through to the grants list – We are monitoring the situation
Amazon Web Services Billing Statement Available [Account: 103334912252]
WordCamp Brisbane – WC Brisbane have brought Wil Brown on as a mentor. They would like to form an LA Subcommittee. Joel has responded.
Re: Following up on Linux Australia sponsorship – LA is entitled to two attendee tickets
Release Candidate 1 of the Open Source AI Definition is Here – OSI announcement
Upcoming Domain Renewal Notice – Please Do Not Reply – linux.org.au will be automatically renewed on 2 Nov
ATO SAP Form has been submitted – Russell has submitted our Substituted Accounting Period form to the ATO
ATO NFP Self Review Has been submitted – by Russell
FYI the Carlos Cordero scandal has been published in NZ press – Kathy Reid informs us. Sae Ra has responded.
Joomla sub-committee – Nathan Morrow enquired when the next subcommittee meeting was. Joel has responded
[Purplecon] Update on the status of Purplecon 2024 – update on the state of the PurpleCon conference.
WordCamp Brisbane 2025 subcommittee – Joel has responded to the request to form a subcommittee
3. Items for discussion
Drupal South Subcommittee
Canberra Community day in November coming together nicely. Venue booked, 67 tickets sold so far. The conference usually sells more tickets in the second half than the first. Will almost certainly hit the minimum budget number. Confident of getting to the medium target and may actually hit the maximum target. Sponsorship is going well. On track for a 4K profit.
Slightly different mix of sponsors this year.
Talks/content are looking good. The community days should be back as a regular item.
Committee Elections: All 5 places are up for election, response to call for nominations has been a little unenthusiastic. The nomination period has been extended. Prodded some people who the current committee would like to see stand.
At the moment there are not enough nominations to make an election worthwhile.
May use some money from the community day to promote the elections. This should not significantly affect the financial position of the conference.
Perhaps consider expanding the position descriptions to indicate the number of hours per week expected?
Melbourne:
Core committee is in place
Some initial sponsors have returned.
Expecting some challenges in filling the remaining sponsorship slots.
Some of the local committee have just returned from DrupalCon Barcelona.
Keen to get the call for papers out soon and be more active about curating content.
Have had conversations with people who are keen to use the venue to promote Drupal careers.
One of the biggest current challenges is nailing down the keynotes.
DrupalCon Singapore Subcommittee
Call for scholarships last month – 29 applicants, mostly from Asia, some from Africa, one from Canada. A great opportunity for people to attend a DrupalCon for the first time.
The various regional community leaders have been coordinating and will be having a booth to
Acquia (the largest Drupal company by value) has just gone through a major layoff. Many roles have been replaced by people from cheaper sources of labour. Acquia has decided not to sponsor the conference.
There are 92 confirmed attendees, plus 14 scholarships, 10 community leaders and 20 speakers. Need to push sales to get to 250 people which is the minimum the venue will charge for. As a result discounts are being offered to meetups and groups. This has driven an increase in traffic.
Would like to use SGD5000 from the budget for a LinkedIn advertising campaign to target Drupal people in Singapore.
The conference is looking at a SGD100,000 (~AUD115,000) loss at the moment. The only way to make this back will be to run more events.
The organisers feel that the conference in Japan is better placed to return a profit.
The LA council is prepared to approve a SGD500 advertising budget, but we will need to look at how costs can be cut because a SGD100,000 loss is not acceptable.
The LA council cannot approve another event without much more detail.
The organisers have already cut about SGD50,000 from the budget and don’t feel that much more can be cut. They are happy to schedule a meeting to discuss what has been done and look for further cuts.
There is a meeting with the Drupal Association tomorrow morning. Mike will pass our concerns on to the DA.
A number of other conferences are facing similar difficulties so LA needs to manage finances carefully.
Joomla
Charter – have added details around dissolution and returning assets to Linux Australia.
The LA Council needs to vote on the charter.
Planning another event for next year (March or April – check for clashes with Drupal). Smaller event, low key, community focused. Free venue!
Have set up a discord server. Open to everyone.
Monthly meetings are steady at about 20 participants.
The LA Council has requested a budget for next year.
PyCon AU
Sponsorship: Now at $31 100 in sponsorship from 5 sponsors, up $12 000 from last month. Current efforts are focused on selling the opportunity to run workshops during sprints.
Program: Our keynotes have been announced: Linda McIver and Kendra Vant
Ticket Sales: We have sold tickets to 209 attendees: 188 general admission, 18 specialist track only, and 3 sprints only. Projected total ticket sales is ~450.
Financial Aid: We have approved up to $4070 total in financial aid to 5 people. Due to the tight budget this year we prioritised asks from speakers, organisers, and attendees local to Melbourne. Applicants have all been notified of application outcomes.
Volunteers: We have accepted 32 volunteers from 118 applications. 17 general volunteers and 15 AV. Applicants have been notified of application outcomes.
Sprints: We have booked 3 rooms for sprints at Hotel Grand Chancellor. This will either be 2 sprint rooms and a workshop room, or 3 sprint rooms. Significantly cheaper than MCEC, not super close to the main venue, but within Melbourne’s free tram zone so easy to access by public transport.
Current Challenge: Our volunteer coordinator is stepping down for health reasons; their role is being handed over to other members of the core team.
Flounder
BTRFS raid 5 / 6 testing event. Testing both performance and reliability. Results were not very impressive.
LUV
LUV venue has closed. Need to find another one. Online meetings will continue.
Committee election will be held in December. Formal announcement will go out on the mailing list, but the members have been notified.
WordPress
The Automattic story is causing some drama.
The conference is over budget.
Have sold 105 tickets; need to sell about as many again. $10k in ticket sales.
125% sponsorship.
$60,640 – well over budget.
Swag is a bit more expensive than expected.
Catering – $5k for a coffee cart – which is twice what a coffee van cost in 2019. May make people go across the road to buy coffee. Perhaps get vouchers?
Call for volunteers open. Call for speakers closed.
Ticket sales are slower than the organisers would like.
The budget is looking good, but there are some decisions still to be made.
Everything Open 2025
Not a lot of progress, but what needs to have been done has happened.
Call for sessions has closed.
Session selection committee will meet in two weeks.
There are a lot of awesome talks. Will be a struggle to say no to some of them. Good range of topics.
Two keynotes are locked in, working through other possibilities.
Sponsorship is going OK, but like every other conference has been quieter this year.
Need to start looking at 2026.
Purplecon
We have confirmed and paid the deposit for audio-visual setup.
We are in the process of arranging catering (bubble tea).
We have confirmed and received money from 2 additional sponsors for $1000 and $5000, for a total of $8500 sponsorship.
Sponsors are now published on the website.
So far, we have sold 133 out of 324 tickets. Gross income so far is $12,778.88. If this pattern continues, that projects to $31,130 in gross ticket income.
We do not have to pay Eventbrite fees on that money (the attendees paid the fees) but we will have to pay GST, so subtract 9%.
We are advertising to sell more tickets.
Our current projected expenditure is approximately $36,300. With $8500 in sponsorship, leaves us ~$1500 short. We’re hoping to find another sponsor to close the gap.
Admin Team
Working on a collaborative document space for the EO session selection committee.
Will hold off on mailing list migration until after EO session selections are done.
4. Items for noting
5. Other business
6. In camera
One item was discussed in camera.
7. Action items
7.1 Completed Items
7.2 Carried Forward
Meeting closed at 21:16
Next meeting is scheduled for 2024-10-23 at 20:00 AEDT (UT+1100)
Meeting opened at 19:44 AEDT by Joel Addison and quorum was achieved.
Minutes taken by Neill Cox and Jonathan Woithe
2. Log of correspondence
Joomla sub-committee – some confusion over meeting invite. Now resolved.
ACTION: Council to look over the proposed agreement for us to sign off on. Bring any amendments for the next meeting.
DrupalSouth Melbourne Budget Proposal – budget has been approved and the subcommittee notified
Fwd: Kiwi Pycon – Would you authorise this overlimit payment please – This has been done
NFP Self Assessment – discussed below
PyCon AU Money that went to the wrong account – Russell has resolved
LA ANZ bank statement line: Baker Inv 504454 – Russell has resolved
Accounting Period, TFN, and NFP self assessment – Discussed below
Request for approval on $5k+ invoice – Joel has responded
Everything Open 2025 Call for Sessions is now open – has been extended
Fwd: Debit initiated from your bank account – Russell has resolved
Announcing conference speakers, talks and the full schedule – update from WriteTheDocs conference
3. Items for discussion
Not for Profit Self Assessment & Standard Accounting Period Adjustment
Question 2 (which category we fall into) is the only contentious issue. Our best fit seems to be “scientific organisation”.
We will need to change our constitution to retain our not for profit status. This needs to be done by June 30 2025. We need to address what happens to our assets if we are wound up.
We also need to sort out our accounting period. Currently we use a non-standard accounting period (1 Oct to 30 Sep); this requires approval by the ATO, which we do not currently have.
Our accountant can not submit the NFP assessment for us, but can nominate our treasurer as a principal authority for Linux Australia so that they can submit the paperwork.
Here is a clause on that page that seems to fit us:
Example clause 2
The [organisation] is established to be a charity with the purpose of advancing the [___industry___] in Australia, particularly in [___location___] by:
– conducting and publishing research into improvements to the processes used in the industry
– working with government at all levels to ensure that the interests of the [___industry___] industry are represented in regard to the public decision-making process, and
– providing a forum for all people engaged in the [___industry___] to discuss best practice and enhancing the future of the industry.
Motion: That we agree to fill out the NFP form in line with the scientific organisation category and submit the change in financial year form to meet the requirements of the NfP self-assessment.
Moved: Joel
Seconded: Sae Ra
Passed: Unanimously
Open Source Tooling
We have had a number of emails from the community regarding open source tooling. Tonight we are testing Jitsi.
Sae Ra is also working on a paper to present at the AGM about Linux Australia’s choice of tooling.
The NZ Open Source Society has shown us the tooling that they are using and this is being investigated. Video conferencing – NZOSS runs their own BBB instance, plus NextCloud with an OnlyOffice layer for collaborative documents. Scaling of video conferencing often hasn’t been tested, which is why we are testing out Jitsi tonight: whether it will work with large numbers of participants and with features including waiting rooms, registration of members for meetings, etc. https://nzoss.nz/online-services
Joel to work with Steve to get the collaborative document environment in time for the Session Selection Committee meeting.
We need to make sure that we come up with standard operating procedures for privacy, backups, permissions etc.
Proposal: use Zoom for the next subcommittee meeting, Jitsi for the November subcommittee meeting, and try BBB for a council-only meeting. We can try things out to gather data points.
4. Items for noting
EO2025
Team is slowly moving along. Chair is away for the next few weeks. Two keynotes are locked in. Open SI have signed as a sponsor, as have CodeConstruct. Red Hat likely to come onboard.
Core team members are short of time for work outside of meetings. Dinner venue quote has been received; we will likely pay a deposit soon.
Meeting opened at 19:35 AEST by Joel and quorum was achieved.
Minutes taken by Neill and Jonathan.
2. Log of correspondence
Re: Incoming Funds – discussion about dealing with funds in SGD – Russell has responded
Reminder: Evidence is due soon for a dispute on Drupal Singapore 2024 – a dispute has been raised for a Drupal Asia invoice. Russell has responded
Fwd: Re: Merri-bek Tech course – grant application decision – Query about payment of grant
Upcoming Domain Renewal Notice – Please Do Not Reply – reminder about linux.org.au – admin team are dealing with this
Amazon Web Services Billing Statement Available
Action required – Your AWS account is past due
WordCamp Brisbane 2025 subcommittee – application to form a subcommittee for WordCamp Brisbane 2025 – Joel will respond to this request. We will ask Wil to assist in setting up the subcommittee.
Scanned Post – Wil has collected a Westpac statement for LA
A draft Open Source AI Definition, and welcoming Elastic home – OSI newsletter
[Purplecon] Update on the status of Purplecon 2024 – update on the progress of Purplecon
Re: Semi urgent: Access to PyCon AU Stripe Account – discussion about PyCon AU accounts – Russell has responded
[LACTTE] DrupalSouth Melbourne Budget Proposal
Re: RSEAA24 Accessibility Fellowship grant invoice – Russell has responded
3. Items for discussion
Drupal South Subcommittee
Venue contract needs to be signed tomorrow (12 Sep)
Not worried about attendee numbers for Melbourne. Minimum expected is 150.
Sponsorship should be dependable. They have turned up every year, and pricing details and the need to make the budget work have been shared with sponsors. Response has been positive.
Considering a new sponsorship level (Titanium, 20K, two nibbles so far)
Both Melbourne and Singapore have been working with some non traditional sponsors.
High level expectation is that attendees will come in at the medium level (200 attendees).
DrupalCon Singapore Subcommittee
Sponsorships are down from forecasted. Drupal Association has agreed to adjust the credits received by sponsors based on level of sponsorship. This has helped to bring some new sponsors in, and increase the level of others.
There is other discussion going on about financial lifelines if needed, given this is a new event. Will wait and see how it goes.
Many (29) responses to the assistance program
Training courses are going alright, some sponsors are also interested in helping with this
136 attendees including committee members, volunteers, ticket purchases and sponsor’s tickets if they use all of them.
Ticket sales are slowing down. Hard to know if this is just timing until the event or not, given it is new.
Kiwi PyCon / Python NZ
Conference ran well.
Just under 220 attendees
Videos are online
Still working on finances. Expect to be close to the forecast loss.
Will be looking for cost cutting measures to make the budget work. Need to drop costs by about half.
Will be a week or so before Xero is properly reconciled.
Council will need to discuss the IRD loan as it wasn’t in the initial agreement. Python NZ are still attempting to dispute the loan. Two repayments have been made. Payments need to be made to avoid having the lender insist that payment must be made immediately and in full. Legal advice was to pay the instalments until the legal issues are resolved.
Joomla
No update (Nathan emailed council@ during the meeting to say they hadn’t received any email communication since July and didn’t know when the next meeting was. Jonathan responded and indicated that Neill will follow up.)
PyCon AU
Sponsorship: $19,000 committed which is $7,000 lower than the target.
63 talks confirmed.
Tickets launched on Monday; 32 sold so far.
No childcare due to budget constraints. There will be free tickets for children under 12 instead.
Financial aid is open – and some will be provided as a sponsor has specifically provided money for financial assistance.
100 applications for volunteers. Will have to say no to some.
COVID policy included livestreams, but this is too expensive (due to venue costs) and will have to be walked back. Some refunds may be required.
Flounder
No update
LUV
There was a meeting at the beginning of the month.
Trying to motivate people to join some sprints to do some admin work (e.g. website updates)
WordPress
T-8 weeks 07 on Saturday
Moved to weekly meetings
Comms on slack as
67 tickets at $70 = $4,690
11 mini sponsors at $175 = $1,925
Total = $6,615 from projected $19,670 or 33% – should get more sales once speakers are announced.
Sponsorship at $58,665 of projected $46,600 or 125%
Overall income at $64,930 of projected 66,280 or 97.96%.
Speaker acceptance emails went out on Monday.
Draft schedule.
Start announcing confirmed speakers next Monday.
Release schedule as soon as all speakers have confirmed.
Gathering merch/swag quotes.
Catering looking to source a quote for a barista coffee machine and operator.
Have had to turn down some Platinum sponsors due to space constraints!
Wil is aware of the WordCamp Brisbane organisers. They would like to make an announcement in November.
Everything Open 2025
Lots of interest in sponsorship. So far have 1 signed on and another has a contract in hand, totalling $23K, so it’s a good start.
Neill Cox has kindly agreed to be the treasurer
There’s lots of help required regarding comms/marketing.
Sae Ra has reached out to the University of SA Open Source club. They are interested in submitting something and also volunteering, and will help spread the word amongst the uni students in Adelaide.
Session Selection Committee has been formed. Currently sitting at 15 talks which is not bad given we have another 11 days before it closes.
Sae Ra will be running a “write your proposal” session in the next week or so just to see if we can get some interest. Need some help passing that message along too.
Keynotes will be contracted on Friday.
Things are ok. Sae Ra just needs the team to do stuff rather than her.
Purplecon
Sold 74 of 340 tickets which means about $8,000 revenue so far. Revenue is currently projected at $34,000.
Expenses are expected to be between $38,000 and $40,000.
This would mean a loss of a few thousand dollars.
Looking for another sponsor to try and close that gap.
Selected 9 talks and notified the speakers.
Ticket sales are via Eventbrite, not using Stripe for payments.
Admin Team
Ready to cut over the mailing lists. Delayed by personal issues.
Almost ready to submit next year’s budget.
Julian Demarchi would like to rejoin the admin team
Would like to do a face to face this year, but adding an extra team member will increase the cost
Replacement drives have been sourced. Bill will be sent soon (approx $5,500)
Need to confirm that backups are working. Will need assistance from Joel for that.
Would like advance notice of the election notice being sent out, so that the prep work can be done.
Motion: That we adopt Drupal South 2025 as a subcommittee of Linux Australia and accept their proposed budget.
Moved: Russell Stuart
Seconded: Joel
Passed unanimously
Self assessment as a non-profit
We will need to make changes to the constitution to be able to qualify as a non-profit.
Council will look into this before the next meeting.
Python NZ IRD Loan
Python NZ has an outstanding loan from the NZ IRD for a COVID-19 grant. This was taken out without the knowledge of the Python NZ Committee, which is against their constitution. They are disputing the loan on this basis, but this is still in progress.
The first repayment of the loan is now due for payment. The loan amount was not part of the original money approved by Linux Australia to assist Python NZ due to the dispute being in progress. They would now like us to approve paying the loan instalments to avoid further issue, while they continue discussions with the IRD.
Motion: That Linux Australia agrees to pay the outstanding IRD loan, subject to an agreement that if it is refunded that the money returns to Linux Australia.
Moved: Joel
Seconded: Neill
Result: Motion passed unanimously.
4. Items for noting
No items tabled.
5. Other business
No items tabled.
6. In camera
Three items were discussed in camera
7. Action items
7.1 Completed Items
7.2 Carried Forward
Meeting closed at 20:58
Next meeting is scheduled for 2024-09-25 at 19:30 AEST (UT+1000)
Meeting opened at 19:38 AEDT by Joel and quorum was achieved.
Minutes taken by Neill
2. Log of correspondence
Jonathan Woithe: Merri-bek Tech course – grant application decision- Jonathan has informed Jade that the grant application was successful
Wil Brown: Scanned Post – three items of mail received at the PO Box and scanned to the drive folder (ATO NFP review, Westpac statement, and a copyright demand on behalf of PicRights – almost certainly a scam)
Namecheap: linux.org.au domain renewal reminder
AWS: AWS billing statement
ATO: Activity Statement reminder – BAS has been lodged
OSI: The conversation about the role of data for Open Source AI continues
Russell Stuart: Re: Bill INV-5358 from Purpose Accounting is due – Russell has responded
Lyndsey Jackson: Request for grant partner and auspices consideration
Jennifer Sims: Grant Application from Jennifer Sims for Sports Connection Australia – Gender Diverse Sporting Streaming – has been responded to
Joe Chin: Linux Australia Finance Induction re: DrupalCon Singapore – Russell has responded
Russell Stuart: Fwd: Debit initiated from your bank account – Russell forwarded to Joe Chin
Jonathan Woithe: Sports Connection Australia – grant application decision
Jonathan Woithe: “Write the docs” sponsorship – grant application decision
Russell Stuart: Fwd: American Express has initiated a S$655.45 SGD dispute
Zoom: Zoom is simplifying AI Companion enablement for Pro users
Michael Richardson: Re: Fwd: DrupalCon Licence Agreement – Joel has responded to Michael Richardson
Everything Open 2025 Call for Sessions is now open
Russell Stuart: NFP Self Assessment – Russell has asked our accountant for advice.
Russell Keith-Magee: Re: Email aliases in the Xero account settings – Russell Stuart has responded
Morgan Serong: Notice of Linux Australia 2024 Annual General Meeting (AGM) – Morgan has requested an update on progress replacing Zoom for video conferencing
Richard Shea: Re: Kiwi Pycon – Another transfer please – Russell has responded
Swapnil Ogale: Re: Following up on Linux Australia sponsorship – Joel and Russell have responded
Steve Walsh: Assistance with Auspice / patronage of unincorporated association – LA has agreed to provide an ABN to use for the registration of nerdvana.org
Peter Hall: PyCon AU venue deposit payment – Peter Hall has accepted an invite to the subcommittee meetings
sponsorship@foss4g-oceania.org: Re: Sponsorship Opportunities for FOSS4G SotM Oceania 2024- Jonathan has responded to a query about sponsorship of FOSS4G Oceania
Michael Richardson: DrupalSouth Community Day Budget – Joel and Russell have responded
resolveAU@picrights.com: FINAL NOTICE-Unlicensed Use of Reuters News & Media Inc Imagery – Reference Number: 7536-1819-1990
Katie McLaughlin: PyCon AU Subcommittee listing update – Joel and Sae Ra have responded
3. Items for discussion
From Federal Grant Application: Do you have a plan to manage any potential security risks associated with the project and your organisation more broadly?
The plan should include protecting your organisation from potential national security risks including cyber security threats and the secure handling of data. We may ask for a copy of your plan at a later stage.
Our answer is currently no, but we should look at this. What should go into such a policy? Do we need insurance? If so, do we need to install Anti Virus software on all the containers in our k8s cluster?
CAVAL has just written such a plan, which is under a CC licence and has been reviewed by CAVAL’s insurer.
4. Items for noting
LA’s YTD Financial Outcome
Note: These figures are provided for guidance only, and are our best estimate for this point in time.
Year to date
Linux Australia
LA Income 8,409
LA Exp - Grants -56,643
LA Exp – Insurance -5,582
LA Exp – Accounting -7,709
LA Exp – Other -8,928
TechTeam -526
LUV -473
WP-Aust -1,366
Actuals
DruCom23 -7,907
EO2024 18,634
JoomRec24 388
DruSth24 17,361
Estimates
Kiwi Pycon (Conf) -24,809 (projected, probably close)
NzPug (Fraud) -10,556 (max, we didn’t agree to this, might be 0)
Server Maint -5,000 (TechTeam Budget)
----------------------------------
LA Profit/Loss 2023/2024 -84,706
----------------------------------
Next Year
DruSth24 -30,000 (estimate from last meeting)
PurpleCon 10,000 (assumes $0 of $20,000 budgeted sponsorship)
WcmpSyd24 20,000 (budget profit, sponsorship exceeds budget)
PyconAU24 0 (budget mid, no new sponsorship)
For now we will take no action, but will keep an eye out for future development.
Zoom is simplifying AI Companion enablement for Pro users – looks like we need to opt out before 19 Sep if we don’t want the AI companion
We have opted out of the AI companion.
Replacing Zoom for video conferencing
This is under investigation, but we have not yet settled on an option. We will run a regular council meeting (not a sub-committee meeting) using an alternative platform.
6. In camera
No items were discussed in camera
7. Action items
Respond to Morgan Serong’s query – Neill
Investigate a cyber security plan – Jonathan (and Sae Ra)
7.1 Completed Items
7.2 Carried Forward
Meeting closed at 20:20
Next meeting is scheduled for 2024-09-11 and is a subcommittee meeting
I recently replaced the screen of a Google Pixel 3A XL. The new panel is made by Tianma and worked well under Android, until it didn’t. On every boot the screen would work until the phone went to sleep, and then it would stop responding to touch until another reboot. After the screen became unresponsive, the rest of the phone remained responsive in the locked state and it was possible to unlock the phone with the fingerprint reader, but there was no way to make the touchscreen responsive again without a reboot.
To fix this, go to Settings -> System -> Gestures and disable “Double-tap to check phone”. After that, the screen should no longer get stuck in the unresponsive state. This seems to be a common problem affecting many phones with replaced screens.
Google will surely shut down their support forum one day, and I encourage everyone to put their notes somewhere reliable, like a self-hosted blog :)
I’ve recently been going through the Cisco Cyberops NetAcademy course as part of a TAFE unit I am doing at the moment. While working through the e-learning I took a bunch of notes, and then over the weekend I turned them into an Anki deck to help me prepare for the final exam. I’m actually unsure if I’ll bother with the certification exam, but this seemed like a more useful and reusable way to prepare than just reading and writing private notes.
For those unfamiliar with it, Anki is an Open Source flash card application available at https://apps.ankiweb.net/ — there are clients available for Windows, MacOS, Linux, and smart phones. To use these flash cards, import them into an Anki deck using the “import file” button at the bottom of the main screen. This is a common approach to study for certification exams, especially those which are rote learning heavy like Cyberops Associate is.
The majority of these questions were created by taking my notes from the Cisco NetAcademy course for Cisco Cyberops Associate and turning them into questions and answers. There is however some amount of difference between the content of the official certification guide and the NetAcademy course, and I’ve included things covered by the guide and not the e-learning course as best as I can.
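For anyone who would rather regenerate a deck like this from their own notes than import mine, the sketch below shows roughly how that can be scripted. It assumes the third-party genanki Python library, made-up model/deck IDs and a couple of example questions; it is an illustration of the approach, not the exact tooling I used.

# Rough sketch: turn question/answer pairs into an Anki .apkg package.
# Assumes the third-party `genanki` library (pip install genanki); the
# model/deck IDs and sample questions below are made up for illustration.
import genanki

qa_model = genanki.Model(
    1607392319,  # arbitrary but stable model ID
    'Simple Q&A',
    fields=[{'name': 'Question'}, {'name': 'Answer'}],
    templates=[{
        'name': 'Card 1',
        'qfmt': '{{Question}}',
        'afmt': '{{FrontSide}}<hr id="answer">{{Answer}}',
    }],
)

deck = genanki.Deck(2059400110, 'Cisco Cyberops Associate')  # arbitrary deck ID

# In practice these pairs would be parsed out of your study notes.
notes = [
    ('What does SIEM stand for?', 'Security Information and Event Management'),
    ('Which transport and port does syslog use by default?', 'UDP port 514'),
]
for question, answer in notes:
    deck.add_note(genanki.Note(model=qa_model, fields=[question, answer]))

# The resulting file can be imported via Anki's "import file" button.
genanki.Package(deck).write_to_file('cyberops.apkg')

Once imported, Anki schedules the cards with its usual spaced repetition algorithm, which is the main advantage over simply re-reading notes.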
Our 5.94kW solar array with Redflow ZCell battery and Victron Energy inverter/charger system is now slightly over three years old, which means it’s time to review its third year of operation. There are several previous posts in this series:
Go With The Flow (what all the pieces are, what they do, some teething problems)
If you’ve read the above you’ll know that the solar array was originally installed back in 2017 along with a Sanden heat pump hot water service. That initial installation saved us a lot on our electricity bills, but it wasn’t until we got the ZCell and the Victron gear that we were able to really manage our own power. The ZCell allows us to store our own locally generated electricity for later use, and the Victron kit manages everything and gives us a whole lot of fascinating data to look at via the VRM portal.
There were some kinks in the first two years. We missed out on three weeks of prime solar PV generation from January 20 – February 11 in 2022 due to having to replace the MPPT solar charge controller. We also had no solar PV generation from February 17 – March 9 in 2023 on account of having our old tile roof replaced with colorbond steel. In my last post on this topic I wrote:
In both cases our PV generation was lower than it should have been by an estimated 500-600kWh. Hopefully nothing like this happens again in future years.
…and then at the very end of that post:
I’m looking forward to doing another one of these posts in a year’s time. Hopefully I will have nothing at all interesting to report.
Alas, something “like this” did happen again, and I have some interesting things to report.
In early December 2023 our battery failed due to a leak in the electrolyte stack. It was replaced under warranty, but the replacement unit didn’t arrive until March 2024. It was a long three months. Then in August when we were looking at finally purchasing a second ZCell, we discovered that Redflow had made a commercial decision to focus exclusively on large-scale deployments (minimum 200 kWh, i.e. 20 batteries) and was thus no longer selling individual ZBMs for residential or small business use. As an existing customer we probably would have still been able to get a second battery, except that in late August the company went into voluntary administration after failing to secure funding to build a new factory in Queensland. The administrators attempted to seek a sale and/or recapitalisation, but this was ultimately unsuccessful. The company ceased operations on October 18 and subsequently went into liquidation. This raises several questions about the future of our system, but more on that later. First, let’s look at how the system performed in year three.
Here are the figures for grid power in, solar generation, power used by our loads, and power exported to the grid over the past three years. As in the last two posts, the “what?” column here is the difference between grid in plus solar in, minus loads minus export, i.e. the power consumed by the system itself, or the energy cost of the system.
Year        Grid In   Solar In   Total In    Loads   Export   Total Out   what?
2021-2022     8,531      5,640     14,171   10,849      754      11,603   2,568
2022-2023     8,936      5,744     14,680   11,534      799      12,333   2,347
2023-2024     8,878      5,621     14,499   11,162    1,489      12,651   1,848
(all figures in kWh)
Note that in year three our grid power usage and solar generation are slightly down from the previous year (-58kWh and -123kWh respectively), so the total power going into the system is lower by 181kWh. Our loads are happily down by 372kWh, a good chunk of which will be due to replacing some old always-on computer equipment with something a bit less power hungry.
What’s really interesting here is that our power exported to the grid is close to double the previous two years, and the energy cost of the system is noticeably lower. In the first two years of operation the latter figure was 16-18% of the total power going into the system, but in year three it’s down to a bit under 13%.
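To make those figures concrete, here is the 2023-2024 row worked through in a few lines of Python (numbers straight from the table above, all in kWh):

# 2023-2024 figures from the table above, in kWh
grid_in, solar_in = 8_878, 5_621
loads, export = 11_162, 1_489

total_in = grid_in + solar_in         # 14,499
total_out = loads + export            # 12,651
system_cost = total_in - total_out    # the "what?" column: energy consumed by the system itself
print(system_cost)                              # 1848
print(round(100 * system_cost / total_in, 1))   # 12.7 (percent of total power in)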
The additional solar export appears to be largely due to the failed battery. Compare the following two graphs from 2022-2023 and 2023-2024. Yellow is direct usage of solar power, blue is solar to battery and red is solar to grid. As you can see there’s way more solar to grid in the period December 2023 – March 2024 when the battery was dead and thus unable to be charged:
Why is there still any blue in that period indicating solar power was going to the battery? This is where things get a bit weird. One consideration is that the battery is presumably still drawing a tiny bit of power for its control circuitry and fans, but when I look at the figures for January 2024 (for example), it shows 76.8 kWh of power going to the battery from solar. There is no way that actually happened with the battery dead and unable to be charged.
Here’s what I think is going on: when the battery went into failure mode, the ZCell Battery Management System (BMS) will have told the Victron gear not to charge it. This effectively disabled the MPPT solar charger, which meant we weren’t able to use our solar at all, not even to run the house. I asked Murray from Lifestyle Electrical Services if there was some way we could reconfigure things to still use solar power with the battery out of action and he remoted in and tweaked some settings. Unfortunately I don’t have an exact record of what was changed at this point, because it was discussed via phone. All I have in my notes is a very terse “Set CGX to use Victron BMS?” which doesn’t make much sense because we don’t have a Victron BMS. Possibly it refers to switching the battery monitor setting from “ZCell BMS” to “MultiPlus-II 48/5000/ 70-50 on VE.Bus”. Anyway, whatever the case, I think we have to assume that the “to battery” and “from battery” figures from December 2023 – March 2024 are all lies.
At this point we were able to limp along with our solar generation still working during the day, but something was still not quite right. Every morning and evening the MPPT appeared to be fighting to run. Watching the console at, say, 08:00, I’d see the MPPT providing solar power for a few seconds, then it’d stop for a second or two, then it’d run again for a few seconds. After some time it would start behaving normally and we’d have solar generation for the day, but then in the evening it would go back to that flicking on and off behaviour. My assumption is that the ZCell BMS was still trying to force the MPPT off. Then in mid-February I suddenly got a whole lot of Battery Low Voltage warnings from the MPPT, which I guess makes sense – the ZCell was still connected and its reported voltage had been very slowly dropping away over the past couple of months. The warnings appeared when it finally hit 2.5V. Murray and I experimented further to try to get the MPPT to stop doing the weird fighting thing, but were unsuccessful. At one point during this we ended up with the Multi-Plus II inverter/chargers in some sort of fault state and contacted Simon Hackett for further assistance. We got all the Victron gear back into a sensible state and Simon and I spent a bunch of time on a Saturday afternoon messing with everything we could think of, but ultimately we were unable to get the MPPT to provide power from the solar panels, and use grid power, without the battery present. One or the other – grid power only or solar power only – we could do, but we couldn’t get the system to do both at the same time again without the battery present. Turns out a thing that’s designed to be an Energy Storage System just won’t quite work right without the Storage part. So from February 15 through to March 14 when the replacement battery arrived we were running on grid power only with no solar generation.
Happily, we didn’t have any grid power outages during the three months we were without a battery. Our first outage of any note wasn’t until March 23, slightly over a week after the replacement battery was installed. There were a few brief grid outages at other times later – a couple of minutes one day in April, some glitches on a couple of days in August, but the really bad one was on the 1st of September when the entire state got absolutely hammered by extremely severe weather. Given there was a severe weather warning from the BOM I’d made sure the battery was full in advance, which was good because our grid power went out while we were asleep at about 00:37 and didn’t come back on until 17:28. We woke up some time after the grid went down with the battery at 86% state of charge and went around the house to turn off everything we could except for the fridge and freezer, which got our load down to something like 250W. By morning, the battery still had about 70% in it and even though the weather was bad we still had some solar generation, so between battery and solar we got through just fine until the grid came back on in the afternoon. We were lucky though – some folks in the north of the state were without power for two weeks due to this event. I later received a cheque for $160 from TasNetworks in compensation for our outage. I dread to think what the entire event cost everyone, and I don’t just mean in terms of money.
Speaking of money though, the other set of numbers we need to look at are our power bills. Here’s everything from the last seven years:
Year        From Grid   Total Bill   Grid $/kWh    Loads   Loads $/kWh
2016-2017      17,026    $4,485.45        $0.26   17,026         $0.26
2018-2019       9,031    $2,278.33        $0.25   11,827         $0.19
2019-2020       9,324    $2,384.79        $0.26   12,255         $0.19
2020-2021       7,582    $1,921.77        $0.25   10,358         $0.19
2021-2022       8,531    $1,731.40        $0.20   10,849         $0.16
2022-2023       8,936    $1,989.12        $0.22   11,534         $0.17
2023-2024       8,878    $2,108.77        $0.24   11,162         $0.19
(From Grid and Loads are in kWh)
As explained in the last post, I’m deliberately smooshing a bunch of numbers together (peak power charge, off peak power charge, feed in tariff, daily supply charge) to arrive at an effective cost/kWh of grid power, then bearing in mind our loads are partially powered from solar I can also determine what it costs us to run all our loads. 2016-2017 is before we got the solar panels and the new hot water service, so you can see the immediate savings there, then further savings after the battery went in in 2021. This year our cost/kWh (and thus our power bill) is higher than last year for two reasons:
We have somehow used more power at peak times than during off-peak times this year compared to last year.
Power prices went up about 8% in July 2023. They actually came down about 1% in July 2024, but most of our year is before that.
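For anyone wondering what that smooshing looks like in practice, here is a rough sketch of the effective-rate arithmetic using the 2023-2024 row of the table above. The real bill is built from peak and off-peak usage charges, the daily supply charge and the feed-in credit; I'm collapsing all of that into a single net annual figure, just as the table does.

# Rough sketch of the effective $/kWh calculation (2023-2024 figures from the table above)
total_bill = 2108.77    # net annual bill in dollars, all charges and credits included
grid_kwh = 8_878        # kWh drawn from the grid
loads_kwh = 11_162      # kWh consumed by our loads (grid + solar combined)

grid_rate = total_bill / grid_kwh     # effective cost per kWh of grid power
loads_rate = total_bill / loads_kwh   # effective cost per kWh our loads actually used
print(round(grid_rate, 2), round(loads_rate, 2))   # 0.24 0.19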
I should probably also mention that we actually spent $1,778.94 on power this year, not $2,108.77. That’s largely thanks to a $250 ‘Supercharged’ Renewable Energy Dividend payment from the Tasmanian Government and $75 from the Federal Government’s Energy Bill Relief Fund. The remaining $4.83 in savings is from Aurora Energy’s ridiculous Power Hours events. I say “ridiculous” because they periodically give you a bunch of time slots to choose from, and once you’ve locked one of them in, any power you use at that time is free. To my mind this incentivises additional power usage, when we should really be doing the exact opposite and trying to use less power overall. So I haven’t tried to use more energy, I’ve just tried to lock in times in the evening when we were going to be using more grid power than during the day, to scrape in what savings I could.
One other weird thing happened this year with the new battery. ZCells need to go into a maintenance cycle every three days. This happens automatically, but is something I habitually keep an eye on. On September 11 I noticed that we had been four days without running maintenance. Upon investigating the battery logs I discovered that the Time Since Strip counter and Strip Pump Run Timer were running at half speed, i.e. every minute they were each only advancing by approximately 30 seconds.
I manually put the battery into maintenance mode and Simon was able to remotely reset the CPU by writing some magic number to a modbus register, which got the counters back to the correct speed. I have no idea whether this is a software bug or a hardware issue, but I’ll continue to keep an eye on it. The difficulty is going to be dealing with the problem should it recur, given the demise of Redflow. Simon certainly won’t be able to log in remotely now that the Redflow cloud is down, although there is a manual reset procedure. If you remove the case from the battery there is apparently a small phillips head screw on the panel with the indicator lights. Give the screw a twist and the lights go out. Untwist and the lights come back on and the unit is reset. I have yet to actually try this.
The big question now is, where do we go from here? The Victron gear – the Cerbo GX console, the Multi-Plus II inverter/chargers, the MPPT – all work well with multiple different types of battery, so our basic infrastructure is future-proof. Immediately I hope to be able to keep our ZCell running for as long as possible, and if I’m able to get a second one as a result of the Redflow liquidation I will, simply so that we can ensure the greatest possible longevity of the system before we need to migrate to something else. We will also have to somehow figure out how to obtain carbon socks which need annual replacement to maintain the electrolyte pH. If we had to migrate to something else in a hurry Pylontech might be a good choice, but the problem is that we really don’t want a rack of lithium batteries in the crawl space under our dining room because of the fire risk. There are other types of flow battery out there (vanadium comes to mind) but everything I’ve looked at on that front is either way too big and expensive for residential usage, or is “coming soon now please invest in us it’s going to be awesome”.
I have no idea what year four will look like, but I expect it to be interesting.
An advantage of a medium to large company is that it permits specialisation. For example, I’m currently working in the IT department of a medium sized company, and because we have standardised hardware (Dell Latitude and Precision laptops, Dell Precision Tower workstations, and Dell PowerEdge servers) and I am involved in fixing all Linux compatibility issues on that hardware, I can fix most problems in a small fraction of the time it would take me on a random computer. There is scope for a lot of debate about the extent to which companies should standardise and centralise things. But for computer problems, which can escalate quickly from minor to serious if not approached in the correct manner, it’s clear that a good deal of centralisation is appropriate.
Among people doing technical computer work such as programming, a large portion are computer hobbyists who like to fiddle with computers. But if the support system is run well, even they will appreciate having computers just work most of the time, and having a large portion of the failures immediately recognised by someone, like the issues with NVidia drivers that I have documented so that first line support can implement workarounds without the need for a lengthy investigation.
A big problem with email in the modern Internet is the prevalence of Phishing scams. The current corporate approach to this is to send out test Phishing email to people and then force computer security training on everyone who clicks on them. One problem with this is that attackers only need to fool one person on one occasion and when you have hundreds of people doing something on rare occasions that’s not part of their core work they will periodically get it wrong. When every test Phishing run finds several people who need extra training it seems obvious to me that this isn’t a solution that’s working well. I will concede that the majority of people who click on the test Phishing email would probably realise their mistake if asked to enter the password for the corporate email system, but I think it’s still clear that this isn’t a great solution.
Let’s imagine for the sake of discussion that everyone in a company was 100% accurate at identifying Phishing email and other scam email; if that was the case, would the problem be solved? I believe that even in that hypothetical case it would not be a solved problem, due to the wasted time and concentration. People can spend minutes determining if a single email is legitimate. On many occasions I have had relatives and clients forward me email because they are unsure if it’s valid; it’s great that they seek expert advice when they are unsure about things, but it would be better if they didn’t have to go to that effort. What we ideally want to do is centralise the anti-Phishing and anti-spam work to a small group of people who are actually good at it and who can recognise patterns by seeing larger quantities of spam. When a spam or Phishing message is sent to 600 people in a company you don’t want 600 people to individually consider it, you want one person to recognise it and delete/block all 600. If 600 people each spend one minute considering the matter then that’s 10 work hours wasted!
The Rationale for Human Filtering
For personal email human filtering usually isn’t viable because people want privacy. But corporate email isn’t private, it’s expected that the company can read it under certain circumstances (in most jurisdictions) and having email open in public areas of the office where colleagues might see it is expected. You can visit gmail.com on your lunch break to read personal email but every company policy (and common sense) says to not have actually private correspondence on company systems.
The amount of time spent by reception staff in sorting out such email would be less than that taken by individuals. When someone sends a spam to everyone in the company instead of 500 people each spending a couple of minutes working out whether it’s legit you have one person who’s good at recognising spam (because it’s their job) who clicks on a “remove mail from this sender from all mailboxes” button and 500 messages are deleted and the sender is blocked.
Delaying email would be a concern. It’s standard practice for CEOs (and C*Os at larger companies) to have a PA receive their email and forward the ones that need their attention, so human vetting of email can work without unreasonable delays. If we had someone checking all email for the entire company, email to the senior people would probably never get noticeably delayed, and while people like me would get their mail delayed on occasion, people doing technical work generally don’t have notifications turned on for email because it’s a distraction and a fast response isn’t needed. There are a few senders where a fast response is required, which is mostly corporations sending a “click this link within 10 minutes to confirm your password change” email. Setting up rules for all such senders that are relevant to work wouldn’t be difficult to do.
How to Solve This
Spam and Phishing became serious problems over 20 years ago, and we have had 20 years of evolution of email filtering which still hasn’t solved the problem. The vast majority of email addresses in use are run by major managed service providers, and they haven’t managed to filter out spam/phishing mail effectively, so I think we should assume that it’s not going to be solved by filtering. There is talk about what “AI” technology might do for filtering spam/phishing, but that same technology can produce better crafted hostile email to avoid filters.
An additional complication for corporate email filtering is that some criteria that are used to filter personal email don’t apply to corporate mail. If someone sends email to me personally about millions of dollars then it’s obviously not legit. If someone sends email to a company then it could be legit. Companies routinely have people emailing potential clients about how their products can save millions of dollars and make purchases over a million dollars. This is not a problem that’s impossible to solve, it’s just an extra difficulty that reduces the efficiency of filters.
It seems to me that the best solution to the problem involves having all mail filtered by a human. A company could configure their mail server to not accept direct external mail for any employee’s address. Then people could email files to colleagues etc without any restriction, but spam and phishing wouldn’t be a problem. The issue is how to manage inbound mail. One possibility is to have addresses of the form it+russell.coker@example.com (for me as an employee in the IT department), and you would have a team of people who would read those mailboxes and forward mail to the right people if it seemed legit. Having addresses like it+russell.coker means that all mail to the IT department would be received into folders of the same account, so it could be filtered by someone with a suitable security level without requiring any special configuration of the mail server. So the person who reads the it mailbox would have a folder named russell.coker receiving mail addressed to me. The system could be configured to automate the processing of mail from known good addresses (and even domains), so they could just put in a rule saying that when Dell sends DMARC authenticated mail to it+$USER it gets immediately directed to $USER. This is the sort of thing that can be automated in the email client (mail filtering is becoming a common feature in MUAs).
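As a rough sketch of such a rule in Python (illustrative only; the domain, the it+user address format, and the folder names are assumptions, not a description of any existing system):

import email
import email.utils

KNOWN_GOOD_DOMAINS = {'dell.com'}   # assumption: a maintained allow list of domains

def route(raw_message: bytes) -> str:
    """Return the mailbox/folder a message should be delivered to."""
    msg = email.message_from_bytes(raw_message)

    # Department-tagged recipient, e.g. "it+russell.coker@example.com".
    _, to_addr = email.utils.parseaddr(msg.get('To', ''))
    local = to_addr.split('@')[0]
    dept, _, user = local.partition('+')

    # Sender domain and the DMARC verdict recorded by our own MTA.
    _, from_addr = email.utils.parseaddr(msg.get('From', ''))
    from_domain = from_addr.rpartition('@')[2].lower()
    dmarc_pass = 'dmarc=pass' in msg.get('Authentication-Results', '')

    if user and dmarc_pass and from_domain in KNOWN_GOOD_DOMAINS:
        return user                 # known good sender: deliver straight to the employee
    return dept or 'screening'      # everything else goes to a human for review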
For a FOSS implementation of such things the server side of it (including extracting account data from a directory to determine which department a user is in) would be about a day’s work and then an option would be to modify a webmail program to have extra functionality for approving senders and sending change requests to the server to automatically direct future mail from the same sender. As an aside I have previously worked on a project that had a modified version of the Horde webmail system to do this sort of thing for challenge-response email and adding certain automated messages to the allow-list.
The Change
One of the first things to do is to configure the system to add every recipient of an outbound message to the allow list for receiving a reply. Having a script go through the sent-mail folders of all accounts and adding the recipients to the allow lists would be easy and catch the common cases.
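A minimal sketch of that harvesting script, assuming Maildir-format sent folders under /var/mail/<user>/.Sent and a flat text file for the allow list (both assumptions, not the layout of any particular mail system):

import mailbox
import email.utils
from pathlib import Path

allow = set()

# Walk each user's sent folder and collect every To/Cc recipient.
for sent_dir in Path('/var/mail').glob('*/.Sent'):
    for msg in mailbox.Maildir(str(sent_dir), create=False):
        headers = msg.get_all('To', []) + msg.get_all('Cc', [])
        for _, addr in email.utils.getaddresses(headers):
            if addr:
                allow.add(addr.lower())

Path('allow-list.txt').write_text('\n'.join(sorted(allow)) + '\n')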
But even with processing the sent mail folders, going from a working system without such things to a system like this will take some time for the initial work of adding addresses to the allow lists, particularly for domain wide additions of all the sites that send password confirmation messages. You would need rules to direct inbound mail addressed to the old addresses to the new style, and then deal with a huge amount of mail that needs to be categorised. If you have 600 employees and the average amount of time taken on the first day is 10 minutes per user then that’s 100 hours of work, about 12 work days. If you had everyone from the IT department, reception, and executive assistants working on it that would be viable. After about a week there wouldn’t be much work involved in maintaining it. Then after that it would be a net win for the company.
The Benefits
If the average employee spends one minute a day dealing with spam and phishing email then with 600 employees that’s 10 hours of wasted time per day, effectively wasting one employee’s work! I’m sure that’s the low end of the range; 5 minutes average per day doesn’t seem unreasonable, especially when people are unsure about phishing email and send it to Slack so that multiple employees spend time analysing it. So you could have the equivalent of 5 employees’ work being wasted by hostile email, while avoiding that would take a fraction of the time of a few people, adding up to less than an hour of total work per day.
Then there’s the training time for phishing mail. Instead of having every employee spend half an hour doing email security training every few months (that’s 300 hours or 7.5 working weeks every time you do it) you just train the few experts.
In addition to saving time there are significant security benefits to having experts deal with possibly hostile email. Someone who deals with a lot of phishing email is much less likely to be tricked.
Will They Do It?
They probably won’t do it any time soon. I don’t think it’s expensive enough for companies yet. Maybe government agencies already have equivalent measures in place, but for regular corporations it’s probably regarded as too difficult to change anything and the costs aren’t obvious. I have been unsuccessful in suggesting that managers spend slightly more on computer hardware to save significant amounts of worker time for 30 years.
I got interested today in trying to come up with a solid way of determining when updates were last applied to a RHEL-derived Linux instance. Previously we’d been inferring it from the kernel version, but it turns out there is a convenient “yum history” or “dnf history” command which will show you all the previous transactions that the package database has seen. However, the output is hard to parse in a script.
So here instead, is a little python script which does the thing. Feel free to mangle it to meet your needs:
import time

import dnf.base

# Query the local dnf history database and print a summary of every
# transaction it knows about.
b = dnf.base.Base()
for transaction in b.history.old():
    print('---------------------------------------------------')
    print(f'Transaction id: {transaction.tid}')
    print(f'Command line: {transaction.cmdline}')
    print(f'UID: {transaction.loginuid}')
    print(f'Return code: {transaction.return_code}')

    start = time.strftime('%Y-%m-%d %H:%M', time.localtime(transaction.beg_timestamp))
    end = time.strftime('%Y-%m-%d %H:%M', time.localtime(transaction.end_timestamp))
    elapsed = transaction.end_timestamp - transaction.beg_timestamp
    print(f'Duration: {start} -> {end} ({elapsed} seconds)')

    # Each item in a transaction can be a package, a group, or an environment.
    for tran in transaction.packages():
        if tran.is_package():
            details = f'{tran.name} {tran.version}'
        elif tran.is_group():
            g = tran.get_group()
            packages = []
            for pkg in g.getPackages():
                packages.append(pkg.getName())
            details = f'Group "{g.getName()}" ({", ".join(packages)})'
        elif tran.is_environment():
            e = tran.get_environment()
            details = f'Environment "{e.getName()}"'
        else:
            details = '...unknown transaction type!'
        print(f' {tran.action_name} {details}')

    print()
To find only whole system updates, you’d look for command lines containing “upgrade” I suspect.
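Something like this (an untested sketch reusing the objects from the script above) would narrow the output down:

# Only show transactions whose recorded command line mentions "upgrade",
# which should catch full system updates run via "dnf upgrade".
for transaction in b.history.old():
    if transaction.cmdline and 'upgrade' in transaction.cmdline:
        when = time.strftime('%Y-%m-%d %H:%M', time.localtime(transaction.beg_timestamp))
        print(f'{when}  {transaction.cmdline}')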
Julius wrote an insightful blog post about the “modern sleep” issue with Windows [1]. Basically Microsoft decided that the right way to run laptops is to never entirely sleep, which uses more battery but gives better options for waking up and doing things. I agree with Microsoft in concept: this is a problem that can be solved. A phone can run for 24+ hours without ever fully sleeping; a laptop has a more power hungry CPU and peripherals, but also has a much larger battery, so it should be able to do the same. Some of the reviews for Snapdragon Windows laptops claim up to 22 hours of actual work without charging! So having suspend not really stop the system should be fine.
The ability of a phone to never fully sleep is a change in quality of the usage experience, it means that you can access it and immediately have it respond and it means that all manner of services can be checked for new updates which may require a notification to the user. The XMPP protocol (AKA Jabber) was invented in 1999 which was before laptops were common and Instant Message systems were common long before then. But using Jabber or another IM system on a desktop was a very different experience to using it on a laptop and using it on a phone is different again. The “modern sleep” allows laptops to act like phones in regard to such messaging services. Currently I have Matrix IM clients running on my Android phone and Linux laptop, if I get a notification that takes much typing for a response then I get out my laptop to respond. If I had an ARM based laptop that never fully shut down I would have much less need for Matrix on a phone.
Making “modern sleep” popular will lead to more development of OS software to work with it. For Linux this will hopefully mean that regular Linux distributions (as opposed to Android which while running a Linux kernel is very different to Debian etc) get better support for such things and therefore become more usable on phones. Debian on a Librem 5 or PinePhonePro isn’t very usable due to battery life issues.
A laptop with an LTE card can be used for full mobile phone functionality. With “modern sleep” this is a viable option. I am tempted to make a laptop with LTE card and bluetooth headset a replacement for my phone. Some people will say “what if someone tries to call you when it’s not convenient to have your laptop with you”, my response is “what if people learn to not expect me to answer the phone at any time as they managed that in the 90s”. Seriously SMS or Matrix me if you want an instant response and if you want a long chat schedule it via SMS or Matrix.
Dell has some useful advice about how to use their laptops (and probably most laptops from recent times) in this regard [2]. You can’t close the lid before unplugging the power cable; you have to unplug first and then close. You shouldn’t put a laptop in a sealed bag for travel either. This is a terrible situation: you can put a tablet in a bag and don’t need to take any special precautions when unplugging, and laptops should work the same. The end result of what Microsoft, Dell, Intel, and others are doing will be good, but they are making some silly design choices along the way! I blame Intel mostly for selling laptop CPUs with TDPs >40W!
I’ve learned a few things on an adventure this week, and I figure I should probably write them down.
First off, AWS throttles the number of DNS queries you can perform on a VPC. Apparently you’re limited to 1,024 packets per Elastic Network Interface (ENI). I am a little unclear on whether the limit applies to each instance ENI, or to the ENI on the VPC that is the DNS server. I am also unsure if that’s 1,024 request packets, or 1,024 total packets, but either way there is definitely a limit after which you will be throttled.
Secondly, AL2023 disables the systemd-resolved DNS caching behaviour, which means it’s pretty easy to hit that throttling limit. When you google for solutions you’ll find re:Post posts recommending dnsmasq, which is a perfectly fine piece of software but not really necessary if you already have systemd-resolved installed on your instance (as you do with AL2023).
First off you can verify that you’re not caching DNS with a command like this:
$ sudo resolvectl status
Global
Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: uplink
Link 2 (eth0)
Current Scopes: DNS
Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
DNS Servers: 192.168.1.3
The “resolv.conf mode: uplink” here means the “stub resolver” in resolved is disabled. You can extra doubly confirm that by asking about caching statistics:
$ sudo systemd-resolve --statistics
DNSSEC supported by current servers: no
Transactions
Current Transactions: 0
Total Transactions: 0
Cache
Current Cache Size: 0
Cache Hits: 0
Cache Misses: 0
DNSSEC Verdicts
Secure: 0
Insecure: 0
Bogus: 0
Indeterminate: 0
Not much caching happening here!
The stub resolver is disabled in /usr/lib/systemd/resolved.conf.d/resolved-disable-stub-listener.conf, which is very cunning because it’s not in /etc/ like you’d expect. To enable resolved with caching, you need to do the following:
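One way to do that (a sketch, assuming the stock AL2023 layout) is to mask the vendor drop-in with a file of the same name under /etc and then restart resolved:

$ sudo mkdir -p /etc/systemd/resolved.conf.d
$ sudo tee /etc/systemd/resolved.conf.d/resolved-disable-stub-listener.conf <<EOF
[Resolve]
DNSStubListener=yes
Cache=yes
EOF
$ sudo systemctl restart systemd-resolved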
You should probably also check that /etc/systemd/resolved.conf doesn’t have Cache=no or DNSStubListener=no. You can then test like this:
$ dig madebymikal.com
; <<>> DiG 9.18.28 <<>> madebymikal.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31719
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;madebymikal.com. IN A
;; ANSWER SECTION:
madebymikal.com. 10 IN A 192.99.17.214
;; Query time: 410 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Mon Nov 11 07:25:42 UTC 2024
;; MSG SIZE rcvd: 60
$ sudo systemd-resolve --statistics
DNSSEC supported by current servers: no
Transactions
Current Transactions: 0
Total Transactions: 1
Cache
Current Cache Size: 1
Cache Hits: 0
Cache Misses: 1
DNSSEC Verdicts
Secure: 0
Insecure: 0
Bogus: 0
Indeterminate: 0
A quick final note: I lost more time than I care to admit realising that the /etc/resolv.conf symlink is not managed by systemd-resolved. You need to manually point that at /run/systemd/resolve/stub-resolv.conf if it does not already point there. You’re welcome.
As per previous conferences, eResearch Australasia 2024 in Melbourne had several hundred attendees from the scientific research community, research computing developers and operators, administrators and managers, and various vendors. The program gives a very good indication of the level of this conference and the reason that it has been such a success over the last fifteen years and more.
For the first time in the post-COVID environment, the conference was a face-to-face event and a very welcome opportunity to network with old colleagues, as well as the opportunity to discover new people and new developments (and to hear some salacious gossip of changes in the eResearch landscape). Presentations on artificial intelligence and machine learning (away from the current hype over Large Language Models) were prevalent, especially with regard to astounding developments in bioinformatics, of which AlphaFold, the AI program that predicts protein structures and whose developers won the Nobel Prize in Chemistry, is a prominent example.
My own presentation was on the development of the Spartan general-purpose high performance computing system at the University of Melbourne, which grew from a small-scale and very experimental system operating on a shoestring budget to one of the top systems in the world. I'm pleased to say that the talk seemed to be very well received by the crowded room (my head count was at least 70), with a number of people asking for the slide deck afterwards; based on the comments from others, Spartan is extremely well regarded within the Australian eResearch community for these successes and for our extensive training program.
Delivers on the title. Interesting explanations of types of lakes, how they came to be and how they evolve. Great writing and lots of interesting information 4/5
A Veteran TV Writer and Showrunner writes about his career, the business and how to make it as a TV writer and possibly eventually a showrunner. Excellent 4/5
This tutorial illustrates using Advanced Normalization Tools (ANTs) to do image registration in 3D with data from the Brain/MINDS data portal, and how to apply the transforms/inverse transforms from image registration, all run under the Slurm Workload Manager.
"Advanced Normalization Tools (ANTs) is a C++ library available through the command line that computes high-dimensional mappings to capture the statistics of brain structure and function. It allows one to organize, visualize and statistically explore large biomedical image sets. Additionally, it integrates imaging modalities in space + time and works across species or organ systems with minimal customization."
This tutorial is derived from the tutorial at the Brain/MINDS data portal.
As with all Slurm jobs, start with the resource requests. Note that ANTs is memory intensive, so the request on our system is double what would normally be allocated per CPU. Also, note that the application is multi-threaded.
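As an illustration only (the job name, CPU count, memory, walltime, and module name here are assumptions, not the tutorial's own values):

#!/bin/bash
#SBATCH --job-name=ants-registration
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
# ANTs is memory intensive: request double the usual per-CPU memory.
#SBATCH --mem-per-cpu=16G
#SBATCH --time=12:00:00

# The module name and version are site-specific.
module load ANTs

# ANTs is multi-threaded via ITK; match the thread count to the CPUs requested.
export ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=${SLURM_CPUS_PER_TASK}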
Let's download some brains! And what marvellous brains they are!
"The dataset includes NIfTI files of MRI T2 ex-vivo data; reconstructed Nissl stained images of the same brain, registered to the shape of the MRI; brain region segmentation (with separate color lookup table); and gray, mid-cortical and white matter boundary segmentation" (BMA 2017 Ex Vivo (Brain Space 1)
"This atlas is composed of a population average ex-vivo MRI T2WI contrast mapped with the BMA 2017 Ex Vivo (published by Woodward et al. The Brain/MINDS 3D digital marmoset brain atlas). The population average MRI was constructed based on scans of 25 individual brains. The 25 brains were aligned with one another by iteratively applying linear and non-linear registration and averaging the transformation files until convergence. Data of individual brains were then resampled with an isotropic spatial resolution of 100×100×100µm3 and averaged across brain." BMA 2019 Ex Vivo (Brain Space 2)
From: https://dataportal.brainminds.jp/atlas-package-download-main-page/bma-20...
This can be added to the job submission script as follows:
Now register brain1 to brain2. The options are: -d 3, the dimensionality of the data (3 in this case); -f, the fixed image, i.e. the image we want to register to (here brain2); -m, the moving image, i.e. the image we want to register (here brain1); -o, the output prefix (here all output files will have the prefix "brain1_tobrain2"); and -n, the number of threads, added to the end of the command. Thus we get the following command:
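Something along these lines, as a sketch assuming antsRegistrationSyN.sh, placeholder filenames standing in for the downloaded volumes, and the thread count taken from the Slurm allocation:

antsRegistrationSyN.sh -d 3 \
  -f brain2.nii.gz \
  -m brain1.nii.gz \
  -o brain1_tobrain2 \
  -n ${SLURM_CPUS_PER_TASK}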
Now conduct an inverse transform (map label2 to brain1). The option -d is the same as above; -i is the volume we want to register; -r is the reference image (brain1); -o is the output file; -t lists the transforms, applied right to left, in this case the inverse of the affine transform and then the inverse displacement field; and -n is the interpolation method to be used, nothing to do with the number of threads (see previous command).
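Again as a sketch, with placeholder input/output filenames and transform names following the naming convention produced by the "brain1_tobrain2" prefix above:

antsApplyTransforms -d 3 \
  -i label2.nii.gz \
  -r brain1.nii.gz \
  -o label2_in_brain1.nii.gz \
  -t '[brain1_tobrain20GenericAffine.mat,1]' \
  -t brain1_tobrain21InverseWarp.nii.gz \
  -n NearestNeighbor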
If you want to compare the result with the label map before registration, you can do it by loading sp2_label_512_v1.0.0.nii.gz and bma-1-mri.nii.gz with an application like 3DSlicer.
I believe buying a Kindle in 2024 is a bad idea, even if you only intend to use it for reading DRM-free locally stored ebooks. Basic functions such as organizing books into folders/collections are locked until the device is registered, and with each system update the interface has become slower and more bloated.
Initially I purchased this device because the Amazon book store isn’t too bad and it’s one of the easier ways to buy Japanese books outside of Japan, but with all the anti-features Amazon adds in I don’t think it’s worth using any more.
Using a recent exploit together with this downgrader thread on the mobileread forum, I was able to downgrade my Paperwhite to the older 5.11.2 firmware, which has a simpler interface while being much more responsive. If you already have a Kindle, perhaps this is worth doing.
It’s possible to install an alternative UI or a custom OS on many Kindle models, but they generally run slower than the default launcher. On the open hardware side, Pine64 is making an e-ink tablet called the PineNote; with a Rockchip RK3566 and 4GB of RAM it should be fast enough to handle most documents/ebooks, but currently there is no usable Linux distribution for it.
A Star Trek parody from the POV of five ensigns who realise something is very strange on their ship. Plot moves steadily and the humour and action mostly work. 3/5
An account of the discovery and lives of Neanderthals, Denisovans and other hominids who shared the earth with Homo sapiens in the last 300,000 years. 4/5
The basic Linux command for directory creation is mkdir $DIRNAME, with the most common options being -p to create parent directories and the handy verbose flag (-v) to print the directories to standard output as they are created. An array of subdirectories can also be created. e.g.,
$ mkdir -p -v examples/{dir1,dir2,dir3}
mkdir: created directory 'examples'
mkdir: created directory 'examples/dir1'
mkdir: created directory 'examples/dir2'
mkdir: created directory 'examples/dir3'
$ for d in examples/*/; do mkdir -v "${d}RHEL7"; done
mkdir: created directory 'examples/ABAQUS/RHEL7'
..
Then checked with the ever-versatile find command:
$ find . -type d -name RHEL7
./GROMACS/RHEL7
./R/RHEL7
..
A short script copies the 2015 and 2019 application files, for which the author had the foresight to use the year as a prefix naming convention. This makes use of an expanded variable, globbing, find, conditional tests, branching, redirection of standard error, and moving files.
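As a purely illustrative sketch of the kind of script described (the paths, filenames, and destinations are assumptions, not the original script):

#!/bin/bash
# Copy year-prefixed application files into each application's new RHEL7
# subdirectory, logging any copy errors rather than stopping.
for APPDIR in */; do
    for YEAR in 2015 2019; do
        FILES=$(find "${APPDIR}" -maxdepth 1 -name "${YEAR}*" -type f)
        if [ -n "${FILES}" ]; then
            cp -v ${FILES} "${APPDIR}RHEL7/" 2>> copy_errors.log
        fi
    done
done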
A list of symlinks was also identified with find; find . -mindepth 1 -maxdepth 2 -type l -ls.
Since it opened in 2017 Parnell Station has been one of the least busy stations in Auckland. In the year to June 2019 there were just 168,000 boardings at the Station, ranking 36th out of 40 stations on the network.
While the suburb of Parnell is fairly high density and has a good mixture of retail, entertainment, office and residential uses, it is under-served by the station.
Parnell Station’s main problem is that it is in a valley with the Auckland Domain on one side and a steep hill to Parnell Road on the other. The way up the hill is steep, indirect and is not suitable for people with mobility issues. The route to the museum is a rough walking track. There is a dedicated path to the Carlaw Park student village and business centre however.
The poor accessibility to the main Parnell Road shopping/business area and even worse access to the St Georges Bay Road business area have hurt the station’s usage. These problems have been written about previously on Greater Auckland, twice.
A wheelchair accessible underpass between the two platforms was added to the station in early 2024. This enabled safer and easier transfer between platforms and access to the boardwalk to Carlaw Park. However, the hill to Parnell Road is still a problem.
A Possible Solution – A Pedestrian Tunnel
My proposal is a pedestrian tunnel running from near the Parnell Station to the North-West under the main hill and emerging on St Georges Bay Road. Around the middle of the tunnel there would be elevators going up to Parnell Road. The tunnel would be around 550 metres long. The ends are at similar heights so the tunnel would be relatively flat while the central elevators would need to travel around 20 metres. The tunnel should be wide, well-lit and have security cameras etc to make people using it feel safe.
The elevators would be around 3 minutes’ walk from Parnell Station and 4-5 minutes from St Georges Bay Road. I’ve placed the street level access to the elevators in Heard Park on the corner of Parnell Road and Ruskin Street (at the bend in the above map). Several elevators would probably be required, for redundancy and because traffic will probably be bursty.
The St Georges Bay Road entrance could be at the bottom of Garfield Street. It would probably be easiest to take up some street/footpath space and run parallel to the road before turning South-West once significantly deep. There are several hundred jobs within a couple of minutes walk of this entrance. There is also a Saturday Market nearby.
Overall the project should be only moderately expensive to build, and would improve the catchment and value of Parnell Station as well as linking three parts of Parnell better together.
Converting between filesystems can be fraught with difficulties, especially if one is dealing with a proprietary filesystem. One such system is APFS, the Apple File System, designed by Apple for its devices, introduced originally for macOS Sierra in 2017 and later, iOS 10.3, tvOS 10.2, watchOS 3.2, and all versions of iPadOS, and designed to replace HFS Plus. The question here is how does one access APFS on Linux when a kind individual has provided you, for example, a USB device that has been written with this filesystem. The following are some brief notes on how to do this with Ubuntu Linux 20.04.6 LTS.
The first step is to install libfsapfs-utils, which is a library to access the Apple File System (APFS). One can install this from source if desired for performance or developmental reasons. In this case, we'll just install the package.
$ sudo apt install libfsapfs-utils
The next step is to go into sudo mode and run fdisk, a command to manipulate disk partition tables. Understandably, such a command requires privileged access and responsibility, but our purpose here is just to list what devices are present and find the device file of our drive.
$ sudo -s
# fdisk -l
..
Device Start End Sectors Size Type
/dev/sda1 40 409639 409600 200M EFI System
/dev/sda2 409640 15769559 15359920 7.3G unknown
There it is, /dev/sda2. Now we want to mount it, using fsapfsmount, which mounts an Apple File System (APFS) volume. The option -f is used to specify the specific file system ("1" is typical), then comes the device file, and finally where it is being mounted to (in this case, /mnt).
# fsapfsmount -f 1 /dev/sda2 /mnt
From here, it's a simple matter of copying the files over to the directory of choice, in this case the current working directory.
# cp -r /mnt/ACFSPres/ .
After that unmount the device.
# fusermount -u /mnt
Change ownership of the file and directories (because they are currently owned by root) to the user who needs access to them.
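For example (the username and group are placeholders):

# chown -R username:username ./ACFSPres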
Hello amazing teachers! Are you looking for a fun and engaging way to bring history to life for your students? Meet Sabaton, a Swedish heavy metal band known for their powerful songs about historical events. While heavy metal might not be the first thing that comes to mind for a primary school setting, Sabaton’s music […]
Neuro-divergence, encompassing conditions such as autism spectrum, ADHD, and sensory processing, can profoundly influence how individuals perceive and respond to their bodily signals.
While neurotypical individuals generally recognise and respond to hunger, thirst, and satiety cues with relative ease, neuro-divergent individuals often face unique challenges in this area. Understanding these challenges is crucial for fostering empathy and supporting effective strategies for well-being.
This article is written so it is directly readable and useful (in terms of providing action items) for people in your immediate surroundings, but naturally it can be directly applied by neuro-spicy people themselves!
Hunger and Thirst Cues
For many neuro-divergent people, recognising hunger and thirst cues can be a complex task. These signals, which manifest as subtle physiological changes, might not be as easily identifiable or may be misinterpreted.
For instance, someone on the spectrum might not feel hunger as a straightforward sensation in the stomach but instead experience it as irritability or a headache. Similarly, those with ADHD may become so hyper-focused on tasks that they overlook or ignore feelings of hunger and thirst entirely.
Sensory Processing and Signal Translation
Sensory processing issues can further complicate the interpretation of bodily signals. Neuro-divergent individuals often experience heightened or diminished sensory perception.
This variability means that sensations like hunger pangs or a dry mouth might be either too intense to ignore or too faint to detect. The result is a disconnection from the body’s natural cues, leading to irregular eating and drinking habits.
Satiety and Fullness
Recognising satiety and fullness presents another layer of difficulty. For neuro-divergent individuals, the brain-gut communication pathway might not function in a typical manner.
This miscommunication can lead to difficulties in knowing when to stop eating, either due to a delayed recognition of fullness or because the sensory experience of eating (such as the textures and flavours of food) becomes a primary focus rather than the physiological need.
Emotional and Cognitive Influences
Emotions and cognitive patterns also play significant roles. Anxiety, a common experience among neuro-divergent individuals, can mask hunger or thirst cues, making it harder to recognise and respond appropriately.
Additionally, rigid thinking patterns or routines, often seen with autism spectrum, might dictate eating schedules and behaviours more than actual bodily needs.
Strategies for Support
Understanding these challenges opens the door to effective strategies and support mechanisms:
Routine and structure: Establishing regular eating and drinking schedules can help bypass the need to rely on internal cues. Setting alarms or reminders can ensure that meals and hydration are not overlooked.
Mindful eating practices: Encouraging mindful eating, where individuals pay close attention to the sensory experiences of eating and drinking, can help in recognising subtle signals of hunger and fullness.
Sensory-friendly options: Offering foods and beverages that align with an individual’s sensory preferences can make the experience of eating and drinking more enjoyable and less overwhelming. This is a really important aspect!
Environmental adjustments: Creating a calm, distraction-free eating environment can help individuals focus more on their bodily cues rather than external stimuli.
Education and awareness: Educating neuro-divergent individuals about the importance of regular nourishment and hydration, and how their unique experiences might affect this, can empower them to develop healthier habits. This is, of course, more a longer term strategy.
Understanding the complex interplay between neuro-divergence and bodily signals underscores the importance of personalised approaches and compassionate support.
By acknowledging and addressing these challenges, we can help neurodivergent individuals achieve better health and well-being!
(this post was created using some information from ChatGPT in addition to our own research)
Several months ago, YouTube began "a global effort" to prevent users from blocking advertisements. This process included allowing users with an adblocker, once detected, a few videos, then a warning, and then outright prevention. There was an implicit suggestion that one could receive the desired ad-free service from a Premium subscription. Methods employed by YouTube to implement these blocks include embedding advertisements in the video itself, serving advertisements from the same domain as the video, or using browser fingerprinting to detect ad-blocking extensions.
Since then there have been a variety of methods employed by users to bypass the prevention of ad-blockers. For a while, use of uBlock Origin was recommended as an ad-blocker. Fingerprinting could be circumvented by extensions like Canvas Fingerprint Defender. Others recommended disabling javascript on YouTube, using alternative browsers (e.g., Firefox), or even using Discord. A specific ad-block extension even exists for YouTube. As useful as these are, they are likely to face further restrictions as the Manifest V3 changes in Chrome roll out.
From YouTube's perspective, ad-blockers reduce their income and, by extension, the income of content creators or providers. As much as this has a kernel of truth, the provision of advertisements on YouTube is so bad that even if an advertising algorithm manages to match well with a viewer's watch history, it is likely to put people off. Advertisements interspersed throughout a video, lengthy, unskippable advertisements and advertisements of questionable taste. There is, obviously, a significant difference between including skippable advertisements at the beginning and end of a video to what is being provided. For what it's worth, YouTube's adblocker detection is believed to break EU privacy laws, and the use of ad-blockers is actually recommended by the FBI to prevent fraud.
One excellent tool that works around these restrictions and provides a backup of the video for asynchronous viewing is yt-dlp with binaries available for Linux, MacOS, MS-Windows, and source code. As a general video downloader, it operates on thousands of sites, has extensive documentation for the enormous variety of options, and actively seeks improvements from outside contributors. When someone suggests a YouTube video to watch, one can simply download it from the command line, or even script it and run it in batch, in a manner convenient to the user. This is the way that audio and video content should be provided.
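Basic usage is as simple as the following (the URL is a placeholder; yt-dlp picks a sensible format by default):

$ yt-dlp 'https://www.youtube.com/watch?v=VIDEO_ID'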
I’ve been daily driving the PinePhone Pro with swmo for some time now; it’s not perfect, but I still find it to be one of the most enjoyable devices I’ve used, probably only behind the BlackBerry Q30/Passport, which also has a decent keyboard and runs an unfortunately locked-down version of QNX. For me it’s less like a phone and more like a portable terminal for times when using a full size laptop is uncomfortable or impractical, and with the keyboard it’s possible to write lengthy articles on the go.
This isn’t the only portable Linux terminal I’ve owned; before this I used a Nokia N900, which to this day is still maintained by the Maemo Leste team, but the shutdown of the 3G network where I live made it significantly less usable as a phone, and since it doesn’t have a proper USB port I cannot easily use it as a serial console.
The overall experience on the PPP as of 2024 isn’t as polished as that of the BlackBerry Passport, and ad-hoc hacks are often required to get the system going; however, as the ecosystem progresses the experience will also improve, with new revisions of hardware and better software.
I use sxmo and swmo interchangeably in this post, they refer to the same framework running under Xorg and wayland, the experience is pretty much the same.
The default scaling of sxmo doesn’t allow many desktop applications to display their windows properly, especially when an application is written under the assumption of being used on a larger screen. To set the scaling to something more reasonable, add the following line to ~/.config/sxmo/sway:
exec wlr-randr --output DSI-1 --scale 1.3
When using swmo environment initialization is mostly done in ~/.config/sxmo/sway and ~/.config/sxmo/xinit is not used.
Scaling for Firefox needs to be adjusted separately by first enabling compact UI and then set settings -> default zoom to your liking.
To rotate Linux framebuffer, add fbcon=rotate:1 to the U_BOOT_PARAMETERS line in /usr/share/u-boot-menu/conf.d/mobian.conf and run u-boot-update to apply.
I also removed quiet splash from U_BOOT_PARAMETERS to disable the Plymouth animation, as it isn’t very useful in landscape mode.
Swmo doesn’t come with a secure screen locker, but swaylock works fine and it can be bound to a key combination in sway’s config file. To save some battery life, systemctl suspend can be triggered after swaylock; to bind that to Meta+L:
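Something like the following should do it (a sketch; the exact key names may differ with the PinePhone keyboard, and Mod4 is the Meta/Super key):

# In sway's config (e.g. ~/.config/sxmo/sway): lock the screen, then suspend.
bindsym Mod4+l exec 'swaylock -f && systemctl suspend'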
The default keymap for the PinePhone keyboard is missing a few useful keys, namely F11/F12 and PgUp/PgDown. To create those keys I used evremap(1) to make a custom keymap. Unfortunately the Fn key cannot easily be mapped as a layer switcher, so I opted to remap AltGr and Esc as my primary modifiers.
I’m working on a Debian package for evremap and it will be made available for Debian/Mobian soon.
Incus is a container/VM manager for Linux; it’s available for Debian from bookworm/backports and is a fork of LXD by the original maintainers behind LXD. It works well for creating isolated and unprivileged containers. I have multiple incus containers on the PinePhone Pro for Debian packaging, and it’s a better experience than manually creating and managing chroots. In case there is a need for running another container inside an unprivileged incus container, it’s possible to configure incus to intercept certain safe system calls and forward them to the host, removing the need for a privileged container.
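For example, something along these lines should enable nesting and syscall interception for a given container (the container name is a placeholder, and which interception keys you need depends on the workload):

$ incus config set packaging-ct security.nesting=true
$ incus config set packaging-ct security.syscalls.intercept.mknod=true
$ incus config set packaging-ct security.syscalls.intercept.setxattr=true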
Sway is decently usable in convergence mode, in which the phone is connected to a dock that outputs to an external display and keyboard and mouse are used as primary controls instead of the touchscreen.
This isn’t surprising since sway has always had great support for multiple monitors; however, another often overlooked convergence mode is with waypipe. In this mode another Linux machine (e.g. a laptop) can be used to interact with applications running on the phone, and the phone will be kept charged by the laptop. This is particularly useful for debugging phone applications or for accessing resources on the phone (e.g. sending and receiving SMS). One thing missing in this setup is that graphical applications cannot roam between the phone and the external system (e.g. move running applications from one machine to another). Xpra does this for Xorg but doesn’t work with wayland.
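For example, running a terminal from the phone on the laptop's display looks something like this (the user, hostname, and application are placeholders):

$ waypipe ssh mobian@pinephone-pro foot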
Due to the simplicity of the swmo environment it’s not too difficult to get the system running with SELinux in Enforcing mode, and I encourage everyone reading this to try it. If running debian/mobian a good starting point is the SELinux/Setup page on Debian wiki.
Note: selinux-activate won’t add the required security=selinux kernel option to u-boot (it only deals with GRUB) so you have to manually add it to the U_BOOT_PARAMETERS line in /usr/share/u-boot-menu/conf.d/mobian.conf and run u-boot-update after selinux-activate. The file labeling process can easily take 10 minutes and the progress won’t be displayed on the framebuffer (only visible via the serial console).
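The edited line ends up looking something like this (the other parameters shown are illustrative; keep whatever your file already contains):

# /usr/share/u-boot-menu/conf.d/mobian.conf
U_BOOT_PARAMETERS="console=tty0 security=selinux selinux=1"

$ sudo u-boot-update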
SELinux along with the reference policy aren’t enough for building a reasonably secure interactive system, but let’s leave that for a future post.
The April 2024 meeting was the first meeting after Everything Open 2024, and the discussions were primarily around talks and lectures people found interesting during the conference, including the n3n VPN and the challenges of running a personal email server. At the start of the meeting Yifei Zhan demonstrated a development build of Maemo Leste, an active Maemo-like operating system, running on a PinePhone Pro.
Other topics discussed included modern network protocol ossification, SIP, and a possible free and open source VoLTE implementation.
The PinePhone keyboard contains a battery, which will be used to charge the PinePhone when the keyboard is attached. Although there are existing warnings on the pine64 wiki which sum up to ‘don’t charge or connect anything to your pinephone’s type C interface when the keyboard is attached’, my two pinephone keyboards still managed to fry themselves, with one releasing stinky magic smoke and the other melting the plastic around the pogo pins on the pinephone backplate.
This all happened while the pinephone’s type C interface was physically blocked by the attached keyboard. In the first case, the keyboard’s controller PCB blew up when I tried to charge it; in the latter case the keyboard somehow overheated and melted the plastic near the pogo interface on the phone side.
Pine64 provided me a free replacement keyboard after multiple emails back and forth, but according to Pine64 there will be no more free replacement for me in future, and there is no guarantee that this will not happen to my replacement keyboard.
The cost for replacing all the fried parts with spare parts from the Pine64 store is about 40 USD (pogo pins + backplate + keyboard PCB), and considering this problem is likely to happen again, I don’t think purchasing those parts is a wise decision.
Both the melting plastic and the magic smoke originated from the fact that charge is constantly shuffled around when the keyboard is attached to the pinephone, and since the keyboard can function independently of the battery, we can disconnect and remove the battery from the keyboard case to make sure it will not blow up again. After this procedure the keyboard will keep functioning, although the keyboard-attached pinephone might flip over much more easily due to the lightened keyboard base. Be aware that the keyboard isn’t designed to be taken apart, and doing so will likely result in scratches on the case. As for me, I’d much rather have a keyboard case without a builtin battery than have something that can overheat or blow up.
On Friday, 1st March, it will be exactly one year since I walked into
Zen Motorcycles, signed the
paperwork, and got on my brand new Energica Experia electric motorbike. I
then rode it back to Canberra, stopping at two places to charge along the
way, but that was more in the nature of making sure - it could have done the
trip on one better-chosen charging stop.
I got a call yesterday from a guy who had looked at the Experia Bruce has at
Zen and was considering buying one. I talked with him for about three
quarters of an hour, going through my experience, and to sum it up simply I
can just say: this is a fantastic motorbike.
Firstly, it handles exactly like a standard motorbike - it handles almost
exactly like my previous Triumph Tiger Sport 1050. But it is so much easier
to ride. You twist the throttle and you go. You wind it back and you slow
down. If you want to, the bike will happily do nought to 100km/hr in under
four seconds. But it will also happily and smoothly glide along in traffic.
It says "you name the speed, I'm happy to go". It's not temperamental or
impatient; it has no weird points where the throttle suddenly gets an extra
boost or where the engine braking suddenly drops off. It is simple to ride.
As an aside, this makes it perfect for lane filtering. On my previous bike
this would always be tinged with a frisson of danger - I had to rev it and
ease the clutch in with a fair bit of power so I didn't accidentally stall it,
but that always took some time. Now, I simply twist the throttle and I am
ahead of the traffic - no danger of stalling, no delay in the clutch gripping,
just power. It is much safer in that scenario.
I haven't done a lot of touring yet, but I've ridden up to Gosford once and
up to Sydney several times. This is where Energica really is ahead of pretty
much every other electric motorbike on the market now - they do DC fast
charging. And by 'fast charger' here I mean anything from 50KW up; the
Energica can only take 25KW maximum anyway :-) But this basically means I
have to structure any stops we do around where I can charge up - no more
stopping in at the local pub or a cafe on a whim for morning tea. That has
to either offer DC fast charging or I'm moving on - the 3KW onboard AC
charger means a 22KW AC charger is useless to me. In the hour or two we
might stop for lunch I'd only get another 60 - 80 kilometres more range on
AC; on DC I would be done in less than an hour.
But OTOH my experience so far is that structuring those breaks around where I
can charge up is relatively easy. Most riders will furiously nod when I say
that I can't sit in the seat for more than two hours before I really need to
stretch the legs and massage the bum :-) So if that break is at a DC charger,
no problems. I can stop at Sutton Forest or Pheasant's Nest or even
Campbelltown and, in the time it takes for me to go to the toilet and have a
bit of a coffee and snack break, the bike is basically charged and ready to
go again.
The lesson I've learned, though, is to always give it that bit longer and
charge as much as I can up to 80%. It's tempting sometimes when I'm standing
around in a car park watching the bike charge to move on and charge up a bit
more at the next stop. The problem is that, with chargers still relatively
rare and there often only being one or two at each site, a single charger
not working can mean another fifty or even a hundred kilometres more riding.
That's a quarter to half my range, so I cannot afford to risk that. Charge
up and take a good book (and a spare set of headphones).
In the future, of course, when there's a bank of a dozen DC fast chargers in
every town, this won't be a problem. Charger anxiety only exists because
they are still relatively rare. When charging is easy to find and always
available, and there are electric forecourts like the UK is starting to get,
charging stops will be easy and will fit in with my riding.
Anyway.
Other advantages of the Experia:
You can get it with a complete set of Givi MonoKey top box and panniers. This
means you can buy your own much nicer and more streamlined top box and it fits
right on.
Charging at home takes about six hours, so it's easy to do overnight. The
Experia comes with an EVSE so you don't need any special charger at home. And
really, since the onboard AC charger can only accept 3KW, there's hardly any
point in spending much money on a home charger for the Experia.
Minor niggles:
The seat is a bit hard. I'm considering getting the
EONE
Canyon saddle, although I also just need to try to work out how to get
underneath the seat to see if I can fit my existing sheepskin seat cover.
There are a few occasional glitches in the display in certain rare situations.
I've mentioned them to Energica, hopefully they'll be addressed.
Supercomputing Asia 2024 was held in Sydney from the 19th to 23rd of February with over 1,000 attendees, most of whom were from Australia, the United States, Singapore, Japan, Thailand, and Aotearoa New Zealand. A notable absence from the conference was China, given its importance to both supercomputing and Asia, and one speaker noted wryly that "Australia is now apparently part of Asia". The program consisted of plenary sessions in the morning and multiple streams in the afternoon of each day. My attendance was at the IBM Storage Scale User Group for the entirety of the first day, the HPC Leadership Forum on the second, Skills and Training on the third, and the Accelerated Data Analytics and Computing Institute (ADAC) symposium on the fourth. The Storage Scale User Group was useful for a roadmap of their systems (e.g., IBM Storage Scale System 6000, Fusion HCI) and case studies. The Leadership Forum and the ADAC symposium both gave an overview of some of the major systems in the region, which included the two largest systems, Frontier (no 1) and Aurora (no 2), along with Fugaku (no 4).
Of note from Fugaku was a Hyperion study on the macroeconomic return on investment for their HPC, which was between $63 and $91 per dollar invested, following the 2013 IDC study of HPC in general indicating $44 per dollar invested. The larger figure is explained by the tighter integration with national objectives in the peak system. Also of note, concurring with a report written in September 2022 ("Microprocessor Trend Usage in HPC Systems for 2022-2023"), was the rise of systems using AMD CPUs and the ubiquity of CPU/GPU heterogeneity. Thailand's Supercomputing Centre is also of note, having risen from a relatively small system to one with 31,744 AMD CPUs, 704 A100s, and position 94 in the Top500, with 50% of its operating revenue now coming from fee-for-service work for "national interest" private industries. In Australia, there is the leadership from NCI in developing the Indo-Pacific Exascale Consortium, modelled after the EuroHPC Joint Undertaking effort.
About 50 people attended the talk I gave at SCAsia 2024 on "HPC Certification Forum & Skill Tree: An Update". There was quite an enthusiastic discussion that followed, with several questions about micro-credentials, the potential use of OpenBadge as part of the certification process, and strong interest from several other HPC centres (UWA, CSIRO, NeSI) and Intersect in participating in an ecosystem approach of using the skill tree for training content and contributing back. The potential of this sort of collaboration within Australia will, at the very least, be extremely valuable in improving the HPC on-boarding process for researchers. The talk also dovetailed with a poster presentation, "HPC Training Generates HPC Results", which pointed out longitudinal correlations between the two in terms of training sessions, computer hours, and job completion.
Running topics of note throughout the conference, and especially in the plenary sessions (a nice quirk was that the voice of Siri, Karen Jacobsen, was the MC for these sessions), were a focus on AI/machine learning/LLMs and quantum computing. The former topic especially noted the advantages of GPUs, which bodes well for our own large GPU partition. Differentiation must be considered between quantum computing and quantum computers; as a recent Spartan-citing paper pointed out, quantum algorithms on "classical" computers (e.g., HPC) are preferable to quantum computers, which are very much still in the experimental phase. To differentiate, quantum computing is any method to generate quantum effects whereby qubit states can exist in superposition (0, 1, or both) rather than binary states (0, 1). The typical system to do quantum computing, or at least simulate it, is usually HPC. In contrast, a quantum computer uses a system that directly uses a quantum system. For example, GENCI in France uses a photonic computer, LRZ in Germany uses superconducting qubits, PSNC in Poland uses trapped ions, etc.
Opportunities to speak with vendors are always important, and in particular longer discussions were held with Dell on their roadmap, with DDN on their new filesystem, and with Altair on their HPCWorks application (which, at the moment, only operates with PBSPro). Notably, many vendors continue to make a pitch in favour of monopolisation under the guise of convenience ("we'll do everything for you") rather than interoperability. Special thanks are given to Xenon Systems for an evening hosted at L'Aqua on Cockle Bay Wharf.
Overall, attendance and participation at the conference were extremely valuable: for direct knowledge improvements in storage, useful collaborations with other centres on HPC training, awareness of vendor products, system developments in Asia and the US, and developing an understanding of the overall direction of AI/LLM and quantum computing in HPC environments.
Way back in the distant past, when the Apple ][ and the Commodore 64 were king, you could read the manual for a microprocessor, see how many CPU cycles each instruction took, and then do the math as to how long a sequence of instructions would take to execute. This cycle counting was used pretty effectively to do really neat things, such as getting anything onto the screen at all from an Atari 2600. Modern CPUs are… complex. They can do several things at once, in a different order than what you wrote them in, and have an interesting arrangement of shared resources to allocate.
So, unlike with simpler hardware, if you have a sequence of instructions for a modern processor, it’s going to be pretty hard to work out how many cycles that could take by hand, and it’s going to differ for each micro-architecture available for the instruction set.
When designing a microprocessor, simulating how long a sequence of existing instructions will take to execute compared to the previous generation of microprocessor is pretty important. The aim should be for it to take less time or energy, or improve on some other metric that means your new processor is better than the old one. It can be okay if, from one processor generation to the next, some sequence of instructions takes more cycles, provided your cycles are more frequent, more power efficient, or better on whatever other metric you’re designing for.
Programmers may want this simulation too, as some code paths get rather performance critical for certain applications. Open Source tools for this aren’t as prolific as I’d like, but there is llvm-mca which I (relatively) recently learned about.
llvm-mca is a performance analysis tool that uses information available in LLVM (e.g. scheduling models) to statically measure the performance of machine code in a specific CPU.
So, when looking at an issue in the IPv6 address and connection hashing code in Linux last year, I was quite conscious that modern systems deal with a LOT of network packets, which makes this code quite sensitive to CPU usage. I wanted to make sure that my suggested changes weren’t going to have a large impact on performance across the variety of CPU generations in use.
There are two ways to do this. One is to run everything: throw a lot of packets at something and measure it. That can be a long dev cycle, and sometimes just annoying to get going. It can be a lot quicker to simulate the small section of code in question and do some analysis of it before going through the trouble of spinning up multiple test environments to prove it in the real world.
So, enter llvm-mca and the ability to quickly evaluate possible changes before testing them. Seeing as the code in question was nicely self-contained, I could easily get it to a point where gcc (or llvm) would spit out assembler for it separately from the kernel tree. My preference was for gcc, as that’s what most distros end up compiling Linux with, including the Linux distribution that’s my day job (Amazon Linux).
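As a rough sketch of that workflow (the file name and target CPU here are illustrative only, not the actual code in question):
> gcc -O2 -S -o hash_test.s hash_test.c
> llvm-mca -mcpu=skylake -timeline hash_test.s
llvm-mca then reports estimated cycle counts, dispatch width, and resource pressure for that instruction sequence on the chosen micro-architecture, and changing the -mcpu value lets you compare different CPU generations.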
In order to share the results of the experiments as part of the discussion on where the code changes should end up, I published the code and results in a github project as things got way too large to throw on a mailing list post and retain sanity.
I used a container so that I could easily run it in a repeatable, isolated environment, as well as have others reproduce my results if needed. Different compiler versions and optimization levels will very much produce different sequences of instructions, and thus possibly quite different results. This delta in compiler optimization levels is partially why the numbers don’t quite match on some of the mailing list messages, although the relative differences between the various options were the same. The other reason is that I learned how to better use llvm-mca to isolate the exact sequence of instructions I cared about (and not include things like the guesswork that llvm-mca has to do for branches).
One thing I learned along the way is how to better use llvm-mca to get the results I was looking for. One trick is to very much avoid branches, as that’s going to be near-complete guesswork given there’s no simulation of the branch predictor (at least in the version I was using).
The big thing I wanted to prove was whether doing the extra work would have a small or large impact on the number of elapsed cycles. The answer was that doing a bunch of extra “work” was essentially near free. The CPU core could execute enough things in parallel that the incremental cost of doing the extra work just… wasn’t relevant.
This helped get a patch deployed without impacting performance, as well as get a patch upstream, fixing an issue that was partially fixed 10 years prior and had existed since day 1 of the Linux IPv6 code.
Naturally, this wasn’t a solo effort, and that’s one of the joys of working with a bunch of smart people – both at the same company I work for, and in the broader open source community. It’s always humbling when you’re looking at code outside your usual area of expertise that was written (and then modified) by Really Smart People, and you’re then trying to fix a problem in it, while trying to learn all the implications of changing that bit of code.
Anyway, check out llvm-mca for your next adventure into premature optimization, as if you’re going to get started with evil, you may as well start with what’s at the root of all of it.
At this rate, there is no real blogging here, regardless of the lofty plans to start writing more. Stats update from Hello 2023:
219 days on the road (less than 2022! -37, over a month, shocking), 376,961km travelled, 44 cities, 17 countries.
Can’t say why it was less, because it felt like I spent a long time away…
In Kuala Lumpur, I purchased a flat (just in time to see Malaysia go down), and I swapped cars (had a good 15 year run). I co-founded a company, and I think there is a lot more to come.
2024 is shaping up to be exciting, busy, and a year where one must just do.
Since late in 2007 I have been involved in the field of high performance computing. Initially, this was at the Victorian Partnership for Advanced Computing, but just before that organisation closed its doors in December 2015 I accepted a similar role at the University of Melbourne. The end of the year provides a reason for reflection, an annual report if one likes, and whilst activities not related to my vocation and profession will be dealt with in a subsequent entry, the opportunity is taken here to review workplace activities and in particular, changes in the environment for the University's general HPC system, Spartan. Spartan now has 6,159 accounts across 2,109 projects in diverse disciplines in the life sciences, engineering, economics, mathematics, and more, and has been cited in 62 papers in the past year.
Some of those papers led to presentations to the Research Computing Services (RCS) team through the Cultural Working Group (CWG), which I have chaired for the past two years, holding responsibility for organising these talks. In total six presentations were held this year, with a personal favourite on the use of AI algorithms, a supercomputer (Spartan), and robotics to sort plastic waste from two researchers at the Department of Infrastructure Engineering. The CWG was formed in 2020 following recognition from a staff survey that not all was well in RCS in terms of staff awareness of the group's objectives, work between the different groups within the RCS, transparency in decision-making, involvement and influence in decisions, career-progression opportunities, and job security. The staff-led CWG (with one management representative) made a concerted effort across those targeted areas and, following a survey in the middle of this year, substantial improvements were found in every criterion. At the end of this year, just after the last tech/researcher presentation, it brought great pleasure to say that whilst operations would continue, the group had succeeded in achieving its objectives and could close down as a formal body and as a successful project.
A very large part of my role at the University consists of training various postgraduate and postdoctoral researchers in how to use the system. This year included some 24 days of workshops involving close to 500 participants, roughly on par with other years and deliberately pulling back a bit from the first year of COVID, when over 40 such workshops were conducted. Of particular note, early in the year a review was conducted of usage by those who had received training the previous year, resulting in the very surprising metric that at least 54.14% of cluster utilisation in 2022 came from users after they had received training. I have always emphasized how important HPC training is, but it was astounding to see such a metric as proof. As another form of training, this year I continued with my regular activity as a guest lecturer and tutor for the master's level course Cluster and Cloud Computing. My role in this, previously just a single lecture, has now been extended to six lectures and workshops and is likely to expand in 2024. I must also mention here a presentation on RCS services to the Quantitative and Applied Ecology Research Group, with a future paper in development from that body on software citations.
Another major part of my role is scientific software optimisation and installation. Apart from the usual work in this field, this year had the bonus of Spartan receiving its first major operating system upgrade since it was first turned on in 2015. Changing the underlying major release of the operating system (and indeed, jumping from RHEL v7 to v9) required existing software to be recompiled. In a one-month period, working with demonic fury, I was primarily responsible for around 500 software builds and an expansion in job submission examples. At the same time, Spartan also finally had the opportunity to run the LINPACK tests to be recognised as one of the world's supercomputers. It was an award that was long overdue (we've had sufficient performance to be on that list for years), and even then the certificate was for only part of the entire system.
Other activities included establishing the Spartan HPC Champions group among power-users of the system, who can provide training advice to other members of their research teams, and continued involvement as a Board member of the international HPC Certification Forum and as an irregular contributor to the EasyBuild code repository. I have no doubt that these and other activities will all continue in 2024; however, there will be an additional role as well. Following a necessary and considered restructure of RCS, I have found myself the recipient of a small promotion in role and responsibility. It is a position I will take with the appropriate seriousness; after all, supercomputing is one of those activities that has made a massive change to improving the world and will continue to do so. For the technical staff, it can be challenging and rewarding as they provide the researchers the tools to make great discoveries and inventions. But those staff also need to be in an environment where they feel secure and can flourish - and that means listening to their technical advice, as they actually do know best on such matters. This will certainly be the most significant challenge in the coming year.
It’s time for a review of the second year of operation of our Redflow ZCell battery and Victron Energy inverter/charger system. To understand what follows it will help to read the earlier posts in this series:
Go With The Flow (what all the pieces are, what they do, some teething problems)
TANSTAAFL (review/analysis of the first year of operation)
In case ~12,000 words of background reading seem daunting, I’ll try to summarise the most important details here:
We have a 5.94kW solar array hooked up to a Victron MPPT RS solar charge controller, two Victron 5kW Multi-Plus II inverter/chargers, a Victron Cerbo GX console, and a single 10kWh Redflow ZCell battery. It works really well. We’re using most of our generated power locally, and it’s enabled us to blissfully coast through several grid power outages and various other minor glitches. The Victron gear and the ZCell were installed by Lifestyle Electrical Services.
Redflow batteries are excellent because you can 100% cycle them every day, and they aren’t a giant lump of lithium strapped to your house that’s impossible to put out if it bursts into flames. The catch is that they need to undergo periodic maintenance where they are completely discharged for a few hours at least every three days. If you have more than one, that’s fine because the maintenance cycles interleave (it’s all automatic). If you only have one, you can’t survive grid outages if you’re in a maintenance period, and you can’t ordinarily use the Cerbo’s Minimum State of Charge (MinSoC) setting to perpetually keep a small charge in the battery in case of emergencies. As we still only have one battery, I’ve spent a fair bit of time experimenting to mitigate this as much as I can.
The system itself requires a certain amount of power to run. Think of the pumps and fans in the battery, and the power used directly by the inverters and the console. On top of that a certain amount of power is simply lost to AC/DC conversion and charge/discharge inefficiencies. That’s power that comes into your house from the grid and from the sun that your loads, i.e. the things you care about running, don’t get to use. This is true of all solar PV and battery storage systems to a greater or lesser degree, but it’s not something that people always think about.
With the background out of the way we can get on to the fun stuff, including a roof replacement, an unexpected fault after a power outage followed by some mains switchboard rewiring, a small electrolyte leak, further hackery to keep a bit of charge in the battery most of the time, and finally some numbers.
The big job we did this year was replacing our concrete tile roof with colorbond steel. When we bought the house – which is in a rural area and thus a bushfire risk – we thought: “concrete brick exterior, concrete tile roof – sweet, that’s not flammable”. Unfortunately it turns out that while a tile roof works just fine to keep water out, it won’t keep embers out. There’s a gadzillion little gaps where the tiles overlap each other, and in an ember attack, embers will get up in there and ignite the fantastic amount of dust and other stuff that’s accumulated inside the ceiling over several decades, and then your house will burn down. This could be avoided by installing roof blanket insulation under the tiles, but in order to do that you have to first remove all the tiles and put them down somewhere without breaking them, then later put them all back on again. It’s a lot of work. Alternately, you can just rip them all off and replace the whole lot with nice new steel, with roof blanket insulation underneath.
Of course, you need good weather to replace a roof, and you need to take your solar panels down while it’s happening. This meant we had twenty-two solar panels stacked on our back porch for three weeks of prime PV time from February 17 – March 9, 2023, which I suspect lost us a good 500kWh of power generation. Also, the roof job meant we didn’t have the budget to get a second ZCell this year – for the cost of the roof replacement, we could have had three new ZCells installed – but as my wife rightly pointed out, all the battery storage in the world won’t do you any good if your house burns down.
We had at least five grid power outages during the year. A few were brief, the grid being down for only a couple of minutes, but there were two longer ones in September (one for 30 minutes, one for about an hour and a half). We got through the long ones just fine with either the sun high in the sky, or charge in the battery, or both. One of the earlier short outages though uncovered a problem. On the morning of May 30, my wife woke up to discover there was no power, and thus no running water. Not a good thing to wake up to. This happened while I was away, because of course something like this would happen while I was away. It turns out there had been a grid outage at about 02:10, then the grid power had come back, but our system had not. The Multis ended up in some sort of fault state and were refusing to power our loads. On the console was an alarm message: “#8 – Ground relay test failed”.
That doesn’t look good.
Note the times in the console messages are about 08:00. I confirmed via the logs from the VRM portal that the grid really did go out some time between 02:10 and 02:15, but after that there was nothing in the logs until 07:59, which is when my wife used the manual changeover switch to shift all our loads back to direct grid power, bypassing the Victron kit. That brought our internet connection back, along with the running water. I contacted Murray Roberts from Lifestyle Electrical and Simon Hackett for assistance, Murray logged in remotely and reset the Multis, my wife flicked the changeover switch back and everything was fine. But the question remained, what had gone wrong?
The ground relay in the Multis is there to connect neutral to ground when the grid fails. Neutral and ground are already physically connected on the grid (AC input) side of the Multis in the main switchboard, but when the grid power goes out, the Multis disconnect their inputs, which means the loads on the AC output side no longer have that fixed connection from neutral to ground. The ground relay activates in this case to provide that connection, which is necessary for correct operation of the safety switches on the power circuits in the house.
The ground relay is tested automatically by the Multis. Looking up Error 8 – Ground relay test failed on Victron’s web site indicated that either the ground relay really was faulty, or possibly there was a wiring fault or an issue with one of the loads in our house. So I did some testing. First, with the battery at 50% State of Charge (SoC), I did the following:
Disconnected all loads (i.e. flipped the breaker on the output side of the Multis)
Killed the mains (i.e. flipped the breaker on the input side of the Multis)
Verified the system switched to inverting mode (i.e. running off the battery)
Restored mains power
Verified there was no error
This demonstrated that the ground relay and the Multis in general were fine. Had there been a problem at that level we would have seen an error when I restored mains power. I then reconnected the loads and repeated steps 2-5 above. Again, there was no error which indicated the problem wasn’t due to a wiring defect or short in any of the power or lighting circuits. I also re-tested with the heater on and the water pump running just in case there may have been an issue specifically with either of those devices. Again, there was no error.
The only difference between my test above and the power outage in the middle of the night was that in the middle of the night there was no charge in the battery (it was right after a maintenance cycle) and no power from the sun. So in the evening I turned off the DC isolators for the PV and deactivated my overnight scheduled grid charge so there’d be no backup power of any form in the morning. Then I repeated the test:
Disconnected all loads
Killed the mains.
Checked the console which showed the system as “off”, as opposed to “inverting”, as there was no battery power or solar generation
Restored mains power
Shortly thereafter, I got the ground relay test failed error
The underlying detailed error message was “PE2 Closed”, which meant that it was seeing the relay as closed when it’s meant to be open. Our best guess is that we’d somehow hit an edge case in the Multi’s ground relay test, where they maybe tried to switch to inverting mode and activated the ground relay, then just died in that state because there was no backup power, and got confused when mains power returned. I got things running again by simply power cycling the Multis.
So it kinda wasn’t a big deal, except that if the grid went out briefly with no backup power, our loads would remain without power until one of us manually reset the system. This was arguably worse than not having the system at all, especially if it happened in the middle of the night, or when we were away from home. The fact that we didn’t hit this problem in the first year of operation is a testament to how unlikely this event is, but the fact that it could happen at all remained a problem.
One fix would have been to get a second battery, because then we’d be able to keep at least a tiny bit of backup power at all times regardless of maintenance cycles, but we’re not there yet. Happily, Simon found another fix, which was to physically connect the neutral together between the AC input and AC output sides of the Multis, then reconfigure them to use the grid code “AS4777.2:2015 AC Neutral Path externally joined”. That physical link means the load (output) side picks up the ground connection from the grid (input) side in the switchboard, and changing the grid code setting in the Multis disables the ground relay and thus the test, which isn’t necessary anymore.
Murray needed to come out anyway to replace the carbon sock in the ZCell (a small item of annual maintenance) and was able to do that little bit of rewiring and configuration at the same time. I repeated my tests both with and without backup power and everything worked perfectly, i.e. the system came back immediately by itself after a grid outage with no backup power, and of course switched over to inverting just fine when there was backup power available.
This leads to the next little bit of fun. The carbon sock is a thing that sits inside the zinc electrolyte tank and helps to keep the electrolyte pH in the correct operating range. Unfortunately I didn’t manage to get a photo of one, but they look a bit like door snakes. Replacing the carbon sock means opening the case, popping one side of the Gas Handling Unit (GHU) off the tank, pulling out the old sock and putting in a new one. Here’s a picture of the ZCell with the back of the case off, indicating where the carbon sock goes:
The tank on the left (with the cooling fan) is for zinc electrolyte. The tank on the right is for bromine electrolyte. The blocky assembly of pipes going into both tanks is the GHU. The rectangular box behind that contains the electrode stacks.
When Murray popped the GHU off, he noticed that one of the larger pipes on one side had perished slightly. Thankfully he happened to have a spare GHU with him so was able to replace the assembly immediately. All was well until later that afternoon, when the battery indicated hardware failure due to “Leak 1 Trip” and shut itself down out of an abundance of caution. Upon further investigation the next day, Murray and I discovered there was a tiny split in one of the little hoses going into the GHU which was letting the electrolyte drip out.
Drip… Drip… Drip…
This small electrolyte leak was caught lower down in the battery, where the leak sensor is. Murray sucked the leaked electrolyte out of there, re-terminated that little hose and we were back in business. I was happy to learn that Redflow had obviously thought about the possibility of this type of failure and handled it. As I said to Murray at the time, we’d rather have a battery that leaks then turns itself off than a battery that catches fire!
Aside from those two interesting events, the rest of the year of operation was largely quite boring, which is exactly what one wants from a power system. As before I kept a small overnight scheduled charge and a larger late afternoon scheduled charge active on weekdays to ensure there was some power in the battery to use at peak (i.e. expensive) grid times. In spring and summer the afternoon charge is largely superfluous because the battery has usually been well filled up from the solar by then anyway, but there’s no harm in leaving it turned on. The one hack I did do during the year was to figure out a way to keep a small (I went with 15%) MinSoC in the battery at all times except for maintenance cycle evenings, and the morning after. This is more than enough to smooth out minor grid outages of a few minutes, and given our general load levels should be enough to run the house for more than an hour overnight if necessary, provided the hot water system and heating don’t decide to come on at the same time.
My earlier experiment along these lines involved a script that ran on the Cerbo twice a day to adjust scheduled charge settings in order to keep the battery at 100% SoC at all times except for peak electricity hours and maintenance cycle evenings. As mentioned in TANSTAAFL I ran that for all of July, August and most of September 2022. It worked fine, but ultimately I decided it was largely a waste of energy and money, especially when run during the winter months when there’s not much sun and you end up doing a lot of grid charging. This is a horribly inefficient way of getting power into the battery (AC to DC) versus charging the battery direct from solar PV. We did still use those scripts in the second year, but rather more judiciously, i.e. we kept an eye on the BOM forecasts as we always do, then occasionally activated the 100% charge when we knew severe weather and/or thunderstorms were on the way, those being the things most likely to cause extended grid outages. I also manually triggered maintenance on the battery earlier than strictly necessary several times when we expected severe weather in the coming days, to avoid having a maintenance cycle (and thus empty battery) coincide with potential outages. On most of those occasions this effort proved to be unnecessary. Bearing all that in mind, my general advice to anyone else with a single ZCell system (aside from maybe adding scheduled charges to time-shift expensive peak electricity) is to just leave it alone and let it do its thing. You’ll use most of your locally generated electricity onsite, you’ll save some money on your power bills, and you’ll avoid some, but not all, grid outages. This is a pretty good position to be in.
That said, I couldn’t resist messing around some more, hence my MinSoC experiment. Simon’s installation guide points out that “for correct system operation, the Settings->ESS menu ‘Min SoC’ value must be set to 0% in single-ZCell systems”. The issue here is that if MinSoC is greater than 0%, the Victron gear will try to charge the battery while the battery is simultaneously trying to empty itself during maintenance, which of course just isn’t going to work. My solution to this is the following script, which I run from a cron job on the Cerbo twice a day, once at midnight UTC and again at 06:00 UTC with the --check-maintenance flag set:
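The script itself isn’t reproduced here, but a minimal sketch of the idea looks something like the following; the dbus setting path, the marker-file maintenance check, and the MinSoC value are assumptions for illustration rather than the actual code:
#!/bin/sh
# Sketch only: assumes the Venus OS dbus command-line tool and an ESS Minimum SoC
# setting path; the maintenance check is a stand-in for querying the ZCell's
# maintenance timer.
MINSOC=15
TARGET=$MINSOC
if [ "$1" = "--check-maintenance" ]; then
    # Stand-in check: assume something else marks maintenance evenings with this file
    if [ -e /data/zcell-maintenance-due ]; then
        TARGET=0
    fi
fi
# Assumed setting path for the ESS Minimum SoC value
dbus -y com.victronenergy.settings /Settings/CGwacs/BatteryLife/MinimumSocLimit SetValue "$TARGET"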
Midnight UTC corresponds to the end of our morning peak electricity time, and 06:00 UTC corresponds to the start of our afternoon peak. What this means is that after the morning peak finishes, the MinSoC setting will cause the system to automatically charge the battery to the value specified if it’s not up there already. Given it’s after the morning peak (10:00 AEST / 11:00 AEDT) this charge will likely come from solar PV, not the grid. When the script runs again just before the afternoon peak (16:00 AEST / 17:00 AEDT), MinSoC is set to either the value specified (effectively a no-op), or zero if it’s a maintenance day. This allows the battery to be discharged correctly in the evening on maintenance days, while keeping some charge every other day in case of emergencies. Unlike the script that tries for 100% SoC, this arrangement results in far less grid charging, while still giving protection from minor outages most of the time.
In case Simon is reading this now and is thinking “FFS, I wrote ‘MinSoC must be set to 0% in single-ZCell systems’ for a reason!” I should also add a note of caution. The script above detects ZCell maintenance cycles based solely on the configured maintenance time limit and the duration since last maintenance. It does not – and cannot – take into account occasions when the user manually forces maintenance, or situations in which a ZCell for whatever reason hypothetically decides to go into maintenance of its own accord. The latter shouldn’t generally happen, but it can. The point is, if you’re running this MinSoC script from a cron job, you really do still want to keep an eye on what the battery is doing each day, in case you need to turn that setting off and disable the cron job. If you’re not up for that I will reiterate my general advice from earlier: just leave the system alone – let it do its thing and you’ll (almost always) be perfectly fine. Or, get a second ZCell and you can ignore the last several paragraphs entirely.
Now, finally, let’s look at some numbers. The year periods here are a little sloppy for irritating historical reasons. 2018-2019, 2019-2020 and 2020-2021 are all August-based due to Aurora Energy’s previous quarterly billing cycle. The 2021-2022 year starts in late September partly because I had to wait until our new electricity meter was installed in September 2021, and partly because it let me include some nice screenshots when I started writing TANSTAAFL on September 25, 2022. I’ve chosen to make this year (2022-2023) mostly sane, in that it runs from October 1, 2022 through September 30, 2023 inclusive. This is only six days offset from the previous year, but notably makes it much easier to accurately correlate data from the VRM portal with our bills from Aurora. Overall we have five consecutive non-overlapping 12 month periods that are pretty close together. It’s not perfect, but I think it’s good enough to work with for our purposes here.
Year        Grid In   Solar In   Total In    Loads   Export   (all figures in kWh)
2018-2019     9,031      6,682     15,713   11,827    3,886
2019-2020     9,324      6,468     15,792   12,255    3,537
2020-2021     7,582      6,347     13,929   10,358    3,571
2021-2022     8,531      5,640     14,171   10,849      754
2022-2023     8,936      5,744     14,680   11,534      799
Overall, 2022-2023 had a similar shape to 2021-2022, including the fact that in both these years we missed three weeks of solar generation in late summer. In 2022 this was due to replacing the MPPT, and in 2023 it was because we replaced the roof. In both cases our PV generation was lower than it should have been by an estimated 500-600kWh. Hopefully nothing like this happens again in future years.
All of our numbers in 2022-2023 were a bit higher than in 2021-2022. We pulled 4.75% more power from the grid, generated 1.84% more solar, the total power going into the system (grid + solar) was 3.59% higher, our loads used 6.31% more power, and we exported 5.97% more power than the previous year.
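(As a worked example of how those figures fall out of the table above: the grid figure is (8,936 - 8,531) / 8,531 ≈ 4.75%.)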
I honestly don’t know why our loads used more power this year. Here’s a table showing our consumption for both years, and the differences each month (note that September 2022 is only approximate because of how the years don’t quite line up):
Month       2022 (kWh)   2023 (kWh)   Diff
October            988          873   -115
November           866          805    -61
December           767          965    198
January            822          775    -47
February           638          721     83
March              813          911     98
April              775        1,115    340
May                953        1,098    145
June             1,073        1,149     76
July             1,118        1,103    -15
August             966        1,065     99
September        1,070          964   -116
Here’s a graph:
WTF happened in December and April?!?
Did we use more cooling this December? Did we use more heating this April and May? I dug the nearest weather station’s monthly mean minimum and maximum temperatures out of the BOM Climate Data Online tool and found that there’s maybe a degree or so variance one way or the other each month year to year, so I don’t know what I can infer from that. All I can say is that something happened in December and April, but I don’t know what.
Another interesting thing is that what I referred to as “the energy cost of the system” in TANSTAAFL has gone down. That’s the kWh figure below in the “what?” column, which is grid in + solar in – loads – export, i.e. the energy consumed by the system itself. In 2021-2022, that was 2,568kWh, or about 18% of the total power that went into the system. In 2022-2023 it was down to 2,347kWh, or just under 16%:
Year        Grid In   Solar In   Total In    Loads   Export   Total Out   what?   (all figures in kWh)
2021-2022     8,531      5,640     14,171   10,849      754      11,603   2,568
2022-2023     8,936      5,744     14,680   11,534      799      12,333   2,347
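(As a worked check for 2022-2023: 14,680 - 11,534 - 799 = 2,347kWh.)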
I suspect the cause of this reduction is that we didn’t spend two and a half months doing lots of grid charging of the battery in 2022-2023. If that’s the case, this again points to the advisability of just letting the system do its thing and not messing with it too much unless you really know you need to.
The last set of numbers I have involve actual money. Here’s what our electricity bills looked like over the past five years:
Year        From Grid (kWh)   Total Bill   Cost/kWh
2018-2019             9,031    $2,278.33      $0.25
2019-2020             9,324    $2,384.79      $0.26
2020-2021             7,582    $1,921.77      $0.25
2021-2022             8,531    $1,731.40      $0.20
2022-2023             8,936    $1,989.12      $0.22
Note that cost/kWh as I have it here is simply the total dollar amount of our bills divided by the total power drawn from the grid (I’m deliberately ignoring the additional power we use that comes from the sun in this calculation). The bills themselves say “peak power costs $X, off-peak costs $Y, you get $Z back for power exported and there’s a daily supply charge of $SUCKS_TO_BE_YOU”, but that’s all noise. What ultimately matters in my opinion is what I call the effective cost per kilowatt hour, which is why those things are all smooshed together here. The important point is that with our existing solar array we were previously effectively paying about $0.25 per kWh for grid power. After getting the battery and switching to Peak & Off-Peak billing, that went down to $0.20/kWh – a reduction of 20%. Now we’ve inched back up to $0.22/kWh, but it turns out that’s just because power prices have increased. As far as I can tell Aurora Energy don’t publish historical pricing data, so as a public service, I’ll include what I’ve been able to glean from our prior bills here:
July 2023 onwards:
Daily supply charge: $1.26389
Peak: $0.36198/kWh
Off-Peak: $0.16855/kWh
Feed-In Tariff: $0.10869/kWh
July 2022 – July 2023
Daily supply charge: $1.09903
Peak: $0.33399/kWh
Off-Peak: $0.15551/kWh
Feed-In Tariff: $0.08883/kWh
Before July 2022:
Daily supply charge: $0.98
Peak: $0.29852/kWh
Off-Peak: $0.139/kWh
Feed-In Tariff: $0.06501/kWh
It’s nice that the feed-in tariff (i.e. what you get credited when you export power) has gone up quite a bit, but unless you’re somehow able to export 2-3x more power than you import, you’ll never get ahead of the ~20% increase in power prices over the last two years.
Having calculated the effective cost/kWh for grid power, I’m now going to do one more thing which I didn’t think to do during last year’s analysis, and that’s calculate the effective cost/kWh of running our loads, bearing in mind that they’re partially powered from the grid, and partially from the sun. I’ve managed to dig up some old Aurora bills from 2016-2017, back before we put the solar panels on. This should make for an interesting comparison.
Year        From Grid (kWh)   Total Bill   Grid $/kWh   Loads (kWh)   Loads $/kWh
2016-2017            17,026    $4,485.45        $0.26        17,026         $0.26
2018-2019             9,031    $2,278.33        $0.25        11,827         $0.19
2019-2020             9,324    $2,384.79        $0.26        12,255         $0.19
2020-2021             7,582    $1,921.77        $0.25        10,358         $0.19
2021-2022             8,531    $1,731.40        $0.20        10,849         $0.16
2022-2023             8,936    $1,989.12        $0.22        11,534         $0.17
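(Taking 2022-2023 as a worked example: $1,989.12 / 8,936kWh ≈ $0.22/kWh for grid power, versus $1,989.12 / 11,534kWh ≈ $0.17/kWh once the solar contribution to the loads is counted.)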
The first thing to note is the horrifying 17 megawatt-hours we pulled in 2016-2017. Given the hot water and lounge room heat pump were on a separate tariff, I was able to determine that four of those megawatt-hours (i.e. about 24% of our power usage) went on heating that year. Replacing the crusty old conventional electric hot water system with a Sanden heat pump hot water service cut that in half – subsequent years showed the heating/hot water tariff using about 2MWh/year. We obviously also somehow reduced our loads by another ~3MWh/year on top of that, but I can’t find the Aurora bills for 2017-2018 so I’m not sure exactly when that drop happened. My best guess is that I probably got rid of some old, always-on computer equipment.
The second thing to note is how the cost of running the loads drops. In 2016-2017 the grid cost/kWh is the same as the loads cost/kWh, because grid power is all we had. From 2018-2021 though, the load cost/kWh drops to $0.19, a saving of about 26%. It remains there until 2021-2022 when we got the battery and it dropped again to $0.16 (another 15% or so). So the big win was certainly putting the solar panels on and swapping the hot water system, with the battery being a decent improvement on top of that.
Further wins are going to come from decreasing our power consumption. In previous posts I had mentioned the need to replace panel heaters with heat pumps, and also that some of our aging computer equipment needed upgrading. We did finally get a heat pump installed in the master bedroom this year, and we replaced the old undersized lounge room heat pump with a new correctly sized unit. This happened on June 30 though, so will have had minimal impact on this year’s figures. Likewise an always-on computer that previously pulled ~100W is now better, stronger and faster in all respects, while only pulling ~50W. That will save us ~438kWh of power per year (50W saved around the clock is 0.05kW × 8,760 hours ≈ 438kWh), but given the upgrade happened in mid August, again we won’t see the full effects until later.
I’m looking forward to doing another one of these posts in a year’s time. Hopefully I will have nothing at all interesting to report.
I (relatively) recently went down the rabbit hole of trying out personal finance apps to help get a better grip on, well, the things you’d expect (personal finances and planning around them).
In the past, I’ve had an off-again-on-again relationship with GNUCash. I did give it a solid go for a few months in 2004/2005 it seems (I found my old files) and I even had the OFX exports of transactions for a limited amount of time for a limited number of bank accounts! Amazingly, there’s a GNUCash port to macOS, and it’ll happily open up this file from what is alarmingly close to 20 years ago.
Back in those times, running Linux on the desktop was even more of an adventure than it has been since then, and I always found GNUCash to be strange (possibly a theme with me and personal finance software), but generally fine. It doesn’t seem to have changed a great deal in the years since. You still have to manually import data from your bank unless you happen to be lucky enough to live in the very limited number of places where there’s some kind of automation for it.
So, going back to GNUCash was an option. But I wanted to survey the land of what was available, and if it was possible to exchange money for convenience. I am not big on the motivation to go and spend a lot of time on this kind of thing anyway, so it had to be easy for me to do so.
For my requirements, I basically had:
Support multiple currencies
Be able to import data from my banks, even if manually
Some kind of reporting and planning tools
Be easy enough to use for me, and not leave me struggling with unknown concepts
The ability to export data. No vendor lock-in
I viewed a mobile app (iOS) as a Nice to Have rather than essential. Given that, my shortlist was:
First on the list was GNUCash. I’ve used it before, and its web site at https://www.gnucash.org/ looks much the same as it always has. It’s Free and Open Source Software, and is thus well aligned with my values, and that’s a big step towards not having vendor lock-in.
I honestly could probably make it work. I wish it had the ability to import transactions from banks anywhere I have ever lived or banked. I also wish the UI were a bit more consistent and modern, and even remotely Mac-like in the Mac version.
Honestly, if the deal was that a web service would pull bank transactions in exchange for ~$10/month and also fund GNUCash development… I’d struggle to say no.
Here’s an option that has been around forever – https://www.quicken.com/ – and one that I figured I should solidly look at. It’s actually one I even spent money on… before requesting a refund. Its Import/Export is so broken it’s an insult to broken software everywhere.
Did you know that Quicken doesn’t import the Quicken Interchange Format (QIF), and hasn’t since 2005?
Me, incredulously, when trying out Quicken
I don’t understand why you wouldn’t support as many as possible of the formats that banks export transaction data in. It cannot possibly be that hard to parse these things, nor can it possibly be code that requires a lot of maintenance.
This basically meant that I couldn’t import data from my Australian Banks. Urgh. This alone ruled it out.
It really didn’t build confidence in ever getting my data out. At every turn it seemed to be really keen on locking you into Quicken rather than having a good experience all-up.
This one was new to me – https://www.wiz.money/ – and had a fancy URL and everything. I spent a bunch of time trying MoneyWiz, and I concluded that it is pretty, but buggy. I had managed to create a report where it said I’d earned $0, but when you click into it, it gives actual numbers. Not being self-consistent and getting the numbers wrong, when getting the numbers right is literally the only function of said app, took it out of the running.
It did sync from my US and Australian banks though, so points there.
Intuit used to own Quicken until it sold it to H.I.G. Capital in 2016 (according to Wikipedia). I have no idea if that has had an impact as to the feature set / usability of Quicken, but they now have this Cloud-only product called Mint.
The big issue I had with Mint was that there didn’t seem to be any way to get your data out of it. It seemed to exemplify vendor lock-in. This seems to have changed a bit since I was originally looking, which is good (maybe I just couldn’t find it?). But with the cloud-only approach I wasn’t hugely comfortable with having everything there. It also seemed to be lacking a few features that I was beginning to find useful in other places.
It is the only product that links with the Apple Card though. No idea why that is the case.
The price tag of $0 was pretty unbeatable, which does make me wonder where the money is made from to fund its development and maintenance. My guess is that it’s through commission on the various financial products advertised through it, and I dearly hope it is not through selling data on its users (I have no reason to believe it is, there’s just the popular habit of companies doing this).
Banktivity is what I’ve settled on. It seemed easy enough for me to figure out how to use, it syncs with an iPhone app, it’s a reasonable price, and it can import and sync things from the accounts that I have. Oddly enough, nothing can connect and pull things from the Apple Card – which is really weird. That isn’t a Banktivity thing though, that’s just universal (except for Intuit’s Mint).
I’ve been using it for a bit more than a year now, and am still pretty happy. I wish there was the ability to attach a PDF of a statement to the Statement that you reconcile. I wish I could better tune the auto match/classification rules, and a few other relatively minor things.
Periodically in life I’ve had the desire to be somewhat fit, or at least have the benefits that come with that such as not dying early and being able to navigate a mountain (or just the city of Seattle) on foot without collapsing. I have also found that holding myself accountable via data is pretty vital to me actually going and repeatedly doing something.
So, at some point I got myself a Garmin watch. The year was 2012 and it was a Garmin Forerunner 410. It had a standard black/grey LCD screen, GPS (where getting a GPS lock could be utterly infuriatingly slow), a sensor you attached to your foot, a sensor you strap to your chest for Heart Rate monitoring, and an ANT+ dongle for connecting to a PC to download your activities. There was even some open source software that someone wrote so I could actually get data off my watch on my Linux laptops. This wasn’t a smart watch – it was exclusively for wearing while exercising and tracking an activity, otherwise it was just a watch.
However, as I was ramping up to marathon distance running, one huge flaw emerged: I was not fast enough to run a marathon in the time that the battery in my Garmin lasted. IIRC it would end up dying around 3hr30min into something, which at the time was increasingly something I’d describe as “not going for too long of a run”. So, the search for a replacement began!
The year was 2017, and the Garmin fenix 5x attracted me for two big reasons: a battery life to be respected, and turn-by-turn navigation. At the time, I seldom went running with a phone, preferring a tiny SanDisk media player (RIP, they made a new version that completely sucked) and a watch. The ability to navigate back to where I started (e.g. a hotel in some strange city where I didn’t speak the language) was very appealing. It also had (what I would now describe as) rudimentary smart-watch features. It didn’t have even remotely everything the Pebble had, but it was enough.
So, a (non-trivial) pile of money later (even with discounts), I had myself a shiny and virtually indestructible new Garmin. I didn’t even need a dongle to sync it anywhere – it could just upload via its own WiFi connection, or through Bluetooth to the Garmin Connect app on my phone. I could also (if I ever remembered to) plug in the USB cable and download the activities to my computer.
One problem: my skin rebelled against the Garmin fenix 5x after a while. Like, properly rebelled. If it wasn’t coming off, I wanted to rip it off. I tried all of the tricks that are posted anywhere online. Didn’t help. I even got tested for what was the most likely culprit (a Nickel allergy), and didn’t have one of them, so I (still) have no idea what I’m actually allergic to in it. It’s just that I cannot wear it constantly. Urgh. I was enjoying the daily smart watch uses too!
So, that’s one rather expensive watch that is special purpose only, and even then started to get to be a bit of an issue around longer activities. Urgh.
So the hunt began for a smart watch that I could wear constantly. This usually ended in frustration, as anything I wanted was hundreds of dollars and pretty much nobody listed what materials were in it apart from “stainless steel”, “may contain”, and some disclaimer about “other materials”. That wasn’t a particularly useful starting point for working out “it is one of these things that my skin doesn’t like”: at least if the next one also turned out to cause me problems, a full materials list would have let me narrow down what I needed to avoid.
So that was all annoying, with the end result being that I went a long time without really wearing a watch. Why? The search resumed periodically and ended up either with nothing, or totally nothing. That was except if I wanted to get further into some vendor lock-in.
Honestly, the only manufacturer of anything smartwatch-like which actually listed everything and had some options was Apple. Bizarre. Well, since I had already got on the iPhone bandwagon, this was possible. Rather annoyingly, the phone and watch are very tied together, which makes for a bit of vendor lock-in if you alternate phone and watch replacements and at any point wish to switch platforms.
That being said though, it does work well and doesn’t irritate my skin. So that’s a bonus! If I get back into marathon-level distance running, we’ll see how well it goes. But for the more common distances that I’ve run or cycled with it… the accuracy seems decent, the HR monitor never randomly decides I’m not exerting myself, and the GPS actually gets a lock in reasonable time. Plus it can pair with headphones and be the only thing I take out with me.
A few random notes about things that can make life on macOS (the modern one, as in, circa 2023) better for those coming from Linux.
For various reasons you may end up with Mac hardware with macOS on the metal rather than Linux. This could be anything from battery life of the Apple Silicon machines (and not quite being ready to jump on the Asahi Linux bandwagon), to being able to run the corporate suite of Enterprise Software (arguably a bug more than a feature), to some other reason that is also fine.
My approach to most of my development is to have a remote more powerful Linux machine to do the heavy lifting, or do Linux development on Linux, and not bank on messing around with a bunch of software on macOS that would approximate something on Linux. This also means I can move my GUI environment (the Mac) easily forward without worrying about whatever weird workarounds I needed to do in order to get things going for whatever development work I’m doing, and vice-versa.
Terminal emulator? iTerm2. The built-in Terminal.app is fine, but there’s more than a few nice things in iTerm2, including tmux integration, which can end up making it feel a lot more like a regular Linux machine. I should probably go read the tmux integration best practices before I complain about some random bugs I think I’ve hit, so let’s pretend I did that and everything is perfect.
I tend to use the Mac for SSHing to bigger Linux machines for most of my work. At work, that’s mostly to a Graviton 2 EC2 Instance running Amazon Linux with all my development environments on it. At home, it’s mostly a Raptor Blackbird POWER9 system running Fedora.
Running Linux locally? For all the use cases of containers, Podman Desktop or finch. There’s a GUI part of Podman which is nice, and finch I know about because of the relatively nearby team that works on it, and its relationship to lima. Lima positions itself as WSL2-like but for Mac. There’s UTM for a full virtual machine / qemu environment, although I rarely end up using this and am more commonly using a container or just SSHing to a bigger Linux box.
There’s Xcode for any macOS development that may be needed (e.g. when you want that extra feature in UTM or something). I do use Homebrew to install a few things locally.
Last week I had occasion to test deploying ceph-csi on a k3s cluster, so that Kubernetes workloads could access block storage provided by an external Ceph cluster. I went with the upstream Ceph documentation, because assuming everything worked it’d then be really easy for me to say to others “just go do this”.
Everything did not work.
I’d gone through all the instructions, inserting my own Ceph cluster’s FSID and MON IP addresses in the right places, applied the YAML to deploy the provisioner and node plugins, and all the provisioner bits were running just fine, but the csi-rbdplugin pods were stuck in CrashLoopBackOff:
The csi-rbdplugin pod consists of three containers – driver-registrar, csi-rbdplugin, liveness-prometheus – and csi-rbdplugin wasn’t able to load the rbd kernel module:
> kubectl logs csi-rbdplugin-22zjr --container csi-rbdplugin
I0726 10:25:12.862125 7628 cephcsi.go:199] Driver version: canary and Git version: d432421a88238a878a470d54cbf2c50f2e61cdda
I0726 10:25:12.862452 7628 cephcsi.go:231] Starting driver type: rbd with name: rbd.csi.ceph.com
I0726 10:25:12.865907 7628 mount_linux.go:284] Detected umount with safe 'not mounted' behavior
E0726 10:25:12.872477 7628 rbd_util.go:303] modprobe failed (an error (exit status 1) occurred while running modprobe args: [rbd]): "modprobe: ERROR: could not insert 'rbd': Key was rejected by service\n"
F0726 10:25:12.872702 7628 driver.go:150] an error (exit status 1) occurred while running modprobe args: [rbd]
Matching “modprobe: ERROR: could not insert ‘rbd’: Key was rejected by service” in the above was an error on each host’s console: “Loading of unsigned module is rejected”. These hosts all have secure boot enabled, so I figured it had to be something to do with that. So I logged into one of the hosts and ran modprobe rbd as root, but that worked just fine. No key errors, no unsigned module errors. And once I’d run modprobe rbd (and later modprobe nbd) on the host, the csi-rbdplugin container restarted and worked just fine.
So why wouldn’t modprobe work inside the container? /lib/modules from the host is mounted inside the container, the container has the right extra privileges… Clearly I needed to run a shell in the failing container to poke around inside when it was in CrashLoopBackOff state, but I realised I had no idea how to do that. I knew I could kubectl exec -it csi-rbdplugin-22zjr --container csi-rbdplugin -- /bin/bash but of course that only works if the container is actually running. My container wouldn’t even start because of that modprobe error.
Having previously spent a reasonable amount of time with podman, which has podman run, I wondered if there were a kubectl run that would let me start a new container using the upstream cephcsi image, but running a shell, instead of its default command. Happily, there is a kubectl run, so I tried it:
> kubectl run -it cephcsi --image=quay.io/cephcsi/cephcsi:canary --rm=true --command=true -- /bin/bash
If you don't see a command prompt, try pressing enter.
[root@cephcsi /]# modprobe rbd
modprobe: FATAL: Module rbd not found in directory /lib/modules/5.14.21-150400.24.66-default
[root@cephcsi /]# ls /lib/modules/
[root@cephcsi /]#
Ohhh, right, of course, that doesn’t have the host’s /lib/modules mounted. podman run lets me add volume mounts using -v options, so surely kubectl run will let me do that too.
At this point in the story, the notes I wrote last week include an awful lot of swearing.
See, kubectl run doesn’t have a -v option to add mounts, but what it does have is an --overrides option to let you add a chunk of JSON to override the generated pod. So I went back to the relevant YAML and teased out the bits I needed to come up with this monstrosity:
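The exact JSON isn’t reproduced here, but a reconstruction of roughly what such an override needs to contain (the names and structure are a sketch, not the original) swaps the container’s command for a shell and mounts the host’s /lib/modules in:
> kubectl run -it cephcsi-test --image=quay.io/cephcsi/cephcsi:canary --rm=true \
    --command=true --overrides='{
      "apiVersion": "v1",
      "spec": {
        "containers": [{
          "name": "cephcsi-test",
          "image": "quay.io/cephcsi/cephcsi:canary",
          "command": ["/bin/bash"],
          "stdin": true,
          "tty": true,
          "securityContext": {"privileged": true},
          "volumeMounts": [{"name": "lib-modules", "mountPath": "/lib/modules", "readOnly": true}]
        }],
        "volumes": [{"name": "lib-modules", "hostPath": {"path": "/lib/modules"}}]
      }
    }' -- /bin/bash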
But at least I could get a shell and reproduce the problem:
> kubectl run -it cephcsi-test [honking great horrible chunk of JSON]
[root@cephcsi-test /]# ls /lib/modules/
5.14.21-150400.24.66-default
[root@cephcsi-test /]# modprobe rbd
modprobe: ERROR: could not insert 'rbd': Key was rejected by service
A certain amount more screwing around looking at the source for modprobe and bits of the kernel confirmed that the kernel really didn’t think the module was signed for some reason (mod_verify_sig() was returning -ENODATA), but I knew these modules were fine, because I could load them on the host. Eventually I hit on this:
[root@cephcsi-test /]# ls /lib/modules/*/kernel/drivers/block/rbd*
/lib/modules/5.14.21-150400.24.66-default/kernel/drivers/block/rbd.ko.zst
Wait, what’s that .zst extension? It turns out we (SUSE) have been shipping zstd-compressed kernel modules since – as best as I can tell – some time in 2021. modprobe on my SLE Micro 5.3 host of course supports this:
# grep PRETTY /etc/os-release
PRETTY_NAME="SUSE Linux Enterprise Micro for Rancher 5.3"
# modprobe --version
kmod version 29
+ZSTD +XZ +ZLIB +LIBCRYPTO -EXPERIMENTAL
modprobe in the CentOS Stream 8 upstream cephcsi container does not:
Mystery solved, but I have to say the error messages presented were spectacularly misleading. I later tried with secure boot disabled, and got something marginally better – in that case modprobe failed with “modprobe: ERROR: could not insert ‘rbd’: Exec format error”, and dmesg on the host gave me “Invalid ELF header magic: != \x7fELF”. If I’d seen messaging like that in the first place I might have been quicker to twig to the compression thing.
Anyway, the point of this post wasn’t to rant about inscrutable kernel errors, it was to rant about how there’s no way anyone could be reasonably expected to figure out how to do that --overrides thing with the JSON to debug a container stuck in CrashLoopBackOff. Assuming I couldn’t possibly be the first person to need to debug containers in this state, I told my story to some colleagues, a couple of whom said (approximately) “Oh, I edit the pod YAML and change the container’s command to tail -f /dev/null or sleep 1d. Then it starts up just fine and I can kubectl exec into it and mess around”. Those things totally work, and I wish I’d thought to do that myself. The best answer I got though was to use kubectl debug to make a copy of the existing pod but with the command changed. I didn’t even know kubectl debug existed, which I guess is my reward for not reading the entire manual.
So, finally, here’s the right way to do what I was trying to do:
> kubectl debug csi-rbdplugin-22zjr -it \
--copy-to=csi-debug --container=csi-rbdplugin -- /bin/bash
[root@... /]# modprobe rbd
modprobe: ERROR: could not insert 'rbd': Key was rejected by service
(...do whatever other messing around you need to do, then...)
[root@... /]# exit
Session ended, resume using 'kubectl attach csi-debug -c csi-rbdplugin -i -t' command when the pod is running
> kubectl delete pod csi-debug
pod "csi-debug" deleted
In the above kubectl debug invocation, csi-rbdplugin-22zjr is the existing pod that’s stuck in CrashLoopBackOff, csi-debug is the name of the new pod being created, and csi-rbdplugin is the container in that pod that has its command replaced with /bin/bash, so you can mess around inside it.
The July 2023 meeting sparked multiple new topics, including Linux security architecture, the Debian ports for LoongArch and RISC-V, and the hardware design of PinePhone backplates.
On the practical side, Russell Coker demonstrated running different applications in isolated environments with the bubblewrap sandbox, along with other hardening techniques and the way they interact with the host system. Russell also discussed some possible pathways for hardening desktop Linux to reach the security level of modern Android. Yifei Zhan demonstrated sending and receiving messages with the PineDio USB LoRa adapter, showed how to inspect the LoRa signal with an off-the-shelf software-defined radio receiver, and discussed how the driver situation for LoRa on Linux might be improved. Yifei then gave a demonstration of using KVM on the PinePhone Pro to run NetBSD and OpenBSD virtual machines; more details on running VMs on the PinePhone Pro can be found in this blog post from Yifei.
We also had some discussion of the current state of the Mobian and Debian ecosystems, along with how to contribute to different parts of Mobian, with a Mobian developer who joined us.
I’ve had a pretty varied experience with photo management on Linux over the past couple of decades. For a while I used f-spot as it was the new hotness. At some point this became…. slow and crashy enough that it was unusable. Today, it appears that the GitHub project warns that current bugs include “Not starting”.
At some point (and via a method I have long since forgotten), I did manage to finally get my photos over to Shotwell, which was the new hotness at the time. That data migration was so long ago now I actually forget what features I was missing from f-spot that I was grumbling about. I remember the import being annoying though. At some point Shotwell was no longer the new hotness and now there is GNOME Photos. I remember looking at GNOME Photos and seeing no method of importing photos from Shotwell, so I put it aside. Hopefully that situation has improved since.
At some point Shotwell development had become rather stagnant, and I noticed more things breaking than features or performance being added. The good news is that there has since been some more development activity on Shotwell, so hopefully my issues with it end up being resolved.
One recommendation for Linux photo management was digiKam, but it’s one that I never ended up using full time. One of the reasons was that I couldn’t really see any non-manual way to import photos from Shotwell into it.
With tens of thousands of photos (~58k at the time of writing), doing things manually didn’t seem like much fun at all.
As I postponed my decision, I ended up moving my main machine over to a Mac for a variety of random reasons, and one quite motivating thing was the ability to have Photos from my iPhone magically sync over to my photo library without having to plug it into my computer and copy things across.
So…. how to get photos across from Shotwell on Linux to Photos on a Mac/iPhone (and also keep a very keen eye on how to do it the other way around, because, well, vendor lock-in isn’t great).
It would be kind of neat if I could just run Shotwell on the Mac and have some kind of import button, but seeing as there wasn’t already a native Mac port, and that Shotwell is written in Vala rather than something I know has a working toolchain on macOS…. this seemed like more work than I’d really like to take on.
Luckily, I remembered that Shotwell’s database is actually just a SQLite database pointing to all the files on disk. So, if I could work out how to read it accurately, and how to import all the relevant metadata (such as what Albums a photo is in, tags, title, and description) into Apple Photos, I’d be able to make it work.
So… is there any useful documentation as to how the database is structured?
Semi-annoyingly, Shotwell is written in Vala, a rather niche programming language that, while integrating with all the GObject stuff that GNOME uses, is largely unheard of. Luckily, the database code in Shotwell isn’t too hard to read, so it was a useful fallback for when the documentation proved inadequate.
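For a rough idea of what’s involved, poking at the database with the sqlite3 CLI looks something like the following; the database path and the table and column names here are from memory of Shotwell’s schema, so treat them as assumptions to verify against the source:

sqlite3 ~/.local/share/shotwell/data/photo.db \
  "SELECT filename, title, rating FROM PhotoTable LIMIT 5;"
# Tags reference their photos via a serialised id list (again, an assumption):
sqlite3 ~/.local/share/shotwell/data/photo.db \
  "SELECT name, photo_id_list FROM TagTable LIMIT 5;"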
For the Mac side of things, it was a good excuse to start looking at Swift, so knowing I’d also need to read a SQLite database directly (rather than use any higher level abstraction), I armed myself with the following resources:
From here, I could work on getting the first half going, the ability to view my Shotwell database on the Mac (which is what I posted a screenshot of back in Feb 2022).
But also, I had to work out what I was doing on the other end of things, how would I import photos? It turns out there’s an API!
A bit of SwiftUI code:
import SwiftUI
import AppKit
import Photos

struct ContentView: View {
    @State var favorite_checked: Bool = false
    @State var hidden_checked: Bool = false

    var body: some View {
        VStack() {
            Text("Select a photo for import")
            Toggle("Favorite", isOn: $favorite_checked)
            Toggle("Hidden", isOn: $hidden_checked)
            Button("Import Photo") {
                // Ask the user to pick a single file to import
                let panel = NSOpenPanel()
                panel.allowsMultipleSelection = false
                panel.canChooseDirectories = false
                if panel.runModal() == .OK {
                    let photo_url = panel.url!
                    print("selected: " + String(photo_url.absoluteString))
                    // Hand the selected file to the PhotoKit import helper
                    addAsset(url: photo_url, isFavorite: favorite_checked, isHidden: hidden_checked)
                }
            }
            .padding()
        }
    }
}

struct ContentView_Previews: PreviewProvider {
    static var previews: some View {
        ContentView()
    }
}
Combined with a bit of code to do the import (which does look a bunch like the examples in the docs):
import SwiftUI
import Photos
import AppKit

@main
struct SinglePhotoImporterApp: App {
    var body: some Scene {
        WindowGroup {
            ContentView()
        }
    }
}

func addAsset(url: URL, isFavorite: Bool, isHidden: Bool) {
    // Add the asset at the given URL to the photo library.
    PHPhotoLibrary.shared().performChanges({
        let addedImage = PHAssetChangeRequest.creationRequestForAssetFromImage(atFileURL: url)
        addedImage?.isHidden = isHidden
        addedImage?.isFavorite = isFavorite
    }, completionHandler: { success, error in
        if !success {
            print("Error creating the asset: \(String(describing: error))")
        } else {
            print("Imported!")
        }
    })
}
This all meant I could import a single photo. However, there were some limitations.
There’s the PHAssetCollectionChangeRequest to do things to Albums, so it would solve that problem, but I couldn’t for the life of me work out how to add/edit Titles and Descriptions.
It was so close!
So what did I need to do in order to import Titles and Descriptions? It turns out you can do that via AppleScript. Yes, that thing that launched in 1993 and has somehow survived the transition of m68k based Macs to PowerPC based Macs to Intel based Macs to ARM based Macs.
The Photos dictionary for AppleScript
So, just to make it easier to debug what was going on, I started adding code to my ShotwellImporter tool that would generate snippets of AppleScript I could run and check that it was doing the right thing…. but then very quickly ran into a problem…. it appears that the AppleScript language interpreter on modern macOS has limits that you’d be more familiar with in 1993 than 2023, and I very quickly hit limits where the script would just error out before running (I was out of dictionary size allegedly).
But there’s a new option! Everything you can do with AppleScript you can now do with JavaScript – it’s just even less documented than AppleScript is! But it does work! I got to the point where I could generate JavaScript that imported photos, into all the relevant albums, and set title and descriptions.
In my last post, I wrote about how I taught sesdev (originally a tool for deploying Ceph clusters on virtual machines) to deploy k3s, because I wanted a little sandbox in which I could break (er, learn more about) Kubernetes. It’s nice to be able to do a toy deployment locally, on a bunch of VMs, on my own hardware, in my home office, rather than paying to do it on someone else’s computer. Given the k3s thing worked, I figured the next step was to teach sesdev how to deploy Longhorn so I could break (er, learn more about) that too.
Install nfs-client, open-iscsi and e2fsprogs packages on all nodes.
Make an ext4 filesystem on /dev/vdb on all the nodes that have extra disks, then mount that on /var/lib/longhorn.
Use kubectl label node -l 'node-role.kubernetes.io/master!=true' node.longhorn.io/create-default-disk=true to ensure Longhorn does its storage thing only on the nodes that aren’t the k3s master.
Install Longhorn with Helm, because that installs the latest version by default, versus using kubectl, where you always need to explicitly specify the version (roughly as sketched after this list).
Create an ingress so the UI is exposed… from all nodes, via HTTP, with no authentication. Remember: this is a sandbox – please don’t do this sort of thing in production!
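Done by hand, the Helm part of that boils down to something like this (the chart location is from the Longhorn docs; namespace and versions are left at the chart defaults):

helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace
kubectl -n longhorn-system get pods --watch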
So, now I can do this:
> sesdev create k3s --deploy-longhorn
=== Creating deployment "k3s-longhorn" with the following configuration ===
Deployment-wide parameters (applicable to all VMs in deployment):
- deployment ID: k3s-longhorn
- number of VMs: 5
- version: k3s
- OS: tumbleweed
- public network: 10.20.78.0/24
Proceed with deployment (y=yes, n=no, d=show details) ? [y]: y
=== Running shell command ===
vagrant up --no-destroy-on-error --provision
Bringing machine 'master' up with 'libvirt' provider…
Bringing machine 'node1' up with 'libvirt' provider…
Bringing machine 'node2' up with 'libvirt' provider…
Bringing machine 'node3' up with 'libvirt' provider…
Bringing machine 'node4' up with 'libvirt' provider…
[... lots more log noise here - this takes several minutes... ]
=== Deployment Finished ===
You can login into the cluster with:
$ sesdev ssh k3s-longhorn
Longhorn will now be deploying, which may take some time.
After logging into the cluster, try these:
# kubectl get pods -n longhorn-system --watch
# kubectl get pods -n longhorn-system
The Longhorn UI will be accessible via any cluster IP address
(see the kubectl -n longhorn-system get ingress output above).
Note that no authentication is required.
…and, after another minute or two, I can access the Longhorn UI and try creating some volumes. There’s a brief period while the UI pod is still starting where it just says “404 page not found”, and later, after the UI is up, there are still other pods coming online, so on the Volume screen in the Longhorn UI an error appears: “failed to get the parameters: failed to get target node ID: cannot find a node that is ready and has the default engine image longhornio/longhorn-engine:v1.4.1 deployed“. Rest assured this goes away in due course (it’s not impossible I’m suffering here from rural Tasmanian internet lag pulling container images). Anyway, with my five nodes – four of which have an 8GB virtual disk for use by Longhorn – I end up with a bit less than 22GB storage available:
21.5 GiB isn’t much, but remember this is a toy deployment running in VMs on my desktop Linux box
Now for the fun part. Longhorn is a distributed storage solution, so I thought it would be interesting to see how it handled a couple of types of failure. The following tests are somewhat arbitrary (I’m really just kicking the tyres randomly at this stage) but Longhorn did, I think, behave pretty well given what I did to it.
Volumes in Longhorn consist of replicas stored as sparse files on a regular filesystem on each storage node. The Longhorn documentation recommends using a dedicated disk rather than just having /var/lib/longhorn backed by the root filesystem, so that’s what sesdev does: /var/lib/longhorn is an ext4 filesystem mounted on /dev/vdb. Now, what happens to Longhorn if that underlying block device suffers some kind of horrible failure? To test that, I used the Longhorn UI to create a 2GB volume, then attached that to the master node:
The Longhorn UI helpfully tells me the volume replicas are on node3, node4 and node1
Then, I ssh’d to the master node and with my 2GB Longhorn volume attached, made a filesystem on it and created a little file:
> sesdev ssh k3s-longhorn
Have a lot of fun...
master:~ # cat /proc/partitions
major minor #blocks name
253 0 44040192 vda
253 1 2048 vda1
253 2 20480 vda2
253 3 44016623 vda3
8 0 2097152 sda
master:~ # mkfs /dev/sda
mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done
Creating filesystem with 524288 4k blocks and 131072 inodes
Filesystem UUID: 3709b21c-b9a2-41c1-a6dd-e449bdeb275b
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912
Allocating group tables: done
Writing inode tables: done
Writing superblocks and filesystem accounting information: done
master:~ # mount /dev/sda /mnt
master:~ # echo foo > /mnt/foo
master:~ # cat /mnt/foo
foo
Then I went and trashed the block device backing one of the replicas:
> sesdev ssh k3s-longhorn node3
Have a lot of fun...
node3:~ # ls /var/lib/longhorn
engine-binaries longhorn-disk.cfg lost+found replicas unix-domain-socket
node3:~ # dd if=/dev/urandom of=/dev/vdb bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.486205 s, 216 MB/s
node3:~ # ls /var/lib/longhorn
node3:~ # dmesg|tail -n1
[ 6544.197183] EXT4-fs error (device vdb): ext4_map_blocks:607: inode #393220: block 1607168: comm longhorn: lblock 0 mapped to illegal pblock 1607168 (length 1)
At this point, the Longhorn UI still showed the volume as green (healthy, ready, scheduled). Then, back on the master node, I tried creating another file:
master:~ # echo bar > /mnt/bar
master:~ # cat /mnt/bar
bar
That’s fine so far, but suddenly the Longhorn UI noticed that something very bad had happened:
The volume is still usable, but one of the replicas has failed
Ultimately node3 was rebooted and ended up stalled with the console requesting the root password for maintenance:
Failed to mount /var/lib/longhorn – Can’t find ext4 filesystem
Meanwhile, Longhorn went and rebuilt a third replica on node2:
All green again!
…and the volume remained usable the entire time:
master:~ # echo baz > /mnt/baz
master:~ # ls /mnt
bar baz foo lost+found
That’s perfect!
Looking at the Node screen we could see that node3 was still down:
There may be disk size errors with down nodes (4.87 TiB looks a lot like integer overflow to me)
That’s OK, I was able to fix node3. I logged in on the console and ran mkfs.ext4 /dev/vdb then brought the node back up again. The disk remained unschedulable, because Longhorn was still expecting the ‘old’ disk to be there (I assume based on the UUID stored in /var/lib/longhorn/longhorn-disk.cfg) and of course the ‘new’ disk is empty. So I used the Longhorn UI to disable scheduling for that ‘old’ disk, then deleted it. Shortly after, Longhorn recognised the ‘new’ disk mounted at /var/lib/longhorn and everything was back to green across the board.
So Longhorn recovered well from the backing store of one replica going bad. Next I thought I’d try to break it from the other end by running a volume out of space. What follows is possibly not a fair test, because what I did was create a single Longhorn volume larger than the underlying disks, then filled that up. In normal usage, I assume one would ensure there’s plenty of backing storage available to service multiple volumes, that individual volumes wouldn’t generally be expected to get more than a certain percentage full, and that some sort of monitoring and/or alerting would be in place to warn of disk pressure.
With four nodes, each with a single 8GB disk, and Longhorn apparently reserving 2.33GB by default on each disk, that means no Longhorn volume can physically store more than a bit over 5.5GB of data (see the Size column in the previous screenshot). Given that the default setting for Storage Over Provisioning Percentage is 200, we’re actually allowed to allocate up to a bit under 11GB.
So I went and created a 10GB volume, attached that to the master node, created a filesystem on it, and wrote a whole lot of zeros to it:
…there was a lot of unpleasantness on the master node’s console…
So many I/O errors!
…the replicas became unschedulable due to lack of space…
This doesn’t look good
…and finally the volume faulted:
This really doesn’t look good
Now what?
It turns out that Longhorn will actually recover if we’re able to somehow expand the disks that store the replicas. This is probably a good argument for backing Longhorn with an LVM volume on each node in real world deployments, because then you could just add another disk and extend the volume onto it. In my case though, given it’s all VMs and virtual block devices, I can actually just enlarge those devices. For each node then, I:
Shut it down
Ran qemu-img resize /var/lib/libvirt/images/k3s-longhorn_$NODE-vdb.qcow2 +8G
Started it back up again and ran resize2fs /dev/vdb to take advantage of the extra disk space.
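For a real deployment backed by LVM as suggested above, the equivalent would be roughly the following (the volume group, logical volume and device names here are hypothetical):

pvcreate /dev/sdc
vgextend longhorn-vg /dev/sdc
lvextend -l +100%FREE /dev/longhorn-vg/data
resize2fs /dev/longhorn-vg/data    # i.e. the filesystem mounted at /var/lib/longhorn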
After doing that resize dance to node1, Longhorn realised there was enough space there and brought node1’s replica of my 10GB volume back online. It also summarily discarded the other two replicas from the still-full disks on node2 and node3, which didn’t yet have enough free space to be useful:
One usable replica is better than three unusable replicas
As I repeated the virtual disk expansion on the other nodes, Longhorn happily went off and recreated the missing replicas:
Finally I could re-attach the volume to the master node, and have a look to see how many of my zeros were actually written to the volume:
master:~ # cat /proc/partitions
major minor #blocks name
254 0 44040192 vda
254 1 2048 vda1
254 2 20480 vda2
254 3 44016623 vda3
8 0 10485760 sda
master:~ # mount /dev/sda /mnt
master:~ # ls -l /mnt
total 7839764
-rw-r--r-- 1 root root 8027897856 May 3 04:41 big-lot-of-zeros
drwx------ 2 root root 16384 May 3 04:34 lost+found
Recall that dd claimed to have written 9039773696 bytes before it stalled when the volume faulted, so I guess that last gigabyte of zeros is lost in the aether. But, recall also that this isn’t really a fair test – one overprovisioned volume being quickly and deliberately filled to breaking point vs. a production deployment with (presumably) multiple volumes that don’t fill quite so fast, and where one is hopefully paying at least a little bit of attention to disk pressure as time goes by.
It’s worth noting that in a situation where there are multiple Longhorn volumes, assuming one disk or LVM volume per node, the replicas will all share the same underlying disks, and once those disks are full it seems all the Longhorn volumes backed by them will fault. Given multiple Longhorn volumes, one solution – rather than expanding the underlying disks – is simply to delete a volume or two if you can stand to lose the data, or maybe delete some snapshots (I didn’t try the latter yet). Once there’s enough free space, the remaining volumes will come back online. If you’re really worried about this failure mode, you could always just disable overprovisioning in the first place – whether this makes sense or not will really depend on your workloads and their data usage patterns.
All in all, like I said earlier, I think Longhorn behaved pretty well given what I did to it. Some more information in the event log could perhaps be beneficial though. In the UI I can see warnings from longhorn-node-controller e.g. “the disk default-disk-1cdbc4e904539d26(/var/lib/longhorn/) on the node node1 has 3879731200 available, but requires reserved 2505089433, minimal 25% to schedule more replicas” and warnings from longhorn-engine-controller e.g. “Detected replica overprovisioned-r-73d18ad6 (10.42.3.19:10000) in error“, but I couldn’t find anything really obvious like “Dude, your disks are totally full!”
Later, I found more detail in the engine manager logs after generating a support bundle ([…] level=error msg=”I/O error” error=”tcp://10.42.4.34:10000: write /host/var/lib/longhorn/replicas/overprovisioned-c3b9b547/volume-head-003.img: no space left on device”) so the error information is available – maybe it’s just a matter of learning where to look for it.
We – that is to say the storage team at SUSE – have a tool we’ve been using for the past few years to help with development and testing of Ceph on SUSE Linux. It’s called sesdev because it was created largely for SES (SUSE Enterprise Storage) development. It’s essentially a wrapper around vagrant and libvirt that will spin up clusters of VMs running openSUSE or SLES, then deploy Ceph on them. You would never use such clusters in production, but it’s really nice to be able to easily spin up a cluster for testing purposes that behaves something like a real cluster would, then throw it away when you’re done.
I’ve recently been trying to spend more time playing with Kubernetes, which means I wanted to be able to spin up clusters of VMs running openSUSE or SLES, then deploy Kubernetes on them, then throw the clusters away when I was done, or when I broke something horribly and wanted to start over. Yes, I know there’s a bunch of other tools for doing toy Kubernetes deployments (minikube comes to mind), but given I already had sesdev and was pretty familiar with it, I thought it’d be worthwhile seeing if I could teach it to deploy k3s, a particularly lightweight version of Kubernetes. Turns out that wasn’t too difficult, so now I can do this:
> sesdev create k3s
=== Creating deployment "k3s" with the following configuration ===
Deployment-wide parameters (applicable to all VMs in deployment):
- deployment ID: k3s
- number of VMs: 5
- version: k3s
- OS: tumbleweed
- public network: 10.20.190.0/24
Proceed with deployment (y=yes, n=no, d=show details) ? [y]: y
=== Running shell command ===
vagrant up --no-destroy-on-error --provision
Bringing machine 'master' up with 'libvirt' provider...
Bringing machine 'node1' up with 'libvirt' provider...
Bringing machine 'node2' up with 'libvirt' provider...
Bringing machine 'node3' up with 'libvirt' provider...
Bringing machine 'node4' up with 'libvirt' provider...
[... wait a few minutes (there's lots more log information output here in real life) ...]
=== Deployment Finished ===
You can login into the cluster with:
$ sesdev ssh k3s
…and then I can do this:
> sesdev ssh k3s
Last login: Fri Mar 24 11:50:15 CET 2023 from 10.20.190.204 on ssh
Have a lot of fun…
master:~ # kubectl get nodes
NAME     STATUS   ROLES                  AGE     VERSION
master   Ready    control-plane,master   5m16s   v1.25.7+k3s1
node2    Ready    <none>                 2m17s   v1.25.7+k3s1
node1    Ready    <none>                 2m15s   v1.25.7+k3s1
node3    Ready    <none>                 2m16s   v1.25.7+k3s1
node4    Ready    <none>                 2m16s   v1.25.7+k3s1
master:~ # kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system local-path-provisioner-79f67d76f8-rpj4d 1/1 Running 0 5m9s
kube-system metrics-server-5f9f776df5-rsqhb 1/1 Running 0 5m9s
kube-system coredns-597584b69b-xh4p7 1/1 Running 0 5m9s
kube-system helm-install-traefik-crd-zz2ld 0/1 Completed 0 5m10s
kube-system helm-install-traefik-ckdsr 0/1 Completed 1 5m10s
kube-system svclb-traefik-952808e4-5txd7 2/2 Running 0 3m55s
kube-system traefik-66c46d954f-pgnv8 1/1 Running 0 3m55s
kube-system svclb-traefik-952808e4-dkkp6 2/2 Running 0 2m25s
kube-system svclb-traefik-952808e4-7wk6l 2/2 Running 0 2m13s
kube-system svclb-traefik-952808e4-chmbx 2/2 Running 0 2m14s
kube-system svclb-traefik-952808e4-k7hrw 2/2 Running 0 2m14s
…and then I can make a mess with kubectl apply, helm, etc.
One thing that sesdev knows how to do is deploy VMs with extra virtual disks. This functionality is there for Ceph deployments, but there’s no reason we can’t turn it on when deploying k3s:
> sesdev create k3s --num-disks=2
> sesdev ssh k3s
master:~ # for node in \
$(kubectl get nodes -o 'jsonpath={.items[*].metadata.name}') ;
do echo $node ; ssh $node cat /proc/partitions ; done
master
major minor #blocks name
253 0 44040192 vda
253 1 2048 vda1
253 2 20480 vda2
253 3 44016623 vda3
node3
major minor #blocks name
253 0 44040192 vda
253 1 2048 vda1
253 2 20480 vda2
253 3 44016623 vda3
253 16 8388608 vdb
253 32 8388608 vdc
node2
major minor #blocks name
253 0 44040192 vda
253 1 2048 vda1
253 2 20480 vda2
253 3 44016623 vda3
253 16 8388608 vdb
253 32 8388608 vdc
node4
major minor #blocks name
253 0 44040192 vda
253 1 2048 vda1
253 2 20480 vda2
253 3 44016623 vda3
253 16 8388608 vdb
253 32 8388608 vdc
node1
major minor #blocks name
253 0 44040192 vda
253 1 2048 vda1
253 2 20480 vda2
253 3 44016623 vda3
253 16 8388608 vdb
253 32 8388608 vdc
As you can see this gives all the worker nodes an extra two 8GB virtual disks. I suspect this may make sesdev an interesting tool for testing other Kubernetes based storage systems such as Longhorn, but I haven’t tried that yet.
I recently bought an
Energica
Experia - the latest, largest and longest distance of Energica's
electric motorbike models.
The decision to do this rather than build my own was complicated, and I'm
going to mostly skip over the detail of that. At some time I might put it in
another blog post. But for now it's enough to say that I'd accidentally
cooked the motor in my Mark I, the work on the Mark II was going to take ages,
and I was in the relatively fortunate situation of being able to afford the
Experia if I sold my existing Triumph Tiger Sport and the parts for the Mark
II.
For other complicated reasons I was planning to be in Sydney after the weekend
that Bruce at Zen Motorcycles told
me the bike would be arriving. Rather than have it freighted down, and since
I would have room for my riding gear in our car, I decided to pick it up and
ride it back on the Monday. In reconnoitering the route, we discovered that
by pure coincidence Zen Motorcycles is on Euston Road in Alexandria, only
200 metres away from the entrance to WestConnex and the M8. So with one
traffic light I could be out of Sydney.
I will admit to being more than a little excited that morning. Electric
vehicles are still, in 2023, a rare enough commodity that waiting lists can be
months long; I ordered this bike in October 2022 and it arrived in March 2023.
So I'd had plenty of time to build my expectations. And likewise the thought
of riding a brand new bike - literally one of the first of its kind in the
country (it is the thirty-second Experia ever made!) - was a little daunting.
I obtained PDF copies of the manual and familiarised myself with turning the
cruise control on and off, as well as checking and setting the regen braking
levels. Didn't want to stuff anything up on the way home.
There is that weird feeling in those situations of things being both very
ordinary and completely unique. I met Bruce, we chatted, I saw the other
Experia models in the store, met Ed - who had come down to chat with Bruce,
and just happened to be the guy who rode a Harley Davidson Livewire from
Perth to Sydney and then from Sydney to Cape Tribulation and back. He shared
stories from his trip and tips on hypermiling. I signed paperwork, picked up
the keys, put on my gear, prepared myself.
Even now I still get a bit choked up just thinking of that moment. Seeing
that bike there, physically real, in front of me - after those months of
anticipation - made the excitement real as well.
So finally, after making sure I wasn't floating, and making sure I had my
ear plugs in and helmet on the right way round, I got on. Felt the bike's
weight. Turned it on. Prepared myself. Took off. My partner followed
behind, through the lights, onto the M8 toward Canberra. I gave her the
thumbs up.
We planned to stop for lunch at Mittagong, while the NRMA still offers the
free charger at the RSL there. One lady was charging her Nissan Leaf on the
ChaDeMo side; shortly after I plugged in a guy arrived in his Volvo XC40
Recharge. He had the bigger battery and would take longer; I just needed a
ten minute top up to get me to Marulan.
I got to Marulan and plugged in; a guy came thinking he needed to tell the
petrol motorbike not to park in the electric vehicle bay, but then realised
that the plug was going into my bike. Kate headed off, having charged up as
well, and I waited another ten minutes or so to get a bit more charge. Then
I rode back.
I stopped, only once more - at Mac's Reef Road. I turned off and did a U
turn, then waited for the traffic to clear before trying the bike's
acceleration. Believe me when I say this bike will absolutely do a 0-100km/hr
in under four seconds! It is not a light bike, but when you pull on the power
it gets up and goes.
Here is my basic review, given that experience and then having ridden it for
about ten weeks around town.
The absolute best feature of the Energica Experia is that it is perfectly
comfortable riding around town. Ease on the throttle and it gently takes off
at the traffic lights and keeps pace with the traffic. Ease off, and it
gently comes to rest with regenerative braking and a light touch on the rear
brake after stopping to hold it still. If you want to take off faster, wind
the throttle on more. It is not temperamental or twitchy, and you have no
annoying gears and clutch to balance.
In fact, I feel much more confident lane filtering, because before I would
have to have the clutch ready and be prepared to give the Tiger Sport lots of
throttle lest I accidentally stall it in front of an irate line of traffic.
With the Experia, I can simply wait peacefully - using no power - and then
when the light goes green I simply twist on the throttle and I am away ahead
of even the most aggressive car driver.
It is amazingly empowering.
I'm not going to bore you with the stats - you can probably look them up
yourself if you care. The main thing to me is that it has DC fast charging,
and watching 75kW go into a 22.5kWh battery is just a little bit terrifying
as well as incredibly cool. The stated range of 250km on a charge at highway
speeds is absolutely correct, from my experience riding it down from Sydney.
And that plus the fast charging means that I think it is going to be quite
reasonable to tour on this bike, stopping off at fast or even mid-level
chargers - even a boring 22kW charger can fill the battery up in an hour.
The touring group I travel with stops often enough that if those stops can be
top ups, I will not hold anyone up.
Some time in the near future I hope to have a nice fine day where I can take
it out on the Cotter Loop. This is an 80km stretch of road that goes west of
Canberra into the foothills of the Brindabella Ranges, out past the Deep
Space Tracking Station and Tidbinbilla Nature Reserve. It's a great
combination of curving country roads and hilly terrain, and reasonably well
maintained as well. I did that on the Tiger Sport, with a GoPro, before I
sold it - and if I can ever convince PiTiVi to actually compile the video
from it I will put that hour's ride up on a platform somewhere.
I want to do that as much to show off Canberra's scenery as to show off the
bike.
And if the CATL battery capacity improvement comes through to the rest of the
industry, and we get bikes that can do 400km to 500km on a charge, then
electric motorbike touring really will be no different to petrol motorbike
touring. The Experia is definitely at the forefront of that change, but it
is definitely possible on this bike.
Rustup (the community package manager for the Rust language) was starting to really suffer: CI times were up at around one hour.
We’ve made some strides in bringing this down.
Caching factory for test scenarios
The first thing, which achieved about a 30% reduction in test time, was to stop recreating all the test context every time.
Rustup tests the download/installation/upgrade of distributions of Rust. To avoid downloading gigabytes in the test suite, the suite creates mocks of the published Rust artifacts. These mocks are GPG signed and compressed with multiple compression methods, both of which are quite heavyweight operations to perform – and not actually the interesting code under test to execute.
Previously, every test was entirely hermetic, and usually the server state was also unmodified.
There were two cases where the state was modified. One, a small number of tests testing error conditions such as GPG signature failures. And two, quite a number of tests that were testing temporal behaviour: for instance, install nightly at time A, then with a newer server state, perform a rustup update and check a new version is downloaded and installed.
We’re partway through this migration, but compare these two tests:
The former version mutates the date with set_current_dist_date; the new version uses two scenarios, one for the earlier time, and one for the later time. This permits the server state to be constructed only once. On a per-test basis it can move as much as 50% of the time out of the test.
Single binary for the integration test suite
The next major gain was moving from having 14 separate integration test binaries to just one. This reduces the link cost of linking the test binaries, all of which link in the same library. It also permits us to see unused functions in our test support library, which helps with cleaning up cruft rather than having it accumulate.
Hard linking rather than copying ‘rustup-init’
Part of the test suite for each test is setting up an installed rustup environment. Why not start from scratch every time? Well, we obviously have tests that do that, but most tests are focused on steps beyond the new-user case. Setting up an installed rustup environment has a few steps, but particular ones are copying a binary of rustup into the test sandbox, and hard linking it under various names: cargo, rustc, rustup etc.
A debug build of rustup is ~20MB. Running 400 tests means about 8GB of IO; on some platforms most of that IO won’t hit disk, on others it will.
In review now is a PR that changes the initial copy to a hardlink: we hardlink the rustup-init built by cargo into each test, and then hardlink that to the various binaries. That saves 8GB of IO, which isn’t much from some perspectives, but it adds pressure on the page cache, and is wasted work. One wrinkle is a very low max-links limit on NTFS of 1023; to mitigate that we count the links made to rustup-init and generate a new inode for the original to avoid failures happening.
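Conceptually the change looks like this (an illustration of the idea in shell terms, not the actual Rust test-support code; $TEST_SANDBOX stands in for the per-test directory):

# before: every test copies the ~20MB debug binary into its sandbox
cp target/debug/rustup-init "$TEST_SANDBOX/bin/rustup"
# after: hard links all point at the same inode, so no data is copied
ln target/debug/rustup-init "$TEST_SANDBOX/bin/rustup"
ln "$TEST_SANDBOX/bin/rustup" "$TEST_SANDBOX/bin/cargo"
ln "$TEST_SANDBOX/bin/rustup" "$TEST_SANDBOX/bin/rustc"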
Future work
In GitHub actions this lowers our test time to 19m for Linux, 24m for Windows, which is a lot better but not great.
I plan on experimenting with separate actions for building release artifacts and doing CI tests – at the moment we have the same action do both, but they don’t share artifacts in the cache in any meaningful way, so we can probably gain parallelism there, as well as turning off release builds entirely for CI.
We should finish the cached test context work and use it everywhere.
Also, we’re looking at having fewer integration tests and more narrow, close-to-the-code tests.
Back in 2012, I received a box of eight hundred openSUSE 12.1 promo DVDs, which I then set out to distribute to local Linux users’ groups, tech conferences, other SUSE crew in Australia, and so forth. I didn’t manage to shift all 800 DVDs at the time, and I recently rediscovered the remaining three hundred and eighty four while installing some new shelves. As openSUSE 12.1 went end of life in May 2013, it seemed likely the DVDs were now useless, but I couldn’t bring myself to toss them in landfill. Instead, given last week was Hack Week, I decided to use them for an art project. Here’s the end result:
Geeko mosaic made of cut up openSUSE DVDs, on a 900mm x 600mm piece of plywood
Making that mosaic was extremely fiddly. It’s possibly the most annoying Hack Week project I’ve ever done, but I’m very happy with the outcome.
The backing is a piece of 900mm x 600mm x 6mm plywood, primed with some leftover kitchen and bathroom undercoat, then spray painted black. I’d forgotten how bad spray paint smells, but it makes for a nice finish. To get the Geeko shape, I took the official openSUSE logo, then turned it into an outline in Inkscape, saved that as a PNG, opened it in GIMP, and cut it into nine 300mm x 200mm pieces which I then printed on A4 paper, stuck together with tape, and cut out to make a stencil. Of course, the first time I did that, nothing quite lined up, so I had to reprint it but with “Ignore page margins” turned off and “Draw crop marks” turned on, then cut the pages down along the crop marks before sticking them together the second time. Then I placed the stencil on the backing, glued the eye down (that just had to be made from the centre of a DVD!) and started laying out cut up DVD shards.
Geeko mosaic work in progress
I initially tried cutting the DVDs with tin snips, which is easy on the hands, but had a tendency to sometimes warp the DVD pieces and/or cause them to delaminate, so I reverted to a large pair of scissors which was more effort but ultimately less problematic.
After placing the pieces that made up the head, tail, feet and spine, and deciding I was happy with how they looked, I glued each piece down with superglue. Think: carefully pick up DVD shard without moving too many other shards, turn over, dab on a few tiny globs of superglue, lower into place, press for a few seconds, move to next piece. Do not get any superglue on your fingers, or you’ll risk sticking your fingers together and/or make a gluey mess on the shiny visible side of the DVD shards.
It was another three sessions of layout-then-glue-down to fill in the body. I think I stuck my fingers together about six, or eight, or maybe twenty times. Also, despite my best efforts to get superglue absolutely nowhere near the stencil at all, when I removed the stencil, it had stuck to the backing in several places. I managed to scrape/cut that off with a combination of fingernails, tweezers, and the very sharp knife in my SLE 12 commemorative Leatherman tool, then touched up the remaining white bits with a fine point black Sharpie.
SLE 12 commemorative Leatherman tool (it seemed appropriate to use this)
Judging from the leftover DVD centre pieces, this mosaic used about 12 DVDs in all, which isn’t very many considering my initial stash. I had a few other ideas for the remainder, mostly involving hanging them up somehow, which I messed around with earlier on while waiting for the paint to dry on the plywood.
One (failed) idea was to use a cutting wheel on my Dremel tool to slice half way through a few DVDs, then slot them into each other to make a hanging thingy that would spin in the wind. I was unable to make a smooth/straight enough cut for this to work, and superglue doesn’t bridge gaps. You can maybe get an idea of what I was aiming at from this photo:
Four DVDs slotted into each other vertically, kinda, one with nasty superglue smear
My wife had an idea for a better way to do this, which is to take a piece of dowel, cut slots in the sides, and glue DVD halves into the slots using Araldite (that’s an epoxy resin, in case you didn’t grow up with that brand name). I didn’t get around to trying this, but I reckon she’s onto something. Next time I’m at the hardware store, I’ll try to remember to pick up some suitably sized dowel.
I did make one somewhat simpler hanging thingy, which I call “Geeko’s Tail (Uncurled)”. It’s just DVDs superglued together on the flat, hanging from fishing line, but I think it’s kinda cool:
No, it’s not an upside down question mark, it’s “Geeko’s Tail (Uncurled)”
Also, I’ve discovered that Officeworks has an e-waste recycling program, so any DVDs I don’t use in future projects needn’t go to landfill.
I have long said “Long Malaysians, Short Malaysia” in conversation to many. Maybe it took me a while to tweet it, but this was the first example: Dec 29, 2021. I’ve tweeted it a lot more since.
Malaysia has a 10th Prime Minister, but in general, it is a very precarious partnership. Consider it, same shit, different day?
Otherwise, there will be no change.
So change via “purported democracy” is never going to happen with a country like Malaysia, rotten to the core. It is a crazy dream.
I just have to get off the Malaysian news diet. Malaysians elsewhere, are generally very successful. Malaysians suffering by their daily doldrums, well, they just need to wake up, see the light, and succeed.
In the end, as much as people paraphrase, ask not what the country can do for you, legitimately, this is your life, and you should be taking good care of yourself and your loved ones. You succeed, despite of. Politics and the state happens, regardless of.
Me, personally? Ideas are abound for how to get Malaysians who see the light, to succeed elsewhere. And if I read, and get angry at something (tweet rage?), I’m going to pop RM50 into an investment account, which should help me get off this poor habit. I’ll probably also just cut subscriptions to Malaysian news things… Less exposure, is actually better for you. I can’t believe that it has taken me this long to realise this.
I did poorly blogging last year. Oops. I think to myself when I read, This Thing Still On?, I really have to do better in 2023. Maybe the catalyst is the fact that Twitter is becoming a shit show. I doubt people will leave the platform in droves, per se, but I think we are coming back to the need for decentralised blogs again.
I have 477 days to becoming 40. I ditched the Hobonichi Techo sometime in 2022, and just focused on the Field Notes, and this year, I’ve got a Monocle x Leuchtturm1917 + Field Notes combo (though it seems my subscription lapsed Winter 2022, I should really burn down the existing collection, and resubscribe).
2022 was pretty amazing. Lots of work. Lots of fun. 256 days on the road (what a number), 339,551km travelled, 49 cities, 20 countries.
The getting back into doing, and not being afraid of experimenting in public is what 2023 is all about. The Year of The Rabbit is upon us tomorrow, hence why I don’t mind a little later Hello 2023 :)
Get back into the habit of doing. And publishing by learning and doing. No fear. Not that I wasn’t doing, but it’s time to be prolific with what’s been going on.
I like using Catalyst Cloud to host some of my personal sites. In the past I used to use CAcert for my TLS certificates, but more recently I've been using Let's Encrypt for my TLS certificates as they're trusted in all browsers. Currently the LoadBalancer as a Service (LBaaS) in Catalyst Cloud doesn't have built in support for Let's Encrypt. I could use an apache2/nginx proxy and handle the TLS termination there and have that manage the Let's Encrypt lifecycle, but really, I'd rather use LBaaS.
So I thought I'd set about working out how to get Dehydrated (the Let's Encrypt client I've been using) to drive LBaaS (known as Octavia). I figured this would be of interest to other people using Octavia with OpenStack in general, not just Catalyst Cloud.
There's a few things you need to do. These instructions are specific to Debian:
Install and configure Dehydrated to create the certificates for the domain(s) you want.
apt install barbican
Create the LoadBalancer (use the API, ClickOps, whatever), just forward port 80 for now (see sample Apache configs below).
Save the sample hook.sh below to /etc/dehydrated/hook.sh, you'll probably need to customise it, mine is a bit more complicated!
Insert the UUID of your LoadBalancer in hook.sh where LB_LISTENER is set.
Create /etc/dehydrated/catalystcloud/password as described in hook.sh
Save OpenRC file from the Catalyst Cloud dashboard as /etc/dehydrated/catalystcloud/openrc.sh
Install jq, openssl and the openstack tools (see the example apt command after this list).
You should be able to rename the latest certs /var/lib/dehydrated/certs/$DOMAIN and then run dehydrated -c to have it reissue and then deploy a cert.
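For the openstack tools step above, something along these lines should cover it on Debian; the exact package set is my assumption (the hook script below needs the Barbican and Octavia plugins for the openstack client):

apt install jq openssl python3-openstackclient python3-barbicanclient python3-octaviaclient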
As we're using HTTP-01 Challenge Type here, you need to have the LoadBalancer forwarding port 80 to your website to allow for the challenge response. It is good practice to have a redirect to HTTPS, here's an example virtual host for Apache:
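A minimal sketch of such a virtual host follows (ServerName and the rewrite rules are illustrative, and mod_rewrite needs to be enabled); it lets ACME challenges through on plain HTTP and redirects everything else to HTTPS:

<VirtualHost *:80>
    ServerName www.example.com
    # Serve ACME HTTP-01 challenges over plain HTTP (the Alias in
    # letsencrypt.conf below handles them), redirect everything else.
    RewriteEngine On
    RewriteCond %{REQUEST_URI} !^/\.well-known/acme-challenge/
    RewriteRule ^ https://%{SERVER_NAME}%{REQUEST_URI} [R=301,L]
</VirtualHost>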
You also need this in /etc/apache2/conf-enabled/letsencrypt.conf:
Alias /.well-known/acme-challenge /var/lib/dehydrated/acme-challenges
<Directory /var/lib/dehydrated/acme-challenges>
Options None
AllowOverride None
# Apache 2.x
<IfModule !mod_authz_core.c>
Order allow,deny
Allow from all
</IfModule>
# Apache 2.4
<IfModule mod_authz_core.c>
Require all granted
</IfModule>
</Directory>
And that should be all that you need to do. Now, when Dehydrated updates your certificate, it should update your LoadBalancer as well!
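To keep renewals hands-off, you can drive the whole thing from cron; the schedule below and the /usr/bin/dehydrated path (from the Debian package) are just an example:

# /etc/cron.d/dehydrated-renew
30 3 * * * root /usr/bin/dehydrated --cron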
Sample hook.sh:
#!/bin/bash
deploy_cert() {
local DOMAIN="${1}" KEYFILE="${2}" CERTFILE="${3}" FULLCHAINFILE="${4}" \
CHAINFILE="${5}" TIMESTAMP="${6}"
shift 6
# File contents should be:
# export OS_PASSWORD='your password in here'
. /etc/dehydrated/catalystcloud/password
# OpenRC file from the Catalyst Cloud dashboard
. /etc/dehydrated/catalystcloud/openrc.sh --no-token
# UUID of the LoadBalancer to be managed
LB_LISTENER='xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
# Barbican uses P12 files, we need to make one.
P12=$(readlink -f $KEYFILE \
| sed -E 's/privkey-([0-9]+)\.pem/barbican-\1.p12/')
openssl pkcs12 -export -inkey $KEYFILE -in $CERTFILE -certfile \
$FULLCHAINFILE -passout pass: -out $P12
# Keep track of existing certs for this domain (hopefully no more than 100)
EXISTING_URIS=$(openstack secret list --limit 100 \
-c Name -c 'Secret href' -f json \
| jq -r ".[]|select(.Name | startswith(\"$DOMAIN\"))|.\"Secret href\"")
# Upload the new cert
NOW=$(date +"%s")
openstack secret store --name $DOMAIN-$TIMESTAMP-$NOW -e base64 \
-t "application/octet-stream" --payload="$(base64 < $P12)"
NEW_URI=$(openstack secret list --name $DOMAIN-$TIMESTAMP-$NOW \
-c 'Secret href' -f value) \
|| unset NEW_URI
# Change LoadBalancer to use new cert - if the old one was the default,
# change the default. If the old one was in the SNI list, update the
# SNI list.
if [ -n "$EXISTING_URIS" ]; then
DEFAULT_CONTAINER=$(openstack loadbalancer listener show $LB_LISTENER \
-c default_tls_container_ref -f value)
for URI in $EXISTING_URIS; do
if [ "x$URI" = "x$DEFAULT_CONTAINER" ]; then
openstack loadbalancer listener set $LB_LISTENER \
--default-tls-container-ref $NEW_URI
fi
done
SNI_CONTAINERS=$(openstack loadbalancer listener show $LB_LISTENER \
-c sni_container_refs -f value | sed "s/'//g" | sed 's/^\[//' \
| sed 's/\]$//' | sed "s/,//g")
for URI in $EXISTING_URIS; do
if echo $SNI_CONTAINERS | grep -q $URI; then
SNI_CONTAINERS=$(echo $SNI_CONTAINERS | sed "s,$URI,$NEW_URI,")
openstack loadbalancer listener set $LB_LISTENER \
--sni-container-refs $SNI_CONTAINERS
fi
done
# Remove old certs
for URI in $EXISTING_URIS; do
openstack secret delete $URI
done
fi
}
HANDLER="$1"; shift
#if [[ "${HANDLER}" =~ ^(deploy_challenge|clean_challenge|sync_cert|deploy_cert|deploy_ocsp|unchanged_cert|invalid_challenge|request_failure|generate_csr|startup_hook|exit_hook)$ ]]; then
if [[ "${HANDLER}" =~ ^(deploy_cert)$ ]]; then
"$HANDLER" "$@"
fi
It’s been a little over a year since our Redflow ZCell battery and Victron Energy inverter/charger kit were installed on our existing 5.94kW solar array. Now that we’re past the Southern Hemisphere spring equinox it seems like an opportune time to review the numbers and try to see exactly how the system has performed over its first full year. For background information on what all the pieces are and what they do, see my earlier post, Go With The Flow.
As we look at the figures for the year, it’s worth keeping in mind what we’re using the battery for, and how we’re doing it. Naturally we’re using it to store PV generated electricity for later use when the sun’s not shining. We are also charging the battery from the grid at certain times so it can be drawn down if necessary during peak times, for example I set up a small overnight charge to ensure there was power for the weekday morning peak, when the sun isn’t really happening yet, but grid power is more than twice as expensive. More recently in the winter months, I experimented with keeping the battery full with scheduled charges during most non-peak times. This involved quite a bit more grid charging, but got us through a couple of three hour grid outages without a hitch during some severe weather in August.
I spent some time going through data from the VRM portal for the last year, and correlating that with current bills from Aurora energy, and then I tried to compare our last year of usage with a battery, to the previous three years of usage without a battery. For reasons that will become apparent later, this turned out to be a massive pain in the ass, so I’m going to start by looking only at what we can see in the VRM portal for the past year.
The VRM portal has three summary views: System Overview, Consumption and Solar. System Overview tells us overall how much total power was pulled from the grid, how much was exported to the grid, how much was produced locally, and how much was consumed by our loads. The Consumption view (which I wish they’d named “Loads”, because I think that would be clearer) gives us the same consumption figure, but tells us how much of that came from the grid, vs. what came from the battery vs. what came from solar. The Solar view tells us how much PV generation went to the grid, how much went to the battery, and how much was used directly. There is some overlap in the figures from these three views, but there are also some interesting discrepancies, notably: the “From Grid” and “To Grid” figures shown under System Overview are higher than what’s shown in the Consumption and Solar views. But, let’s start by looking at the Consumption and Solar views, because those tell us what the system gives us, and what we’re using. I’ll come back after that to the System Overview, which is where things start to get weird and we discover what the system costs to run.
The VRM portal lets you chose any date range you like to get historical figures and bar charts. It also gives you pie charts of the last 24 hours, 7 days, 30 days and 365 days. To make the figures and bar charts match the pie charts, the year we’re analysing starts at 4pm on September 25, 2021 and ends at 4pm on September 25, 2022, because that’s exactly when I took the following screenshots. This means we get a partial September at each end of the bar chart. I’m sorry about that.
Here’s the Consumption view:
Consumption view from VRM portal, 2021-09-25 16:00 – 2022-09-25 16:00
This shows us that in the last 12 months, our loads consumed 10,849kWh of electricity. Of that, 54% (5,848kWh) came from the grid, 23% (2,506kWh) came direct from solar PV and the final 23% (2,494kWh) came from the battery.
From the rough curve of the bar chart we can see that our consumption is lower in the summer months and higher in the winter months. I can’t say for certain, but I have to assume that’s largely due to heating. The low in February was 638kWh (an average of 22.8kWh/day). The high in July was 1,118kWh (average 36kWh/day).
Now let’s look at the Solar view:
Solar view from VRM portal, 2021-09-25 16:00 – 2022-09-25 16:00
In that same time period we generated 5,640kWh with our solar array, of which 44% (2,506kWh) was used directly by our loads, 43% (2,418kWh) went into the battery and 13% (716kWh) was exported to the grid.
Unsurprisingly our generation is significantly higher in summer than in winter. We got 956kWh (average 30kWh/day) in December but only 161kWh (5.3kWh/day) in June. Peak summer figures like that mean we’ll theoretically be able to do without grid power at all during that period once we get a second ZCell (note that we’re still exporting to the grid in December – that’s because we’ve got more generation capacity than storage). The winter figures clearly indicate that there’s no way we can provide anywhere near all our own power at that time of year with our current generation capacity and loads.
Now look closely at the summer months (December, January and February). There should be a nice curve evident there from December to March, but instead January and February form a weird dip. This is because we were without solar generation for three weeks from January 20 – February 11 due to replacing a faulty MPPT. Based on figures from previous years, I suspect we lost 500-600kWh of potential generation in that period.
Another interesting thing is that if we compare “To Battery” on the Solar view (2,418kWh) with “From Battery” on the Consumption view (2,494kWh), we see that our loads consumed 76kWh more from the battery than we actually put into it with solar generation. This discrepancy is due to the fact that in addition to charging the battery from solar, we’ve also been charging it from the grid at certain times, but the amount of power sent to the battery from the grid isn’t broken out explicitly anywhere in the VRM portal.
Now let’s look at the System Overview:
System Overview view from VRM portal, 2021-09-25 16:00 – 2022-09-25 16:00
Here we see the same figures for “Production” (5,640kWh) and “Consumption” (10,849kWh) as were in the Consumption and Solar views, and the bar chart shows the same consumption and generation curves (ignore the blue overlay and line which indicate battery minimum/maximum and average state of charge – that information is largely meaningless at this scale, given we cycle the battery completely every day).
Now look at “To Grid” and “From Grid”. “To Grid” is 754 kWh, i.e. we somehow sent 38kWh more to the grid than came from solar. “From Grid”, at 8,531kWh, is a whopping 2,683kWh more than the 5,848kWh grid power consumed by our loads (i.e. close to half as much again).
So, what’s going on here?
One factor is that we’re charging the battery from the grid at certain times. Initially that was a few hours overnight and a few hours in the afternoon on weekdays, although the afternoon charge is obviously also provided by the solar if the sun is shining. For all of July, August and most of September though I was using a charge schedule to keep the battery full except for peak times and maintenance cycle nights, which meant quite a bit more grid charging overnight than earlier in the year, as well as grid charging most of the day during days with no or minimal sunshine. Grid power sent to the battery isn’t visible in the “From Grid” figure on the Consumption view – that view shows only our loads, i.e. the equipment the system is powering – but it is part of the “From Grid” figure in the System Overview.
Similarly, some of the power we export to the grid is actually exported from the battery, as opposed to being exported from solar generation. That usually only happens during maintenance cycles when our loads aren’t enough to draw the battery down at the desired discharge rate. But again, same thing, that figure is present here on the system overview page as part of “To Grid”, but of course is not part of the “To Grid” figure on the Solar view.
Another factor is that the system itself needs some amount of power to operate. The Victron kit (the MultiPlus II Inverter/Chargers, the Cerbo GX, the MPPT) use some small amount of power themselves. The ZCell battery also requires power to operate its pumps and fans. When the sun is out this power can of course come from solar. When solar power is not available, power to run the system needs to come from some combination of the remaining charge in the battery, and the grid.
On that note, I did a little experiment to see how much power the system uses just to operate. On July 9 (which happened to be a maintenance cycle day), I disabled all scheduled battery charges, and I shut off the DC isolators for the solar PV, so the battery would remain online (pumps and fans running) but empty for all of July 10. The following day I went and checked the figures on the System Overview, which showed we drew 35kWh, but that our consumption was 33kWh. So, together, the battery doing nothing other than running its pumps and fans, plus the Multis doing nothing other than passing grid power through, used 2kWh of power in 24 hours. Over a year, that’s 730kWh. As mentioned above, ordinarily some of that will be sourced from mains and some from solar, but if we look at the total power that came into the system as a whole (5,640kWh from solar + 8,531kWh from the grid = 14,171kWh), 730kWh is just slightly over 5% of that.
The final factor in play is that a certain amount of power is naturally lost to conversion at various points. The ZCell has a maximum 80% DC-DC stack efficiency, meaning in the absolute best case if you want to get 10kW out of it, you have to put 12.5kW in. In reality you’ll never hit the best case: the lifetime charge and discharge figures the BMS currently shows for our ZCell are 4,423kWh and 3,336kWh respectively, which is a bit over 75%. The Multis have a maximum efficiency of 96% when doing their invert/charge dance, so if we grid charge the battery we lose at least 4% on the way in, and at least 4% on the way out as well, going to and from AC/DC. Again, in reality that loss will be higher than 4% each way, because 96% is the maximum efficiency.
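To put some rough numbers on what that means for grid charging, here’s a little back-of-the-envelope sketch (in Python, because why not) using the 96% inverter maximum and the ~75% observed DC-DC figure from above. These are illustrative assumptions, not measurements from the system:

# Best-case conversion figures quoted above (assumptions for illustration only)
INVERTER_EFF = 0.96   # MultiPlus II maximum AC/DC conversion efficiency
BATTERY_EFF = 0.75    # observed ZCell DC-DC round-trip efficiency

def grid_kwh_needed(kwh_delivered):
    """Grid energy required to later deliver kwh_delivered to AC loads from the battery."""
    dc_from_battery = kwh_delivered / INVERTER_EFF    # DC the battery must supply
    dc_into_battery = dc_from_battery / BATTERY_EFF   # DC that must be charged into it
    return dc_into_battery / INVERTER_EFF             # AC that must be drawn from the grid

print(grid_kwh_needed(1.0))   # ~1.45, i.e. roughly 69% round-trip efficiency at best

In other words, every kWh of load shifted via grid charging costs something like 1.45kWh of grid power, before you even account for the power needed to keep the battery’s pumps and fans running.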
A bunch of the stuff above just doesn’t apply to the previous system with the ABB inverter and no battery. I also don’t have anything like as much detailed data to go on for the old system, which makes comparing performance with the new system fiendishly difficult. The best comparison I’ve been able to come up with so far involves looking at total power input to the system (power from grid plus solar generation), total consumption by loads (i.e. actual locally usable power), and total power exported.
Prior to the Victron gear and Redflow battery installation, I had grid import and export figures from my Aurora Energy bills, and I had total generation figures from the ABB inverter. From this I can synthesise what are hopefully reasonably accurate load consumption figures by adding grid input to total PV generation minus grid export.
I had hoped to do this analysis on a quarterly basis to line up with Aurora bills, because then I would also be able to see how seasonal solar generation and usage went up and down. Unfortunately the billing for 2020 and 2021 was totally screwed up by the COVID-19 pandemic, because there were two quarters during which nobody was coming out to read the electricity meter. The bills for those quarters stated estimated usage (i.e. were wrong, especially given they estimated grid export as zero), with subsequent quarters correcting the figures. I have no way to reliably correlate that mess with my PV generation figures, except on an annual basis. Also, using billing periods from pre-battery years, the closest I can get to the September 25 based 2021-2022 year I’m looking at now is billing periods starting and ending in mid-August. But, that’s close enough. We’ve still got four pretty much back-to-back 12 month periods to look at.
Year         Grid In   Solar In   Total In    Loads   Export
2018-2019      9,031      6,682     15,713   11,827    3,886
2019-2020      9,324      6,468     15,792   12,255    3,537
2020-2021      7,582      6,347     13,929   10,358    3,571
2021-2022      8,531      5,640     14,171   10,849      754
(all figures in kWh)
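For the pre-battery years the Loads column is synthesised exactly as described above (grid in + solar generation − grid export); here’s a quick sketch to sanity check that against the table, noting that the same arithmetic deliberately doesn’t hold for the battery year:

# (grid_in, solar_in, export, loads) in kWh, straight from the table above
years = {
    "2018-2019": (9031, 6682, 3886, 11827),
    "2019-2020": (9324, 6468, 3537, 12255),
    "2020-2021": (7582, 6347, 3571, 10358),
    "2021-2022": (8531, 5640, 754, 10849),
}
for year, (grid_in, solar_in, export, loads) in years.items():
    synthesised = grid_in + solar_in - export
    print(year, synthesised - loads)
# The first three years print 0 (their Loads figures were synthesised this way).
# 2021-2022 prints 2568 - the gap between what went into the system and what the
# loads actually got, which is the overhead and conversion loss discussed below.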
One thing of note here is that in the 2018-2019 and 2019-2020 years, our annual consumption was pretty close to 12MWh, whereas in 2020-2021 and 2021-2022 it was closer to 10.5MWh. If I had to guess, I’d say that ~1.5MWh/year drop is due to a couple of pieces of computer equipment that were previously always on, now mostly running in standby mode except when actually needed. A couple of hundred watts constant draw is a fair whack of power over the course of a year. Another thing to note is the big drop in power exported in 2021-2022, because most of our solar generation is now used locally.
The thing that freaked me out when looking at these figures is that in the battery year, while our loads consumed 491kWh more than in the previous non-battery year, we pulled 949kWh more power in from the grid! This is the opposite of what I had expected to see, especially having previously written:
In the eight months the system has been running we’ve generated 4631kWh of electricity and “only” sent 588kWh to the grid, which means we’ve used 87% of what we generated locally – much better than the pre-battery figure of 45%. I suspect we’ve reduced the amount of power we pull from the grid by about 30% too, but I’ll have to wait until we have a full year’s worth of data to be sure.
When I wrote that, I was looking at August 31, 2021 through April 27, 2022, and comparing that to the August 2020 to May 2021 grid power figures from my old Aurora bills. The mistake I must have made back then was to look at “From Grid” on the Consumption view, rather than “From Grid” on the System Overview. I’ve just done this exercise again, and the total grid draw from our Aurora bills from August 2020 to May 2021 is 4,980kWh. “From Grid” on the Consumption view for August 2021 to May 2022 is 3,575kWh, which is about 30% less, but “From Grid” on the System Overview is 4,754kWh, which is only about 5% less. So our loads pulled about 30% less from the grid than the same time the year before, but our system as a whole didn’t.
Now let’s break our ridiculous September-based year down further into months, to see if we can see more detail. There are some interesting periods in there, which I’ll go through below.
Month            Grid In   Solar In   Total In    Loads   Export
Sep 21 (part)        153        101        254      213        6
Oct 21               636        629      1,265      988       55
Nov 21               430        747      1,177      866       97
Dec 21               232        956      1,188      767      176
Jan 22               652        450      1,102      822       74
Feb 22               470        430        900      638       83
Mar 22               498        568      1,066      813       64
Apr 22               609        377        986      775       27
May 22               910        238      1,148      953        3
Jun 22             1,114        161      1,275    1,073        2
Jul 22             1,163        223      1,386    1,118       11
Aug 22               910        375      1,285      966       64
Sep 22 (part)        754        385      1,139      857       92
Total              8,531      5,640     14,171   10,849      754
(all figures in kWh)
December is great. We generated about 25% more power than our loads used (956/767 = 1.25), and our grid input was only about 30% of what our loads consumed (232/767 = 0.30).
January and February show the effects of missing three weeks of potential generation. I mean, just look at December through February 2021-2022 versus the previous three summers.
PV Generation, December through February, 2018-2022 (kWh)

Month        2018-2019   2019-2020   2020-2021   2021-2022
December           919         882         767         956
January            936         797         818         450
February           699         656         711         430
June and July are terrible. They’re our highest load months, with the lowest solar generation, and we pulled 3-4% more power from the grid than our loads actually consumed. I’m going to attribute the latter largely to grid charging the battery.
If I dig a couple of interesting figures out for June and July, I see that “To Battery” on the Solar view shows 205kWh, and “From Battery” on the Consumption view shows 558kWh. Total consumption in that period was 2,191kWh, and the total “From Grid” reported in the System Overview was 2,277kWh. Let’s mess with that a bit.
Bearing in mind the efficiency numbers mentioned earlier, if 205kWh went to the battery from PV, then no more than 154kWh of what we got out of the battery was from PV generation (remember: real world DC-DC stack efficiency of about 75%). The remaining 404kWh out of the battery is power that went into it from the grid, which means at least 538kWh of grid power went in (404/0.75). Note that the total from grid for these two months was 86kWh more than the 2,191kWh used by our loads. If I hadn’t been keeping the battery topped up from the grid, I’d’ve saved at least 134kWh of grid power – the difference between the 538kWh that went into the battery and the 404kWh we got back out – which would have brought our grid input figure back down below our consumption figure. This number will actually be higher in reality, because I haven’t factored in AC/DC conversion losses from the Multis.
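Here’s that chain of reasoning written out as a little script, using the same figures and the ~75% DC-DC assumption (so everything it prints is a lower bound, since AC/DC conversion losses in the Multis are ignored):

# June + July 2022 figures from the VRM portal, as quoted above (kWh)
pv_to_battery = 205    # "To Battery" on the Solar view
from_battery = 558     # "From Battery" on the Consumption view
loads = 2191           # total consumption
from_grid = 2277       # "From Grid" on the System Overview
dc_dc_eff = 0.75       # assumed real-world ZCell DC-DC stack efficiency

pv_out_of_battery = pv_to_battery * dc_dc_eff              # ~154 kWh at most
grid_out_of_battery = from_battery - pv_out_of_battery     # ~404 kWh
grid_into_battery = grid_out_of_battery / dc_dc_eff        # ~539 kWh
lost_to_grid_charging = grid_into_battery - grid_out_of_battery

print(from_grid - loads)             # 86  - grid draw in excess of what the loads used
print(round(lost_to_grid_charging))  # ~135 - the "at least 134kWh" above, give or take rounding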
Now let’s look at some costs. When I started trying to compare the new system to the previous system, I went in thinking to look at it in terms of total power input to the system, total consumption by loads, and total power exported. There’s one piece missing there, so let’s add another couple of columns to an earlier table:
Year         Grid In   Solar In   Total In    Loads   Export   Total Out    what?
2021-2022      8,531      5,640     14,171   10,849      754      11,603    2,568
(all figures in kWh)
The total usable output of the system was 11,603kWh for 14,171kWh input. The difference between these two figures – 2,568kWh, or about 18% – went somewhere else. Per my earlier experiment, 5% is power that went to actually operate the system components, including the battery. That means about 13% of the power input to the system over the course of the year must have gone to some combination of charge/discharge and AC/DC conversion (in)efficiencies. We can consider this the energy cost of the system. To have the ability to time-shift expensive peak grid electricity, and to run the house without the grid if the sun is out, or from the battery when it has charge, costs us 18% of the total available energy input.
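Spelled out with the figures above (the 730kWh operating overhead is the estimate from the earlier one-day experiment, so treat these percentages as approximate):

total_in = 14171    # grid + solar input for the year (kWh)
total_out = 11603   # loads + export (kWh)
operating = 730     # estimated annual power to run the system itself (kWh)

unaccounted = total_in - total_out
print(round(100 * unaccounted / total_in))                 # ~18% of input not directly usable
print(round(100 * operating / total_in))                   # ~5% just running the system
print(round(100 * (unaccounted - operating) / total_in))   # ~13% charge/discharge and AC/DC losses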
Finally, speaking of expensive grid electricity, let’s look at how much we paid Aurora Energy over the past four years for our power. The bills are broken out into different tariffs, each charged at a different rate per kilowatt hour, plus a daily supply charge, minus credits for power exported. We can simplify all that by taking the total dollar value of the power bills and dividing it by the total power drawn from the grid, to arrive at an effective cost per kilowatt hour for the entire year. Here it is:
Year         From Grid (kWh)   Total Bill   Cost/kWh
2018-2019              9,031    $2,278.33      $0.25
2019-2020              9,324    $2,384.79      $0.26
2020-2021              7,582    $1,921.77      $0.25
2021-2022              8,531    $1,731.40      $0.20
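The Cost/kWh column is nothing fancier than the total bill divided by the grid draw, e.g.:

# (kWh from grid, total bill in dollars) per year, from the table above
bills = {
    "2018-2019": (9031, 2278.33),
    "2019-2020": (9324, 2384.79),
    "2020-2021": (7582, 1921.77),
    "2021-2022": (8531, 1731.40),
}
for year, (kwh, total) in bills.items():
    print(year, round(total / kwh, 2))   # 0.25, 0.26, 0.25, 0.2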
So, the combination of the battery plus the switch from Flat Rate to Peak & Off-Peak billing has reduced the cost of our grid power by about 20%. I call that a win.
Going forwards it will be interesting to see how the next twelve months go, and, in particular, what we can do to reduce our power consumption. A significant portion of our power is used by a bunch of always-on computer equipment. Some of that I need for my work, and some of that provides internet access, file storage and email for us personally. Altogether, according to the UPSes, this kit pulls 200-250 watts continuously, but will pull more than that during the day when it’s being used interactively. If we call it 250W continuous, that’s a minimum of 6kWh/day, which is 2,190kWh/year, or about 20% of the 2021-2022 consumption. Some of that equipment should be replaced with newer, more power efficient kit. Some of it could possibly even be turned off or put into standby mode some of the time.
We still need to get a heat pump to replace the 2400W panel heater in our bedroom. That should save a huge amount of power in winter. We’re also slowly working our way through the house installing excellent double glazed windows from Elite Double Glazing, which will save on power for heating and cooling year round.
And of course, we still need to get that second ZCell.
My team at SUSE is working on a new S3-compatible storage solution for Kubernetes, based on Ceph’s RADOS Gateway (RGW), except without any of the RADOS bits. The idea is that you can deploy our s3gw container on top of Longhorn (which provides the underlying replicated storage), and all this is running in your Kubernetes cluster, along with your applications which thus have convenient access to a local S3-compatible object store.
We’ve done this by adding a new storage backend to RGW. The approach we’ve taken is to use SQLite for metadata, with object data stored as files in a regular filesystem. This works quite neatly in a Kubernetes cluster with Longhorn, because Longhorn can provide a persistent volume (think: an ext4 filesystem), on which s3gw can store its SQLite database and object data files. If you’d like to kick the tyres, check out Giuseppe’s deployment tutorial for the 0.2.0 release, but bear in mind that as I’m writing this we’re all the way up to 0.4.0 so some details may have changed.
While s3gw on Longhorn on Kubernetes remains our primary focus for this project, the fact that this thing only needs a filesystem for backing storage means it can be run on top of just about anything. Given “just about anything” includes an old school two node Pacemaker cluster with DRBD for replicated storage, why not give that a try? I kinda like the idea of a good solid highly available S3-compatible storage solution that you could shove into the bottom of a rack somewhere without too much difficulty.
It’s probably eight years since I last deployed Pacemaker and DRBD, so to refresh my memory I ran with SUSE’s latest Highly Available NFS Storage with DRBD and Pacemaker document, but skipped all the NFS bits. That gives a filesystem mounted on one node, which will fail over to the other node if something breaks. On top of that, we need to run the s3gw container, the s3gw-ui container, an nginx HTTPS reverse proxy to smoosh those two together, and a virtual/floating IP, so the whole lot is accessible to the outside world.
Here’s the interesting parts of my Pacemaker configuration:
# crm configure show
[...]
primitive drbd_s3 ocf:linbit:drbd \
params drbd_resource=s3 drbdconf="/etc/drbd.conf" \
op monitor interval=29s role=Master \
op monitor interval=31s role=Slave
primitive fs_s3 Filesystem \
params device="/dev/drbd0" directory="/data" fstype=ext4 \
meta target-role=Started \
op start timeout=60s interval=0 \
op stop timeout=60s interval=0 \
op monitor interval=20s timeout=40s
primitive https nginx \
op start timeout=40s interval=0 \
op stop timeout=60s interval=0 \
op monitor timeout=30s interval=10s \
op monitor timeout=30s interval=30s \
op monitor timeout=60s interval=20s
primitive s3-ip IPaddr2 \
params ip=192.168.100.50 \
op monitor interval=10 timeout=20
primitive s3gw podman \
params image="ghcr.io/aquarist-labs/s3gw:latest" run_opts="-p 7480:7480 -v/data:/data" \
op start interval=0 timeout=90s \
op stop interval=0 timeout=90s \
op monitor interval=30s timeout=30s
primitive s3gw-ui podman \
params image="ghcr.io/aquarist-labs/s3gw-ui:latest" run_opts="-p 8080:8080 -e RGW_SERVICE_URL=https://s3gw.sleha.test" \
op start interval=0 timeout=90s \
op stop interval=0 timeout=90s \
op monitor interval=30s timeout=30s
group g-s3 fs_s3 s3gw s3gw-ui https s3-ip
ms ms-drbd_s3 drbd_s3 \
meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
colocation col-s3_on_drbd inf: g-s3 ms-drbd_s3:Promoted
order o-drbd_before_fs Mandatory: ms-drbd_s3:promote g-s3:start
[...]
The g-s3 group ensures that the ext4 filesystem (fs_s3), s3gw container (s3gw), s3gw-ui container (s3gw-ui), nginx instance (https) and virtual IP (s3-ip) all run on the same node, and start one after another. The colocation and ordering constraints ensure that g-s3 runs on whichever node is currently the DRBD (ms-drbd_s3) primary.
The important pieces of glue here are:
The fs_s3 resource mounts /dev/drbd0 on /data
The s3gw resource passes -p 7480:7480 -v/data:/data to podman, so the container can write to /data on the host, and the S3 service is accessible via HTTP on port 7480.
The s3gw-ui resource passes -p 8080:8080 -e RGW_SERVICE_URL=https://s3gw.sleha.test to podman, so the UI is accessible via HTTP on port 8080, and it expects the S3 service to be externally available via https://s3gw.sleha.test.
nginx is configured to reverse proxy https://s3gw.sleha.test to http://localhost:7480, and https://s3gw-ui.sleha.test to http://localhost:8080.
I’ve got an entry in /etc/hosts to point s3gw.sleha.test and s3gw-ui.sleha.test at the virtual IP (192.168.100.50).
I’m using self-signed certificates (openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout cert.key -out cert.pem) for s3gw and s3gw-ui, so I had to go visit both https://s3gw.sleha.test and https://s3gw-ui.sleha.test in my browser and accept the SSL certificate before the UI would work.
The DRBD config, nginx config and SSL certificates and keys need to be present on all nodes. I used csync2 for this.
My /etc/nginx/nginx.conf handles the reverse proxying described above. I’m not entirely convinced I’ve got everything 100% right there, but it seems to work (this is, incredibly, my first time doing anything with nginx, and my first time dealing with CORS).
Monitoring of processes inside containers is a bit sketchy. By default, it will run podman exec $CONTAINER /bin/true, but that really only proves that the container is alive. You can override that command with something else, but it’s apparently better to engineer your container to die quickly and well if something goes wrong.
So what was the end result? TL;DR: It pretty much All Just Worked™, which is exactly what you’d hope for when running a new application on a mature HA stack. I can use s3cmd to mess around with the S3 service, and use my web browser to play with the UI. Failover is nice and quick (think: a few seconds) if I kill a node. For the sake of convenience I did this experiment on a couple of VMs using the external/libvirt STONITH plugin, but I don’t expect a real deployment to be hugely different in behaviour. Also, I’d forgotten how good Pacemaker is at highlighting poorly behaved applications – prior to this experiment the s3gw-ui container didn’t stop well, but we weren’t aware of that until I tried a manual failover which took too long and resulted in an unexpected STONITH due to a stop timeout. Moritz has since fixed that.
One thing I tripped over when doing this deployment was the correct values to use for the access_key and secret_key of the default user when talking to the S3 service. These are actually settable for the s3gw container via the RGW_DEFAULT_USER_ACCESS_KEY and RGW_DEFAULT_USER_SECRET_KEY environment variables, but if left unset, they default to “test” and “test” respectively. The interesting bits of my s3cmd.cfg are thus:
access_key = test
secret_key = test
host_base = https://s3gw.sleha.test/
host_bucket = https://s3gw.sleha.test/%(bucket)
In retrospect I probably should have added -e RGW_DEFAULT_USER_ACCESS_KEY=tserong -e RGW_DEFAULT_USER_SECRET_KEY=do_not_tell_anyone_this_is_your_password to the run_opts parameter of the s3gw resource in the Pacemaker config.
What is HCX? VMware HCX is an application mobility platform designed for simplifying application migration, workload rebalancing and business continuity across datacenters and clouds. VMware HCX was formerly known as Hybrid Cloud Extension and NSX Hybrid Connect.
GCVE HCX: GCVE deploys the Enterprise version of HCX, included in the cost of the solution.
HCX Enterprise has the following benefits:
HCX Enterprise includes:
Hybrid Interconnect
WAN Optimisation
Bulk Migration, Live Migration and HCX Replication Assisted vMotion
Cloud to cloud migration
Disaster Protection
KVM & Hyper-V to vSphere migrations
Traffic Engineering
Mobility Groups
Mobility Optimised Networking
Changeover scheduling
We have seen a lot of Google Cloud VMware Engine over the last few months, and for the entire time we have used click-ops to provision new infrastructure, networks and VMs. Now we are going to the next level: we will be using Terraform to manage our infrastructure as code, so that it is version controlled and predictable.
Installing Terraform: The first part of getting this working is installing Terraform on your local machine.
Picking up where we left off last month, let’s dive into disaster recovery and how to use Site Recovery Manager and Google Backup & Protect to DR into and within the cloud with GCVE.
But before we do, a quick advertisement:
If you are in Brisbane, Australia, I suggest coming to the awesome Google Infrastructure Group (GIG) which focuses on GCVE where on 04 July 2022 I will be presenting on Terraform in GCVE.
Let’s pick up where we left off from last month’s article and start setting up some of the features of GCVE, starting with Advanced Autoscaling.
What is Advanced Auto-Scaling? Advanced Autoscaling automatically expands or shrinks a private cloud based on CPU, memory and storage utilisation metrics.
GCVE monitors the cluster based on the metrics defined in the autoscale policy and decides to add or remove nodes automatically. Remember: GCVE is physical Dell PowerEdge servers, not a container or VM running in Docker or on a hypervisor like VMware.
We’ve done this a number of times over the last decade, from OSDC to LCA. The idea is to provide a free psychologist or counsellor at an in-person conference. Attendees can do an anonymous booking by taking a stickynote (with the timeslot) from a signup sheet, and thus get a free appointment.
Many people find it difficult taking the first (very important) step towards getting professional help, and we’ve received good feedback that this approach indeed assists.
So far we’ve always focused on open source conferences. Now we’re moving into information security! First BrisSEC 2022 (Friday 29 April at the Hilton in Brisbane, QLD) and then AusCERT 2022 (10-13 May at the Star Hotel, Gold Coast QLD). The awesome and geek friendly Dr Carla Rogers will be at both events.
How does this get funded? Well, we’ve crowdfunded some and nudged sponsors, but mostly it gets picked up by the conference organisers (i.e. indirectly by the sponsors).
If you’re a conference organiser, or would like a particular upcoming conference to offer this service, do drop us a line and we’re happy to chase it up for you and help the organisers to make it happen. We know how to run that now.
In-person is best, but for virtual conferences, sure, contact us as well.
The hack day didn’t go as well as I hoped, but it didn’t go too badly either. Attendance was smaller than hoped and the discussion was mostly about things other than FLOSS, but everyone who attended had fun and learned interesting things, so generally I think it counts as a success. There was discussion on topics including military hardware, viruses (particularly Covid), rocketry, and literature. During the discussion one error in a Wikipedia page was identified, and hopefully we can get that fixed.
I think that everyone who attended will be interested in more such meetings. Overall this is a reasonable start to the Hack Day meetings; when I previously ran such meetings they often ended up being more social events than serious hacking sessions, and that’s OK too.
One conclusion that we came to regarding meetings is that they should always be well announced in email and that the iCal file isn’t useful for everyone. Discussion continues on the best methods of announcing meetings but I anticipate that better email will get more attendance.
What is GCVE? Google Cloud VMware Engine, or GCVE, is a fully managed VMware hypervisor and associated management and networking components (vSphere, NSX-T, vSAN and HCX), built on top of Google’s highly performant and scalable infrastructure, with fully redundant and dedicated 100Gbps networking that provides 99.99% availability.
The solution is integrated into Google Cloud Platform, so businesses benefit from having full access to GCP services, native VPC networking, Cloud VPN or Interconnect as well as all the normal security features you expect from GCP.
The March 2022 meeting went reasonably well. Everyone seemed to have fun and learn useful things about computers. After 2 hours my Internet connection dropped out, which stopped the people who were using VMs from doing the tutorial. Fortunately most people seemed ready for a break, so we ended the meeting. The early and abrupt ending was a disappointment, but it wasn’t too bad; the meeting would probably only have gone for another half hour otherwise.
The BigBlueButton system was shown to be effective for training when one person got confused with the Debian package configuration options for Postfix and they were able to share the window with everyone else to get advice. I was also confused by that stage.
Future Meetings
The main feature of the meeting was training in setting up a mail server with Postfix; here are the lecture notes for it [1]. The consensus at the end of the meeting was that people wanted more of that for the April meeting, so for April I will extend the Postfix training to include SpamAssassin, SPF, DKIM, and DMARC. For the start of the next meeting, instead of providing bare Debian installations for the VMs, I’ll provide a basic Postfix/Dovecot setup so people can get straight into SpamAssassin etc.
For the May meeting training on SE Linux was requested.
Social Media
Towards the end of the meeting we discussed Matrix and federated social media. LUV has a Matrix server and I can give accounts to anyone who’s involved in FOSS in the Australia and New Zealand area. For Mastodon, the NZOSS Mastodon server [2] seems like a good option. I have an account there to try Mastodon; my Mastodon address is @etbe@mastodon.nzoss.nz.
We are going to make Matrix a primary communication method for the Flounder group; the room is #flounder:luv.asn.au. My Matrix address is @etbe:luv.asn.au.
We also have a new URL for the blog and events. See the right sidebar for the link to the iCal file which can be connected to Google Calendar and most online calendaring systems.
We just had the first Flounder meeting, which went well. We had some interesting discussion of storage technology and I learnt a few new things. Some people did the ZFS and BTRFS training, and there was lots more interesting discussion.
Andrew Pam gave a summary of new things in Linux and talked about the sites lwn.net, gamingonlinux.com, and cnx-software.com that he uses to find Linux news. One thing he talked about is the latest developments with the Steam Deck, which is driving Linux support in Steam games. The site protondb.com tracks Linux support in Steam games.
We had some discussion of BPF, for an introduction to that technology see the BPF lecture from LCA 2022.
Next Meeting
The next meeting (Saturday 5th of March, 1PM Melbourne time) will focus on running your own mail server, which is always of interest to people who like system administration, and is probably of more interest than usual because Google is forcing companies with “a legacy G Suite subscription” to transition to a more expensive “Business family” offering.
I “recently” wrote about obtaining a new (to me, actually quite old) computer over in The Apple Power Macintosh 7200/120 PC Compatible (Part 1). This post is a bit of a detour, but may help others understand why some images they download from the internet don’t work.
Disk partitioning is (of course) a way to divide up a single disk into multiple volumes (partitions) for different uses. While the idea is similar, computer platforms over the ages have done this in a variety of different ways, with varying formats on disk, and varying limitations. The ones that you’re most likely to be familiar with are the MBR partitioning scheme (from the IBM PC), and the GPT partitioning scheme (common for UEFI systems such as the modern PC and Mac). One you’re less likely to be familiar with is the Apple Partition Map scheme.
The way all IBM PCs and compatibles worked, from the introduction of MS-DOS 2.0 in 1983 until some time after 2005, was the Master Boot Record partitioning scheme. It was outrageously simple: of the first 512 byte sector of a disk, the first 446 bytes were for the bootstrapping code (the “boot sector”), the last 2 bytes were the magic signature telling the BIOS this disk was bootable, and the other 64 bytes were four entries of 16 bytes, each describing a disk partition. The Wikipedia page is a good overview of what it all looks like. Since “four partitions should be enough for anybody” wasn’t going to last, DOS 3.2 introduced “extended partitions”, which just used one of those four partitions to hold another similar data structure that could point to more partitions.
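To make that layout concrete, here’s a little sketch that digs the four partition entries out of the first sector of a disk image (the filename is hypothetical; offsets are per the MBR layout described above):

import struct

with open("disk.img", "rb") as f:   # hypothetical disk image
    sector0 = f.read(512)

# The 2-byte signature lives at offset 510
assert sector0[510:512] == b"\x55\xaa", "no MBR signature"

# Four 16-byte partition entries start at offset 446
for i in range(4):
    entry = sector0[446 + i * 16 : 446 + (i + 1) * 16]
    # status (1), CHS start (3), type (1), CHS end (3), start LBA (4), sector count (4)
    status, ptype, lba_start, num_sectors = struct.unpack("<B3xB3xII", entry)
    if ptype != 0:
        print(f"partition {i}: type {ptype:#04x}, start LBA {lba_start}, {num_sectors} sectors")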
In the 1980s (similar to today), the Macintosh was, of course, different. The Apple Partition Map is significantly more flexible than the MBR on PCs. For a start, you could have more than four partitions! You could actually have a lot more than four partitions, as the Apple Partition Map uses a single 512-byte sector for each partition, and the partition map is itself a partition. Instead of being at block 0 (like the MBR is), it starts at block 1 and is contiguous (the Driver Descriptor Record is what’s at block 0), so once created it’s hard to extend. Typically it’d be created as 64×512-byte entries, for 32KB… which, it turns out, actually is about enough for anyone.
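And for comparison, a sketch of walking an Apple Partition Map. The field offsets here are from my memory of the Inside Macintosh SCSI Manager chapter (signature, map entry count, start block, block count, name, type), so treat the exact layout as an assumption; everything is big-endian, naturally:

import struct

BLOCK = 512

with open("mac-disk.img", "rb") as f:   # hypothetical disk image
    f.seek(BLOCK)                       # block 0 is the Driver Descriptor Record
    first = f.read(BLOCK)
    sig, _, map_blocks = struct.unpack(">2sHI", first[:8])
    assert sig == b"PM", "no Apple Partition Map found"

    f.seek(BLOCK)                       # the map starts at block 1 and is contiguous
    for _ in range(map_blocks):
        entry = f.read(BLOCK)
        start, count = struct.unpack(">II", entry[8:16])
        name = entry[16:48].split(b"\0")[0].decode("mac_roman", "replace")
        ptype = entry[48:80].split(b"\0")[0].decode("ascii", "replace")
        print(f"{name or '(unnamed)'}: {ptype}, start block {start}, {count} blocks")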
The Inside Macintosh reference on the SCSI Manager goes through more detail as to these structures. If you’re wondering what language all the coding examples are in, it’s Pascal – which was fairly popular for writing Macintosh applications in back in the day.
But the actual partition map isn’t the “interesting” part of all this (and yes, the quotation marks are significant here), because Macs are pretty darn finicky about what disks they’ll boot off, which gets to be interesting if you’re trying to find a CD-ROM image on the internet from which to boot and then install an operating system.
I never programmed a 1980s Macintosh actually in the 1980s. It was sometime in the early 1990s that I first experienced Microsoft Basic for the Macintosh. I’d previously (unknowingly at the time as it was branded Commodore) experienced Microsoft BASIC on the Commodore 16, Commodore 64, and even the Apple ][, but the Macintosh version was something else. It let you do some pretty neat things such as construct a GUI with largely the same amount of effort as it took to construct a Text based UI on the micros I was familiar with.
Okay, to be fair, I’d also dabbled in Microsoft QBasic, which came bundled with MS-DOS of the era and let you do a whole bunch of graphics – so you could theoretically construct a GUI with it. Something I did attempt to do. Constructing a GUI was just so much easier on the Mac.
Of course, Microsoft Basic wasn’t the preferred way to program on the Macintosh. At that time it was largely Pascal, with C being something that also existed – but you were going to see Pascal in Inside Macintosh. It was probably somewhat fortuitous that I’d poked at Pascal a bit as something alternate to look at in the high school computing classes. I can only remember using TurboPascal on DOS systems and never actually writing Pascal on the Macintosh.
By the middle part of the 1990s though, I was firmly incompetently writing C on the Mac. No doubt the quality of my code increased after I’d done some university courses actually covering the language rather than the only practical way I had to attempt to write anything useful being looking at Inside Macintosh examples in Pascal and “C for Dummies” which was very not-Macintosh. Writing C on UNIX/Linux was a lot easier – everything was made for it, including Actual Documentation!
Anyway, in the early 2000s I ran MacOS X for a bit on my white iBook G3, and did a (very) small amount of any GUI / Project Builder (the precursor to Xcode) related development – instead largely focusing on command line / X11 things. The latest coolness being to use Objective-C to program applications (unless you were bringing over your Classic MacOS Carbon based application, then you could still write C). Enter some (incompetent) Objective-C coding!
Then Apple went to x86, so the hardware ceased being interesting, and I had no reason to poke at it even as a side effect of having hardware that could run the software stack. Enter a long-ass time of Debian, Ubuntu, and Fedora on laptops.
Come 2022 though, and (for reasons I should really write up) I’m poking at a Mac again, and it’s now Swift as the preferred way to write apps. So, I’m (incompetently) hacking away at Swift code. I have to admit, it’s pretty nice. I’ve managed to be somewhat productive in a relatively short amount of time, and all the affordances in the language are geared towards the kind of safety that is a PITA when coding in C.
So this is my WIP utility to be able to import photos from a Shotwell database into the macOS Photos app:
There’s a lot of rough edges and unknowns left, including how to actually do the import (it looks like there’s going to be Swift code doing AppleScript things as the PhotoKit API is inadequate). But hey, some incompetent hacking in not too much time has a kind-of photo browser thing going on that feels pretty snappy.
Recently I read Michael Snoyman’s post on combining Axum, Hyper, Tonic and Tower. While his solution worked, it irked me – it seemed like there should be a much tighter solution possible.
I can deep dive into the code in a later post perhaps, but I think there are four points of difference. One, since the post was written, Axum has started boxing its routes, so the enum dispatch approach taken there, which delivered low overheads, actually has no benefit today.
Two, while writing out the entire type by hand has some benefits, async code is much more pithy.
Thirdly, the code in the post is entirely generic, except the routing function itself.
And fourth, the outer Service<AddrStream> is an unnecessary layer to abstract over: given the similar constraints – the inner Service must take Request<..> – it is possible to just not use a couple of helpers and instead work directly with Service<Request...>.
So, onto a pithier version.
First, the app server code itself.
use std::{convert::Infallible, net::SocketAddr};
use axum::routing::get;
use hyper::{server::conn::AddrStream, service::make_service_fn};
use hyper::{Body, Request};
use tonic::async_trait;
use demo::echo_server::{Echo, EchoServer};
use demo::{EchoReply, EchoRequest};
struct MyEcho;
#[async_trait]
impl Echo for MyEcho {
async fn echo(
&self,
request: tonic::Request<EchoRequest>,
) -> Result<tonic::Response<EchoReply>, tonic::Status> {
Ok(tonic::Response::new(EchoReply {
message: format!("Echoing back: {}", request.get_ref().message),
}))
}
}
#[tokio::main]
async fn main() {
let addr = SocketAddr::from(([0, 0, 0, 0], 3000));
let axum_service = axum::Router::new().route("/", get(|| async { "Hello world!" }));
let grpc_service = tonic::transport::Server::builder()
.add_service(EchoServer::new(MyEcho))
.into_service();
let both_service =
demo_router::Router::new(axum_service, grpc_service, |req: &Request<Body>| {
Ok::<bool, Infallible>(
req.headers().get("content-type").map(|x| x.as_bytes())
== Some(b"application/grpc"),
)
});
let make_service = make_service_fn(move |_conn: &AddrStream| {
let both_service = both_service.clone();
async { Ok::<_, Infallible>(both_service) }
});
let server = hyper::Server::bind(&addr).serve(make_service);
if let Err(e) = server.await {
eprintln!("server error: {}", e);
}
}
Note the Router: it takes the two services and Fn to determine which to use on any given request. Then we just drop that composed service into make_service_fn and we’re done.
Next up we have the Router implementation. This is generic across any two Service<Request<...>> types as long as they are both Into<Bytes> for their Data, and Into<Box<dyn Error>> for errors.
Interesting things here – I use boxed_unsync to abstract over the body concrete type, and I implement the future using async code rather than as a separate struct. It becomes much smaller even after a few bits of extra type constraining.
One thing that flummoxed me for a little was the need to capture the future for the underlying response outside of the async block. Failing to do so provokes a 'static requirement which was tricky to debug. Fortunately there is a bug on making this easier to diagnose in rustc already. The underlying problem is that if you create the async block, and then dereference self, the type for impl of .first has to live an arbitrary time. Whereas by capturing the future immediately, only the impl of the future has to live an arbitrary time, and that doesn’t then require changing the signature of the function.
This is almost worth turning into a crate – I couldn’t see an existing one when I looked, though it does end up rather small – < 100 lines. What do you all think?
The first meeting will start at 1PM Australian Eastern time (Melbourne/Sydney) which is +1100 on Saturday the 5th of February.
I will start the video chat an hour early in case someone makes a timezone mistake and gets there an hour before it starts. If anyone else joins early we will have random chat until the start time (deliberately avoiding topics worthy of the main meeting). The link http://b.coker.com.au will redirect to the meeting URL on the day.
The first scheduled talk is a summary and discussion of free software related news. Anyone who knows of something new that excites them is welcome to speak about it.
The main event is discussion of storage technology and hands-on training on BTRFS and ZFS for those who are interested. Here are the ZFS training notes and here are the BTRFS training notes. Feel free to do the training exercises on your own VM before the meeting if you wish.
Then discussion of the future of the group and the use of FOSS social media. While social media is never going to be compulsory, some people will want to use it to communicate, and we could run some servers for software that is considered good (lots of server capacity is available).
Finally we have to plan future meetings and decide on which communication methods are desired.
The BBB instance to be used for the video conference is sponsored by NZOSS and Catalyst Cloud.
Since PM Scott Morrison did not announce the federal election date last week, it will now be held somewhere between March and May (see the post from ABC’s Antony Green for details). Various aspects of elections are covered in the Civics & Citizenship Australian Curriculum in Years 4, 5 and 6. Students are interested in […]
The main aim is to provide educational benefits to free software users via an online meeting that can’t be obtained by watching YouTube videos etc in a scope that is larger than one country. When the pandemic ends we will keep running this as there are benefits to be obtained from a meeting of a wide geographic scope that can’t be obtained by meetings in a single city. People from other countries are welcome to attend but they aren’t the focus of the meeting.
Until we get a better DNS name, the address http://b.coker.com.au will redirect to the BBB instance used for online meetings (the meeting address isn’t set up yet, so it currently redirects to the blog). The aim is that there will always be a short URL for the meeting, so anyone whose device loses contact can quickly type the URL into a backup device.
The first meeting will be on the 5th of Feb 2022 at 1PM Melbourne time +1100. When we get a proper domain I’ll publish a URL for an iCal file with entries for all meetings. I will also find some suitable way for meeting times to be localised (I’m sure there’s a WordPress plugin for that).
For the hands-on part of the meetings there will be virtual machine images you can download to run on your own system (tested with KVM, should work with other VM systems) and the possibility of logging in to a running VM. The demonstration VMs will have public IPv6 addresses and will also be available through different ports on a single IPv4 address, having IPv6 on your workstation will be convenient for you but you can survive without it.
Linux Australia has a list of LUGs in Australia; is there a similar list for NZ? One thing I’d like to see is a list of links to iCal files for all the meetings, and also an iCal aggregator for all the iCal feeds of online meetings. I’ll host it myself if necessary, but it’s probably best to do it via Linux Australia (Linux Australasia?) if possible.
I’m attending the https://linux.conf.au/ conference online this weekend, which is always a good opportunity for some sideline hacking.
I found something boneheaded doing that today.
There have been a few times while inventing the OpenHMD Rift driver where I’ve noticed something strange and followed the thread until it made sense. Sometimes that leads to improvements in the driver, sometimes not.
In this case, I wanted to generate a graph of how long the computer vision processing takes – from the moment each camera frame is captured until poses are generated for each device.
To do that, I have some logging branches that output JSON events to log files, and I write scripts to process those. I used that data and produced:
Two things caught my eye in this graph. The first is the way the baseline latency (pink lines) increases from ~20ms to ~58ms. The 2nd is the quantisation effect, where pose latencies are clearly moving in discrete steps.
Neither of those should be happening.
Camera frames are being captured from the CV1 sensors every 19.2ms, and it takes 17-18ms for them to be delivered across the USB. Depending on how many IR sources the cameras can see, figuring out the device poses can take a different amount of time, but the baseline should always hover around 17-18ms because the fast “device tracking locked” case takes as little as 1ms.
Did you see me mention 19.2ms as the interframe period? Guess what the spacing on those quantisation levels is in the graph? I recognised it as implying that something in the processing is tied to frame timing when it should not be.
OpenHMD Rift CV1 tracking timing
This 2nd graph helped me pinpoint what exactly was going on. This graph is cut from the part of the session where the latency has jumped up. What it shows is a ~1 frame delay between when the frame is received (frame-arrival-finish-local-ts) and when the initial analysis even starts!
That could imply that the analysis thread is just busy processing the previous frame and doesn’t get to start working on the new one yet – but the graph says that fast analysis is typically done in 1-10ms at most. It should rarely be busy when the next frame arrives.
This is where I found the bone headed code – a rookie mistake I wrote when putting in place the image analysis threads early on in the driver development and never noticed.
There are 3 threads involved:
USB service thread, reading video frame packets and assembling pixels in framebuffers
Fast analysis thread, that checks tracking lock is still acquired
Long analysis thread, which does brute-force pose searching to reacquire / match unknown IR sources to device LEDs
These 3 threads communicate using frame worker queues passing frames between each other. Each analysis thread does this pseudocode:
while driver_running:
Pop a frame from the queue
Process the frame
Sleep for new frame notification
The problem is in the 3rd line. If the driver is ever still processing the frame in line 2 when a new frame arrives – say because the computer got really busy – the thread sleeps anyway and won’t wake up until the next frame arrives. At that point, there’ll be 2 frames in the queue, but it still only processes one – so the analysis gains a 1 frame latency from that point on. If it happens a second time, it gets later by another frame! Any further and it starts reclaiming frames from the queues to keep the video capture thread fed – but it only reclaims one frame at a time, so the latency remains!
The fix is simple:
while driver_running:
Pop a frame
Process the frame
if queue_is_empty():
sleep for new frame notification
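As a runnable illustration of the difference (not the actual OpenHMD code, which is C – just the shape of the pattern in Python, with a deque standing in for the frame worker queue):

import collections
import threading

frames = collections.deque()
new_frame = threading.Condition()

def deliver_frame(frame):
    """Called by the USB service thread for every assembled frame."""
    with new_frame:
        frames.append(frame)
        new_frame.notify()

def analysis_loop(process, driver_running):
    # driver_running is a threading.Event; clear() it to ask the loop to stop
    while driver_running.is_set():
        with new_frame:
            if not frames:
                # The fix: only sleep when there's genuinely nothing queued.
                # Sleeping unconditionally after every frame is what caused the
                # one-frame-at-a-time latency creep described above.
                new_frame.wait(timeout=0.1)
            if not frames:
                continue            # woke with nothing to do (timeout or shutdown)
            frame = frames.popleft()
        process(frame)              # the potentially slow work happens outside the lock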
Doing that for both the fast and long analysis threads changed the profile of the pose latency graph completely.
Pose latency and inter-pose spacing after fix
This is a massive win! To be clear, this has been causing problems in the driver for at least 18 months but was never obvious from the logs alone. A single good graph is worth a thousand logs.
What does this mean in practice?
The way the fusion filter I’ve built works, in between pose updates from the cameras, the position and orientation of each device are predicted / updated using the accelerometer and gyro readings. Particularly for position, using the IMU for prediction drifts fairly quickly. The longer the driver spends ‘coasting’ on the IMU, the less accurate the position tracking is. So, the sooner the driver can get a correction from the camera to the fusion filter the less drift we’ll get – especially under fast motion. Particularly for the hand controllers that get waved around.
Before: Left Controller pose delays by sensorAfter: Left Controller pose delays by sensor
Poses are now being updated up to 40ms earlier and the baseline is consistent with the USB transfer delay.
You can also visibly see the effect of the JPEG decoding support I added over Christmas. The ‘red’ camera is directly connected to USB3, while the ‘khaki’ camera is feeding JPEG frames over USB2 that then need to be decoded, adding a few ms delay.
The latency reduction is nicely visible in the pose graphs, where the ‘drop shadow’ effect of pose updates tailing fusion predictions largely disappears and there are fewer large gaps in the pose observations when long analysis happens (visible as straight lines jumping from point to point in the trace):
Before: Left Controller posesAfter: Left Controller poses
Yes, the blog is still on. January 2004 I moved to WordPress, and it is still here January 2022. I didn’t write much last year (neither here, nor experimenting with the Hey blog). I didn’t post anything to Instagram last year either from what I can tell, just a lot of stories.
August 16 2021, I realised I was 1,000 days from May 12 2024, which is when I become 40. As of today, that leaves 850 days. Did I squander the last 150 days? I’m back to writing almost daily in the Hobonichi Techo (I think last year and the year before were mostly washouts; I barely scribbled anything offline).
I got a new Apple Watch Series 7 yesterday. I can say I used the Series 4 well (79% battery life), purchased in the UK when I broke my Series 0 in Edinburgh airport.
TripIt stats for last year claimed 95 days on the road. This is of course, a massive joke, but I’m glad I did get to visit London, Lisbon, New York, San Francisco, Los Angeles without issue. I spent a lot of time in Kuantan, a bunch of Langkawi trips, and also, I stayed for many months at the Grand Hyatt Kuala Lumpur during the May lockdowns (I practically stayed there all lockdown).
With 850 days to go till I’m 40, I have plenty I would like to achieve. I think I’ll write a lot more here. And elsewhere. Get back into the habit of doing, and publishing by learning and doing. No fear. Not that I wasn’t doing, but it’s time to be prolific with what’s been going on.
Once again time has passed, and another update on Oculus Rift support feels due! As always, it feels like I’ve been busy with work and not found enough time for Rift CV1 hacking. Nevertheless, looking back over the history since I last wrote, there’s quite a lot to tell!
In general, the controller tracking is now really good most of the time. Like, wildly-swing-your-arms-and-not-lose-track levels (most of the time). The problems I’m hunting now are intermittent and hard to identify in the moment while using the headset – hence my enthusiasm over the last updates for implementing stream recording and a simulation setup. I’ll get back to that.
Outlier Detection
Since I last wrote, the tracking improvements have mostly come from identifying and rejecting incorrect measurements. That is, if I have 2 sensors active and 1 sensor says the left controller is in one place, but the 2nd sensor says it’s somewhere else, we’ll reject one of those – choosing the pose that best matches what we already know about the controller. The last known position, the gravity direction the IMU is detecting, and the last known orientation. The tracker will now also reject observations for a time if (for example) the reported orientation is outside the range we expect. The IMU gyroscope can track the orientation of a device for quite a while, so can be relied on to identify strong pose priors once we’ve integrated a few camera observations to get the yaw correct.
It works really well, but I think improving this area is still where most future refinements will come. That and avoiding incorrect pose extractions in the first place.
Plot of headset tracking – orientation and position
The above plot is a sample of headset tracking, showing the extracted poses from the computer vision vs the pose priors / tracking from the Kalman filter. As you can see, there are excursions in both position and orientation detected from the video, but these are largely ignored by the filter, producing a steadier result.
Left Touch controller tracking – orientation and position
This plot shows the left controller being tracked during a Beat Saber session. The controller tracking plot is quite different, because controllers move a lot more than the headset, and have fewer LEDs to track against. There are larger gaps here in the timeline while the vision re-acquires the device – and in those gaps you can see the Kalman filter interpolating using IMU input only (sometimes well, sometimes less so).
Improved Pose Priors
Another nice thing I did is changes in the way the search for a tracked device is made in a video frame. Before starting looking for a particular device it always now gets the latest estimate of the previous device position from the fusion filter. Previously, it would use the estimate of the device pose as it was when the camera exposure happened – but between then and the moment we start analysis more IMU observations and other camera observations might arrive and be integrated into the filter, which will have updated the estimate of where the device was in the frame.
This is the bit where I think the Kalman filter is particularly clever: Estimates of the device position at an earlier or later exposure can improve and refine the filter’s estimate of where the device was when the camera captured the frame we’re currently analysing! So clever. That mechanism (lagged state tracking) is what allows the filter to integrate past tracking observations once the analysis is done – so even if the video frame search take 150ms (for example), it will correct the filter’s estimate of where the device was 150ms in the past, which ripples through and corrects the estimate of where the device is now.
LED visibility model
To improve the identification of devices, I measured the actual angle from which the LEDs are visible (about 75 degrees off axis) and measured their size. The pose matching now has a better idea of which LEDs should be visible for a proposed orientation, and what pixel size we expect them to have at a particular distance.
Better Smoothing
I fixed a bug in the output pose smoothing filter where it would glitch as you turned completely around and crossed the point where the angle jumps from +pi to -pi or vice versa.
Improved Display Distortion Correction
I got a wide-angle hi-res webcam and took photos of a checkerboard pattern through the lens of my headset, then used OpenCV and panotools to calculate new distortion and chromatic aberration parameters for the display. For me, this has been a great improvement. I’m waiting to hear if that’s true for everyone, or if I’ve just fixed it for my headset.
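For reference, the usual OpenCV checkerboard flow looks roughly like the sketch below. This isn’t my actual pipeline (which also involved panotools), and the board size and filenames are assumptions:

import glob
import cv2
import numpy as np

pattern = (9, 6)   # inner corner count of the printed checkerboard (assumption)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("through-lens-*.jpg"):   # photos taken through the headset lens
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Returns the camera matrix and the distortion coefficients (k1, k2, p1, p2, k3)
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print(dist)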
Persistent Config Cache
Config blocks! A long time ago, I prototyped code to create a persistent OpenHMD configuration file store in ~/.config/openhmd. The rift-kalman-filter branch now uses that to store the configuration blocks that it reads from the controllers. The first time a controller is seen, it will load the JSON calibration block as before, but it will now store it in that directory – removing a multiple second radio read process on every subsequent startup.
Persistent Room Configuration
To go along with that, I have an experimental rift-room-config branch that creates a rift-room-config.json file and stores the camera positions after the first startup. I haven’t pushed that to the rift-kalman-filter branch yet, because I’m a bit worried it’ll cause surprising problems for people. If the initial estimate of the headset pose is wrong, the code will back-project the wrong positions for the cameras, which will get written to the file and cause every subsequent run of OpenHMD to generate bad tracking until the file is removed. The goal is to have a loop that monitors whether the camera positions seem stable based on the tracking reports, and to use averaging and resetting to correct them if not – or at least to warn the user that they should re-run some (non-existent) setup utility.
Video Capture + Processing
The final big ticket item was a rewrite of how the USB video frame capture thread collects pixels and passes them to the analysis threads. This now does less work in the USB thread, so misses fewer frames, and also I made it so that every frame is now searched for LEDs and blob identities tracked with motion vectors, even when no further analysis will be done on that frame. That means that when we’re running late, it better preserves LED blob identities until the analysis threads can catch up – increasing the chances of having known LEDs to directly find device positions and avoid searching. This rewrite also opened up a path to easily support JPEG decode – which is needed to support Rift Sensors connected on USB 2.0 ports.
Session Simulator
I mentioned the recording simulator continues to progress. Since the tracking problems are now getting really tricky to figure out, this tool is becoming increasingly important. So far, I have code in OpenHMD to record all video and tracking data to a .mkv file. Then, there’s a simulator tool that loads those recordings. Currently it is capable of extracting the data back out of the recording, parsing the JSON and decoding the video, and presenting it to a partially implemented simulator that then runs the same blob analysis and tracking OpenHMD does. The end goal is a Godot based visualiser for this simulation, and to be able to step back and forth through time examining what happened at critical moments so I can improve the tracking for those situations.
To make recordings, there’s the rift-debug-gstreamer-record branch of OpenHMD. If you have GStreamer and the right plugins (gst-plugins-good) installed, and you set env vars like this, each run of OpenHMD will generate a recording in the target directory (make sure the target dir exists):
The next things that are calling to me are to improve the room configuration estimation and storage as mentioned above – to detect when the poses a camera is reporting don’t make sense because it’s been bumped or moved.
I’d also like to add back in tracking of the LEDs on the back of the headset headband, to support 360 degree tracking. I disabled those because they cause me trouble – the headband is adjustable relative to the headset, so the LEDs don’t appear where the 3D model says they should be, and that causes jitter and pose mismatches. They need special handling.
One last thing I’m finding exciting is a new person taking an interest in Rift S and starting to look at inside-out tracking for that. That’s just happened in the last few days, so not much to report yet – but I’ll be happy to have someone looking at that while I’m still busy over here in CV1 land!
As always, if you have any questions, comments or testing feedback – hit me up at thaytan@noraisin.net or on @thaytan Twitter/IRC.
Thank you to the kind people signed up as Github Sponsors for this project!
For a long time computer manufacturers have tried to differentiate themselves and their products from their competitors with fancy names with odd capitalisation and spelling. But as an author, using these names does a disservice to the reader: how are they to know that DEC is pronounced as if it were written Dec ("deck")?
It's time we pushed back, and wrote for our readers, not for corporations.
It's time to use standard English rules for these Corporate Fancy Names. Proper names begin with a capital, unlike "ciscoSystems®" (so bad that Cisco itself moved away from it). Words are separated by spaces, so "Cisco Systems". Abbreviations and acronyms are written in lower case if they are pronounced as a word, in upper case if each letter is pronounced: so "ram" and "IBM®".
So from here on in I'll be using the following:
Face Book. Formerly, "Facebook®".
Junos. Formerly JUNOS®.
ram. Formerly RAM.
Pan OS. Formerly PAN-OS®.
Unix. Formerly UNIX®.
I'd encourage you to try this in your own writing. It does look odd at first, but the result is undeniably more readable. If we are not writing to be understood by our audience then we are nothing more than unpaid members of some corporation's marketing team.
I gave the talk On The Use and Misuse of Decorators as part of PyConline AU 2021, the second in an annoyingly long sequence of not-in-person PyCon AU events. Here are some code samples that you might be interested in:
Simple @property implementation
This shows a demo of @property-style getters. Setters are left as an exercise :)
def demo_property(f):
    f.is_a_property = True
    return f

class HasProperties:
    def __getattribute__(self, name):
        ret = super().__getattribute__(name)
        if hasattr(ret, "is_a_property"):
            return ret()
        else:
            return ret

class Demo(HasProperties):
    @demo_property
    def is_a_property(self):
        return "I'm a property"

    def is_a_function(self):
        return "I'm a function"

a = Demo()
print(a.is_a_function())
print(a.is_a_property)
@run (The Scoped Block)
@run is a decorator that will run the body of the decorated function, and then store the result of that function in place of the function’s name. It makes it easier to assign the results of complex statements to a variable, and get the advantages of functions having less leaky scopes than if or loop blocks.
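A minimal sketch of what that can look like:

def run(f):
    """Run the decorated function immediately and bind its result to the name."""
    return f()

@run
def greeting():
    # everything in here gets its own scope, unlike an if or for block
    parts = ["Hello", "PyConline AU"]
    return ", ".join(parts) + "!"

print(greeting)   # prints the string, not a function object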
A while ago, I wrote a post about how to build and test my Oculus CV1 tracking code in SteamVR using the SteamVR-OpenHMD driver. I have updated those instructions and moved them to https://noraisin.net/diary/?page_id=1048 – so use those if you’d like to try things out.
The pandemic continues to sap my time for OpenHMD improvements. Since my last post, I have been working on various refinements. The biggest visible improvements are:
Adding velocity and acceleration API to OpenHMD.
Rewriting the pose transformation code that maps from the IMU-centric tracking space to the device pose needed by SteamVR / apps.
Velocity and acceleration reporting is needed by VR apps that support throwing things. Adding it means that throwing objects and using gravity-grab to fetch objects works in Half-Life: Alyx, making it playable now.
The rewrite to the pose transformation code fixed problems where the rotation of controller models in VR didn’t match the rotation applied in the real world. Controllers would appear attached to the wrong part of the hand, and rotate around the wrong axis. Movements feel more natural now.
Ongoing work – record and replay
My focus going forward is on fixing glitches that are caused by tracking losses or outliers. Those problems happen when the computer vision code either fails to match what the cameras see to the device LED models, or when it matches incorrectly.
Tracking failure leads to the headset view or controllers ‘flying away’ suddenly. Incorrect matching leads to controllers jumping and jittering to the wrong pose, or swapping hands. Either condition is very annoying.
Unfortunately, as the tracking has improved the remaining problems get harder to understand and there is less low-hanging fruit for improvement. Further, when the computer vision runs at 52Hz, it’s impossible to diagnose the reasons for a glitch in real time.
I’ve built a branch of OpenHMD that uses GStreamer to record the CV1 camera video, plus IMU and tracking logs into a video file.
To go with those recordings, I’ve been working on a replay and simulation tool, that uses the Godot game engine to visualise the tracking session. The goal is to show, frame-by-frame, where OpenHMD thought the cameras, headset and controllers were at each point in the session, and to be able to step back and forth through the recording.
Right now, I’m working on the simulation portion of the replay, that will use the tracking logs to recreate all the poses.
GKE in Production - Part 2
This tutorial is part of a series I am creating on building, running and managing Kubernetes on GCP the way I do in my day job. In this episode, we are covering how to set up an NGINX ingress controller to handle incoming requests.
Note: There may be some things I have skimmed over, if so or you see a glaring hole in my configuration, please drop me a line via the contact page linked at the top of the site.
I’ve been asked more than once what it was like at the beginning of Ubuntu, before it was a company, when an email from someone I’d never heard of came into my mailbox.
We’re coming up on 20 years now since Ubuntu was founded, and I had cause to do some spelunking into IMAP archives recently… while there I took the opportunity to grab the very first email I received.
The Ubuntu long shot succeeded wildly. Of course, we liked to joke about how spammy those emails were: cold-calling a raft of Debian developers with job offers, some of them were closer to phishing attacks :). This very early one – I was the second employee (though I started at 4 days a week to transition my clients gradually) – was less so.
I think it’s interesting though to note how explicit a gamble this was framed as: a time-limited experiment, funded for a year. As the company scaled this very rapidly became a hiring problem and the horizon had to be pushed out to 2 years to get folk to join.
And of course, while we started with arch in earnest, we rapidly hit significant usability problems, some of which were solvable with porcelain and shallow non-architectural changes, and we initially built patches, and then the Bazaar VCS project, to tackle those. But others were not: for instance, I recall exceeding the 32K hard link limit on ext3 due to a single long history during a VCS conversion. The sum of these challenges led us to create the bzr project, a ground-up rethink of our version control needs, architecture, implementation and user experience. While ultimately git has conquered all, bzr had – still has in fact – extremely loyal advocates, due to its laser-sharp focus on usability.
Anyhow, here it is: one of the original no-name-here-yet, aka Ubuntu, introductory emails (with permission from Mark, of course). When I clicked through to the website Mark provided there was a link there to a fantastical website about a space tourist… not what I had expected to be reading in Adelaide during LCA 2004.
From: Mark Shuttleworth <xxx@xxx> To: Robert Collins <xxx@xxx> Date: Thu, 15 Jan 2004, 04:30
Tom Lord gave me your email address, I believe he’s already sent you the email that I sent him so I’m sure you have some background.
In short, I am going to fund some open source development for a year. This is part of a new project that I will be getting off the ground in the coming weeks. I don’t know where it will lead, it’s flying in the face of a stiff breeze but I think at the end of the day it will at least fund a few very good open source developers for a full year to work on the projects they like most.
One of the pieces of the puzzle is high end source code management. I’ll be looking to build an infrastructure that will manage source code for between 100 and 8000 open source projects (yes, there’s a big difference between the two, I don’t know at which end of the spectrum we will be at the end of the year but our infrastructure will have to at least be capable of scaling to the latter within two years) with upwards of 2000 developers, drawing code from a variety of sources, playing with it and spitting it out regularly in nice packages.
Arch and Subversion seem to be the two leading contenders for “next generation open source sccm”. I’d be interested in your thoughts on the two of them, and how they stack up. I’m looking to hire one person who will lead that part of the effort. They’ll work alone from home, and be responsible for two things. First, extending the tool (arch or svn) in ways that help the project. Such extensions will be released under an open source licence, and hopefully embraced by the tools maintainers and included in the mainline code for the tool. And second, they will be responsible for our large-scale implementation of SCCM, using that tool, and building the management scripts and other infrastructure to support such a large, and hopefully highly automated, set of repositories.
Would you be interested in this position? What attributes and experience do you think would make you a great person to have on the team? What would your salary expectation be, as a monthly figure, for a one year contract full time?
I’m currently on your continent, well, just off it. On Lizard Island, up North. Am headed today for Brisbane, then on the 17th to Launceston via Melbourne. If you happen to be on any of those stops, would you be interested in meeting up to discuss it further?
If you’re curious you can find out a bit more about me at www.markshuttleworth.com. This project is much lower key than some of what you’ll find there. It’s a very long shot indeed. But if at worst all that happens is a bunch of open source work gets funded at my expense I’ll feel it was money well spent.
Cheers, Mark
===== — “Good judgement comes from experience, and often experience comes from bad judgement” – Rita Mae Brown
I have always liked cryptography, and public-key cryptography in particular. When Pretty Good Privacy (PGP) first came out in 1991, I not only started using it, but also looked at the documentation and the code to see how it worked. I created my own implementation in C using very small keys, just to understand it better.
Cryptography has been running a race against both faster and cheaper computing power. And these days, with banking and most other aspects of our lives entirely relying on secure communications, it’s a very juicy target for bad actors.
About 5 years ago, the (USA) National Institute of Standards and Technology (NIST) initiated a search for cryptographic algorithms that can withstand a near-future world where quantum computers with a significant number of qubits are a reality. There have been a number of rounds, with round 3 and the finalists announced in mid-2020.
This submission caught my eye some time ago: Classic McEliece, and out of the four finalists it’s the only one that is not lattice-based [wikipedia link].
Tiny side-track: you may wonder where the McEliece name comes from. It’s from mathematician Robert McEliece (1942-2019), who developed his cryptosystem in 1978. So it’s not just named after him, he designed it. For various reasons that have nothing to do with the mathematical solidity of the ideas, it didn’t get used at the time. He did plenty of other cool things, too. From his Caltech obituary:
He made fundamental contributions to the theory and design of channel codes for communication systems—including the interplanetary telecommunication systems that were used by the Voyager, Galileo, Mars Pathfinder, Cassini, and Mars Exploration Rover missions.
Back to lattices, there are both unknowns (aspects that have not been studied in exhaustive depth) and recent mathematical attacks, both of which create uncertainty – in the crypto sphere as well as for business and politics. Given how long it takes for crypto schemes to get widely adopted, the latter two are somewhat relevant, particularly since cyber security is a hot topic.
Lattices are definitely interesting, but given what we know so far, it is my feeling that systems based on lattices are more likely to be proven breakable than Classic McEliece, which comes to this finalists’ table with a 40+ year track record of in-depth analysis. Mind that all finalists are of course solid at this stage – but NIST’s thoughts on expected developments and breakthroughs are what is likely to decide the winner. NIST are not looking for shiny, they are looking for very very solid in all possible ways.
Prof Buchanan recently published implementations for the finalists, and did some benchmarks where we can directly compare them against each other.
We can see that Classic McEliece’s key generation is CPU intensive, but is that really a problem? The large size of its public key may be more of a factor (disadvantage), however the small ciphertext I think more than offsets that disadvantage.
As we’re nearing the end of the NIST process, in my opinion, fast encryption/decryption and small ciphertext, combined with the long track record of in-depth analysis, may still see Classic McEliece come out the winner.
GKE in Production - Part 1
This tutorial is part of a series I am creating on building, running and managing Kubernetes on GCP the way I do in my day job.
Note: There may be some things I have skimmed over, if so or you see a glaring hole in my configuration, please drop me a line via the contact page linked at the top of the site.
What we will build
In this first tutorial, we will be building a standard GKE cluster on Google Cloud Platform and deploying the hello world container to confirm everything is working.
Living in California, I’ve (sadly) grown accustomed to needing to keep track of our local air quality index (AQI) ratings, particularly as we live close to places where large wildfires happen every other year.
Last year, Josh and I bought a PurpleAir outdoor air quality meter, which has been great. We contribute our data to a collection of very local air quality meters, which is important, since the hilly nature of the North Bay means that the nearest government air quality ratings can be significantly different to what we experience here in Petaluma.
I recently went looking to pull my PurpleAir sensor data into my Home Assistant setup. Unfortunately, the PurpleAir API does not return the AQI metric for air quality, only the raw PM2.5/PM5/PM10 numbers. After some searching, I found a nice template sensor solution on the Home Assistant forums, which I’ve modernised by adding the AQI as a sub-sensor, and adding unique ID fields to each useful sensor, so that you can assign them to a location.
You’ll end up with sensors for raw PM2.5, the PM2.5 AQI value, the US EPA air quality category, temperature, relative humidity and air pressure.
How to use this
First up, visit the PurpleAir Map, find the sensor you care about, click “get this widget”, and then “JSON”. That will give you the URL to set as the resource key in purpleair.yaml.
Adding the configuration
In Home Assistant, add the following line to your configuration.yaml:
sensor: !include purpleair.yaml
and then add the following contents to purpleair.yaml:
- platform: rest
  name: 'PurpleAir'

  # Substitute in the URL of the sensor you care about. To find the URL, go
  # to purpleair.com/map, find your sensor, click on it, click on "Get This
  # Widget" then click on "JSON".
  resource: https://www.purpleair.com/json?key={KEY_GOES_HERE}&show={SENSOR_ID}

  # Only query once a minute to avoid rate limits:
  scan_interval: 60

  # Set this sensor to be the AQI value.
  #
  # Code translated from JavaScript found at:
  # https://docs.google.com/document/d/15ijz94dXJ-YAZLi9iZ_RaBwrZ4KtYeCy08goGBwnbCU/edit#
  value_template: >
    {{ value_json["results"][0]["Label"] }}
  unit_of_measurement: ""

  # The value of the sensor can't be longer than 255 characters, but the
  # attributes can. Store away all the data for use by the templates below.
  json_attributes:
    - results

- platform: template
  sensors:
    purpleair_aqi:
      unique_id: 'purpleair_SENSORID_aqi_pm25'
      friendly_name: 'PurpleAir PM2.5 AQI'
      value_template: >
        {% macro calcAQI(Cp, Ih, Il, BPh, BPl) -%}
          {{ (((Ih - Il)/(BPh - BPl)) * (Cp - BPl) + Il)|round|float }}
        {%- endmacro %}
        {% if (states('sensor.purpleair_pm25')|float) > 1000 %}
          invalid
        {% elif (states('sensor.purpleair_pm25')|float) > 350.5 %}
          {{ calcAQI((states('sensor.purpleair_pm25')|float), 500.0, 401.0, 500.0, 350.5) }}
        {% elif (states('sensor.purpleair_pm25')|float) > 250.5 %}
          {{ calcAQI((states('sensor.purpleair_pm25')|float), 400.0, 301.0, 350.4, 250.5) }}
        {% elif (states('sensor.purpleair_pm25')|float) > 150.5 %}
          {{ calcAQI((states('sensor.purpleair_pm25')|float), 300.0, 201.0, 250.4, 150.5) }}
        {% elif (states('sensor.purpleair_pm25')|float) > 55.5 %}
          {{ calcAQI((states('sensor.purpleair_pm25')|float), 200.0, 151.0, 150.4, 55.5) }}
        {% elif (states('sensor.purpleair_pm25')|float) > 35.5 %}
          {{ calcAQI((states('sensor.purpleair_pm25')|float), 150.0, 101.0, 55.4, 35.5) }}
        {% elif (states('sensor.purpleair_pm25')|float) > 12.1 %}
          {{ calcAQI((states('sensor.purpleair_pm25')|float), 100.0, 51.0, 35.4, 12.1) }}
        {% elif (states('sensor.purpleair_pm25')|float) >= 0.0 %}
          {{ calcAQI((states('sensor.purpleair_pm25')|float), 50.0, 0.0, 12.0, 0.0) }}
        {% else %}
          invalid
        {% endif %}
      unit_of_measurement: "bit"
    purpleair_description:
      unique_id: 'purpleair_SENSORID_description'
      friendly_name: 'PurpleAir AQI Description'
      value_template: >
        {% if (states('sensor.purpleair_aqi')|float) >= 401.0 %}
          Hazardous
        {% elif (states('sensor.purpleair_aqi')|float) >= 301.0 %}
          Hazardous
        {% elif (states('sensor.purpleair_aqi')|float) >= 201.0 %}
          Very Unhealthy
        {% elif (states('sensor.purpleair_aqi')|float) >= 151.0 %}
          Unhealthy
        {% elif (states('sensor.purpleair_aqi')|float) >= 101.0 %}
          Unhealthy for Sensitive Groups
        {% elif (states('sensor.purpleair_aqi')|float) >= 51.0 %}
          Moderate
        {% elif (states('sensor.purpleair_aqi')|float) >= 0.0 %}
          Good
        {% else %}
          undefined
        {% endif %}
      entity_id: sensor.purpleair
    purpleair_pm25:
      unique_id: 'purpleair_SENSORID_pm25'
      friendly_name: 'PurpleAir PM2.5'
      value_template: "{{ state_attr('sensor.purpleair','results')[0]['PM2_5Value'] }}"
      unit_of_measurement: "μg/m3"
      entity_id: sensor.purpleair
    purpleair_temp:
      unique_id: 'purpleair_SENSORID_temperature'
      friendly_name: 'PurpleAir Temperature'
      value_template: "{{ state_attr('sensor.purpleair','results')[0]['temp_f'] }}"
      unit_of_measurement: "°F"
      entity_id: sensor.purpleair
    purpleair_humidity:
      unique_id: 'purpleair_SENSORID_humidity'
      friendly_name: 'PurpleAir Humidity'
      value_template: "{{ state_attr('sensor.purpleair','results')[0]['humidity'] }}"
      unit_of_measurement: "%"
      entity_id: sensor.purpleair
    purpleair_pressure:
      unique_id: 'purpleair_SENSORID_pressure'
      friendly_name: 'PurpleAir Pressure'
      value_template: "{{ state_attr('sensor.purpleair','results')[0]['pressure'] }}"
      unit_of_measurement: "hPa"
      entity_id: sensor.purpleair
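For reference, the calcAQI macro above is just the US EPA piecewise-linear interpolation over the PM2.5 breakpoints. The same calculation as a standalone Python sketch (not part of the Home Assistant configuration) looks like this:

# US EPA PM2.5 breakpoints as (conc_low, conc_high, aqi_low, aqi_high).
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.0, 401, 500),
]

def pm25_to_aqi(pm25):
    # Linear interpolation within the matching concentration band.
    for c_lo, c_hi, i_lo, i_hi in PM25_BREAKPOINTS:
        if c_lo <= pm25 <= c_hi:
            return round((i_hi - i_lo) / (c_hi - c_lo) * (pm25 - c_lo) + i_lo)
    return None  # out of range / invalid reading

print(pm25_to_aqi(35.9))  # 102 -> "Unhealthy for Sensitive Groups"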
Quirks
I had difficulty getting the AQI to display as a numeric graph when I didn’t set a unit. I went with bit, and that worked just fine. 🤷‍♂️
So, this idea has been brewing for a while now… try and watch all of Doctor Who. All of it. All 38 seasons. Today(ish), we started. First up, from 1963 (first aired not quite when intended due to the Kennedy assassination): An Unearthly Child. The first episode of the first serial.
A lot of iconic things are there from the start: the music, the Police Box, embarrassing moments of not quite remembering what time one is in, and normal humans accidentally finding their way into the TARDIS.
I first saw this way back when I was a child, when they were repeated on ABC TV in Australia for some anniversary of Doctor Who (I forget which one). Well, I saw all but the first episode, as the train home was delayed and stopped outside Caulfield for no reason for ages. Some things never change.
Of course, being a show from the early 1960s, there’s some rougher spots. We’re not about to have the picture of diversity, and there’s going to be casual racism and sexism. What will be interesting is noticing these things today, and contrasting with my memory of them at the time (at least for episodes I’ve seen before), and what I know of the attitudes of the time.
“This year-ometer is not calculating properly” is a very 2020 line though (technically from the second episode).
It’s been a while since my last post about tracking support for the Oculus Rift in February. There have been big improvements since then – it’s working really well a lot of the time. It’s gone from “If I don’t make any sudden moves, I can finish an easy Beat Saber level” to “You can’t hide from me!” quality.
Equally, there are still enough glitches and corner cases that I think I’ll still be at this a while.
Here’s a video from 3 weeks ago of (not me) playing Beat Saber on Expert+ setting showing just how good things can be now:
Beat Saber – Skunkynator playing Expert+, Mar 16 2021
Strap in. Here’s what I’ve worked on in the last 6 weeks:
Pose Matching improvements
Most of the biggest improvements have come from improving the computer vision algorithm that’s matching the observed LEDs (blobs) in the camera frames to the 3D models of the devices.
I split the brute-force search algorithm into 2 phases. It now does a first pass looking for ‘obvious’ matches. In that pass, it does a shallow graph search of blobs and their nearest few neighbours against LEDs and their nearest neighbours, looking for a match using a “Strong” match metric. A match is considered strong if expected LEDs match observed blobs to within 1.5 pixels.
Coupled with checks on the expected orientation (matching the Gravity vector detected by the IMU) and the pose prior (expected position and orientation are within predicted error bounds) this short-circuit on the search is hit a lot of the time, and often completes within 1 frame duration.
In the remaining tricky cases, where a deeper graph search is required in order to recover the pose, the initial search reduces the number of LEDs and blobs under consideration, speeding up the remaining search.
I also added an LED size model to the mix – for a candidate pose, it tries to work out how large (in pixels) each LED should appear, and use that as a bound on matching blobs to LEDs. This helps reduce mismatches as devices move further from the camera.
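As a rough illustration, the “strong” match test described above boils down to checking whether each projected LED lands within 1.5 pixels of an observed blob. This is a sketch of the idea only, not the actual OpenHMD code:

import numpy as np

def count_strong_matches(projected_leds, observed_blobs, max_px=1.5):
    # projected_leds: (N, 2) predicted LED centres in pixels for a candidate pose.
    # observed_blobs: (M, 2) detected blob centres in pixels.
    if len(observed_blobs) == 0:
        return 0
    observed = np.asarray(observed_blobs, dtype=float)
    strong = 0
    for led in np.asarray(projected_leds, dtype=float):
        # Distance from this projected LED to its nearest observed blob.
        if np.linalg.norm(observed - led, axis=1).min() <= max_px:
            strong += 1
    return strong

# A candidate pose would then only pass the first, shallow search phase if
# enough of its expected-visible LEDs count as strong matches.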
LED labelling
When a brute-force search for pose recovery completes, the system now knows the identity of various blobs in the camera image. One way it avoids a search next time is to transfer the labels into future camera observations using optical-flow tracking on the visible blobs.
The problem is that, even sped up, the search can still take a few frame-durations to complete. Previously LED labels would be transferred from frame to frame as they arrived, but there’s now a unique ID associated with each blob that allows the labels to be transferred even several frames later once their identity is known.
IMU Gyro scale
One of the problems with reverse engineering is the guesswork around exactly what different values mean. I was looking into why the controller movement felt “swimmy” under fast motions, and one thing I found was that the interpretation of the gyroscope readings from the IMU was incorrect.
The touch controllers report IMU angular velocity readings directly as a 16-bit signed integer. Previously the code would take the reading and divide by 1024 and use the value as radians/second.
From teardowns of the controller, I know the IMU is an Invensense MPU-6500. From the datasheet, the reported value is actually in degrees per second and appears to be configured for the +/- 2000 °/s range. That yields a calculation of Gyro-rad/s = Gyro-°/s * (2000 / 32768) * (π/180) – or a divisor of 938.734.
The 1024 divisor was under-estimating rotation speed by about 10% – close enough to work until you start moving quickly.
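In code, the corrected conversion is just this (a Python sketch of the arithmetic, not the OpenHMD source):

import math

def gyro_raw_to_rad_per_sec(raw):
    # Raw 16-bit reading at the +/- 2000 deg/s full-scale range of the MPU-6500.
    deg_per_sec = raw * (2000.0 / 32768.0)
    return math.radians(deg_per_sec)  # same as dividing raw by ~938.734

print(gyro_raw_to_rad_per_sec(1024))  # ~1.09 rad/s; the old /1024 code reported 1.0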
Limited interpolation
If we don’t find a device in the camera views, the fusion filter predicts motion using the IMU readings – but that quickly becomes inaccurate. In the worst case, the controllers fly off into the distance. To avoid that, I added a limit of 500ms for ‘coasting’. If we haven’t recovered the device pose by then, the position is frozen in place and only rotation is updated until the cameras find it again.
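A sketch of that rule, with hypothetical pose types (not the actual fusion filter code):

from collections import namedtuple

Pose = namedtuple("Pose", ["position", "orientation"])  # hypothetical container

COAST_LIMIT_SEC = 0.5  # 500 ms of IMU-only 'coasting' before freezing position

def limit_coasting(last_observed_pose, imu_predicted_pose, secs_since_camera_fix):
    # Once the device has gone too long without a camera observation, keep the
    # IMU-driven orientation update but hold the position at its last value.
    if secs_since_camera_fix > COAST_LIMIT_SEC:
        return Pose(last_observed_pose.position, imu_predicted_pose.orientation)
    return imu_predicted_pose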
Exponential filtering
I implemented a 1-Euro exponential smoothing filter on the output poses for each device. This is an idea from the Project Esky driver for Project North Star/Deck-X AR headsets, and almost completely eliminates jitter in the headset view and hand controllers shown to the user. The tradeoff is against introducing lag when the user moves quickly – but there are some tunables in the exponential filter to play with for minimising that. For now I’ve picked some values that seem to work reasonably.
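For reference, a 1-Euro filter is just an exponential smoother whose cutoff frequency adapts to the signal's speed. Here is a minimal single-value Python sketch of the published algorithm (the OpenHMD version filters full poses and uses its own tunables):

import math

class OneEuroFilter:
    # min_cutoff controls jitter at low speed, beta controls lag at high speed.
    def __init__(self, min_cutoff=1.0, beta=0.05, d_cutoff=1.0):
        self.min_cutoff = min_cutoff
        self.beta = beta
        self.d_cutoff = d_cutoff
        self.x_prev = None
        self.dx_prev = 0.0
        self.t_prev = None

    @staticmethod
    def _alpha(cutoff, dt):
        # Smoothing factor of an exponential filter with the given cutoff (Hz).
        tau = 1.0 / (2.0 * math.pi * cutoff)
        return 1.0 / (1.0 + tau / dt)

    def filter(self, x, t):
        if self.x_prev is None:
            self.x_prev, self.t_prev = x, t
            return x
        dt = t - self.t_prev
        # Estimate and smooth the signal's speed.
        dx = (x - self.x_prev) / dt
        a_d = self._alpha(self.d_cutoff, dt)
        dx_hat = a_d * dx + (1.0 - a_d) * self.dx_prev
        # Speed-adaptive cutoff: more smoothing when slow, less lag when fast.
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff, dt)
        x_hat = a * x + (1.0 - a) * self.x_prev
        self.x_prev, self.dx_prev, self.t_prev = x_hat, dx_hat, t
        return x_hat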
Non-blocking radio
Communications with the touch controllers happens through USB radio command packets sent to the headset. The main use of radio commands in OpenHMD is to read the JSON configuration block for each controller that is programmed in at the factory. The configuration block provides the 3D model of LED positions as well as initial IMU bias values.
Unfortunately, reading the configuration block takes a couple of seconds on startup, and blocks everything while it’s happening. Oculus saw that problem and added a checksum in the controller firmware. You can read the checksum first and if it hasn’t changed use a local cache of the configuration block. Eventually, I’ll implement that caching mechanism for OpenHMD but in the meantime it still reads the configuration blocks on each startup.
As an interim improvement I rewrote the radio communication logic to use a state machine that is checked in the update loop – allowing radio communications to be interleaved without blocking the regular processing of events. It still interferes a bit, but no longer causes a full multi-second stall as each hand controller turns on.
Haptic feedback
The hand controllers have haptic feedback ‘rumble’ motors that really add to the immersiveness of VR by letting you sense collisions with objects. Until now, OpenHMD hasn’t had any support for applications to trigger haptic events. I spent a bit of time looking at USB packet traces with Philipp Zabel and we figured out the radio commands to turn the rumble motors on and off.
In the Rift CV1, the haptic motors have a mode where you schedule feedback events into a ringbuffer – effectively they operate like a low frequency audio device. However, that mode was removed for the Rift S (and presumably in the Quest devices) – and deprecated for the CV1.
With that in mind, I aimed for implementing the unbuffered mode, with explicit ‘motor on + frequency + amplitude’ and ‘motor off’ commands sent as needed. Thanks to already having rewritten the radio communications to use a state machine, adding haptic commands was fairly easy.
I’d say the biggest problem right now is unexpected tracking loss and incorrect pose extractions when I’m not expecting them. Especially my right controller will suddenly glitch and start jumping around. Looking at a video of the debug feed, it’s not obvious why that’s happening:
To fix cases like those, I plan to add code to log the raw video feed and the IMU information together so that I can replay the video analysis frame-by-frame and investigate glitches systematically. Those recordings will also work as a regression suite to test future changes.
Sensor fusion efficiency
The Kalman filter I have implemented works really nicely – it does the latency compensation, predicts motion and extracts sensor biases all in one place… but it has a big downside of being quite expensive in CPU. The Unscented Kalman Filter CPU cost grows at O(n^3) with the size of the state, and the state in this case is 43 dimensional – 22 base dimensions, and 7 per latency-compensation slot. Running 1000 updates per second for the HMD and 500 for each of the hand controllers adds up quickly.
At some point, I want to find a better / cheaper approach to the problem that still provides low-latency motion predictions for the user while still providing the same benefits around latency compensation and bias extraction.
Lens Distortion
To generate a convincing illusion of objects at a distance in a headset that’s only a few centimetres deep, VR headsets use some interesting optics. The image from the LCD/OLED panels gets distorted heavily by the lenses before it reaches the user’s eyes. What the software generates needs to compensate by applying the right inverse distortion to the output video.
Everyone that tests the CV1 notices that the distortion is not quite correct. As you look around, the world warps and shifts annoyingly. Sooner or later that needs fixing. That’s done by taking photos of calibration patterns through the headset lenses and generating a distortion model.
Camera / USB failures
The camera feeds are captured using a custom user-space UVC driver implementation that knows how to set up the special synchronisation settings of the CV1 and DK2 cameras, and then repeatedly schedules isochronous USB packet transfers to receive the video.
Occasionally, some people experience failure to re-schedule those transfers. The kernel rejects them with an out-of-memory error failing to set aside DMA memory (even though it may have been running fine for quite some time). It’s not clear why that happens – but the end result at the moment is that the USB traffic for that camera dies completely and there’ll be no more tracking from that camera until the application is restarted.
Often once it starts happening, it will keep happening until the PC is rebooted and the kernel memory state is reset.
Occluded cases
Tracking generally works well when the cameras get a clear shot of each device, but there are cases like sighting down the barrel of a gun where we expect that the user will line up the controllers in front of one another, and in front of the headset. In that case, even though we probably have a good idea where each device is, it can be hard to figure out which LEDs belong to which device.
If we already have a good tracking lock on the devices, I think it should be possible to keep tracking even down to 1 or 2 LEDs being visible – but the pose assessment code will have to be aware that’s what is happening.
Upstreaming
April 14th marks 2 years since I first branched off OpenHMD master to start working on CV1 tracking. How hard can it be, I thought? I’ll knock this over in a few months.
Since then I’ve accumulated over 300 commits on top of OpenHMD master that eventually all need upstreaming in some way.
One thing people have expressed as a prerequisite for upstreaming is to try and remove the OpenCV dependency. The tracking relies on OpenCV to do camera distortion calculations, and for their PnP implementation. It should be possible to reimplement both of those directly in OpenHMD with a bit of work – possibly using the fast LambdaTwist P3P algorithm that Philipp Zabel wrote, that I’m already using for pose extraction in the brute-force search.
Others
I’ve picked the top issues to highlight here. https://github.com/thaytan/OpenHMD/issues has a list of all the other things that are still on the radar for fixing eventually.
Other Headsets
At some point soon, I plan to put a pin in the CV1 tracking and look at adapting it to more recent inside-out headsets like the Rift S and WMR headsets. I implemented 3DOF support for the Rift S last year, but getting to full positional tracking for that and other inside-out headsets means implementing a SLAM/VIO tracking algorithm to track the headset position.
Once the headset is tracking, the code I’m developing here for CV1 to find and track controllers will hopefully transfer across – the difference with inside-out tracking is that the cameras move around with the headset. Finding the controllers in the actual video feed should work much the same.
Sponsorship
This development happens mostly in my spare time and partly as open source contribution time at work at Centricular. I am accepting funding through Github Sponsorships to help me spend more time on it – I’d really like to keep helping Linux have top-notch support for VR/AR applications. Big thanks to the people that have helped get this far.
Why that particular date? It’s Vincent van Gogh’s birthday (1853), and there is a fairly strong argument that the Dutch painter suffered from bipolar (among other things).
The image on the side is Vincent’s drawing “Worn Out” (from 1882), and it seems to capture the feeling rather well – whether (hypo)manic, depressed, or mixed. It’s exhausting.
Bipolar is complicated, often undiagnosed or misdiagnosed, and when only treated with anti-depressants, it can trigger the (hypo)mania – essentially dragging that person into that state near-permanently.
Have you heard of Bipolar II?
Hypo-mania is the “lesser” form of mania that distinguishes Bipolar I (the classic “manic depressive” syndrome) from Bipolar II. It’s “lesser” only in the sense that, rather than someone going so hyper they may think they can fly (Bipolar I is often identified when someone in a manic state gets admitted to hospital – good catch!), with Bipolar II the hypo-mania may actually exhibit as anger. Anger in general, against nothing in particular but potentially everyone and everything around them. Or, if it’s a mixed episode, anger combined with strong negative thoughts. Either way, it does not look like classic mania. It is, however, exhausting and can be very debilitating.
Bipolar II people often present to a doctor while in a depressed state, and GPs (not being psychiatrists) may not do a full diagnosis. Note that D.A.S. and similar test sheets are screening tools, they are not diagnostic. A proper diagnosis is more complex than filling in a form with some questions (who would have thought!).
Call to action
If you have a diagnosis of depression, only from a GP, and are on medication for this, I would strongly recommend you also get a referral to a psychiatrist to confirm that diagnosis.
Our friends at the awesome Black Dog Institute have excellent information on bipolar, as well as a quick self-test – if that shows some likelihood of bipolar, go get that referral and follow up ASAP.
I will be writing more about the topic in the coming time.
This post documented an older method of building SteamVR-OpenHMD. I have moved the instructions to a page here. That version will be kept up to date with any future changes, so go there.
I’ve had a few people ask how to test my OpenHMD development branch of Rift CV1 positional tracking in SteamVR. Here’s what I do:
It is important to configure in release mode, as the Kalman filtering code is generally too slow for real-time in debug mode (it has to run 2000 times per second).
Please note – only Rift sensors on USB 3.0 ports will work right now. Supporting cameras on USB 2.0 requires someone implementing JPEG format streaming and decoding.
It can be helpful to confirm that OpenHMD is working by running the simple example. Check that it’s finding camera sensors at startup, and that the position seems to change when you move the headset:
Calibrate your expectations for how well tracking is working right now! Hint: It’s very experimental
Start SteamVR. Hopefully it should detect your headset and the light(s) on your Rift Sensor(s) should power on.
Meson
I prefer the Meson build system here. There’s also a CMake build for SteamVR-OpenHMD you can use instead, but I haven’t tested it in a while and it sometimes breaks as I work on my development branch.
I spent some time this weekend implementing a couple of my ideas for improving the way the tracking code in OpenHMD filters and rejects (or accepts) possible poses when trying to match visible LEDs to the 3D models for each device.
In general, the tracking proceeds in several steps (in parallel for each of the 3 devices being tracked):
Do a brute-force search to match LEDs to 3D models, then (if matched)
Assign labels to each LED blob in the video frame saying what LED they are.
Send an update to the fusion filter about the position / orientation of the device
Then, as each video frame arrives:
Use motion flow between video frames to track the movement of each visible LED
Use the IMU + vision fusion filter to predict the position/orientation (pose) of each device, and calculate which LEDs are expected to be visible and where.
Try and match up and refine the poses using the predicted pose prior and labelled LEDs. In the best case, the LEDs are exactly where the fusion predicts they’ll be. More often, the orientation is mostly correct, but the position has drifted and needs correcting. In the worst case, we send the frame back to step 1 and do a brute-force search to reacquire an object.
The goal is to always assign the correct LEDs to the correct device (so you don’t end up with the right controller in your left hand), and to avoid going back to the expensive brute-force search to re-acquire devices as much as possible.
What I’ve been working on this week is steps 1 and 3 – initial acquisition of correct poses, and fast validation / refinement of the pose in each video frame, and I’ve implemented two new strategies for that.
Gravity Vector matching
The first new strategy is to reject candidate poses that don’t closely match the known direction of gravity for each device. I had a previous implementation of that idea which turned out to be wrong, so I’ve re-worked it and it helps a lot with device acquisition.
The IMU accelerometer and gyro can usually tell us which way up the device is (roll and pitch) but not which way they are facing (yaw). The measure for ‘known gravity’ comes from the fusion Kalman filter covariance matrix – how certain the filter is about the orientation of the device. If that variance is small this new strategy is used to reject possible poses that don’t have the same idea of gravity (while permitting rotations around the Y axis), with the filter variance as a tolerance.
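A sketch of the idea, with an assumed world “down” axis and hypothetical names (the real check folds the filter’s variance in as the tolerance):

import numpy as np

WORLD_DOWN = np.array([0.0, -1.0, 0.0])  # assumed gravity direction in world space

def gravity_mismatch_deg(candidate_rotation, measured_gravity_device):
    # candidate_rotation: 3x3 device-to-world rotation matrix of a candidate pose.
    # measured_gravity_device: unit gravity vector in device space, from the fusion filter.
    predicted_gravity_device = candidate_rotation.T @ WORLD_DOWN
    cos_angle = np.clip(np.dot(predicted_gravity_device, measured_gravity_device), -1.0, 1.0)
    return np.degrees(np.arccos(cos_angle))

# Reject the candidate if the mismatch exceeds the tolerance derived from the
# filter's orientation variance. A pure rotation about the vertical (yaw)
# leaves this angle unchanged, so such poses are still permitted.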
Partial tracking matches
The 2nd strategy is based around tracking with fewer LED correspondences once a tracking lock is acquired. Initial acquisition of the device pose relies on some heuristics for how many LEDs must match the 3D model. The general heuristic threshold I settled on for now is that 2/3rds of the expected LEDs must be visible to acquire a cold lock.
With the new strategy, if the pose prior has a good idea where the device is and which way it’s facing, it allows matching on far fewer LED correspondences. The idea is to keep tracking a device even down to just a couple of LEDs, and hope that more become visible soon.
While this definitely seems to help, I think the approach can use more work.
Status
With these two new approaches, tracking is improved but still quite erratic. Tracking of the headset itself is quite good now and for me rarely loses tracking lock. The controllers are better, but have a tendency to “fly off my hands” unexpectedly, especially after fast motions.
I have ideas for more tracking heuristics to implement, and I expect a continuous cycle of refinement on the existing strategies and new ones for some time to come.
For now, here’s a video of me playing Beat Saber using tonight’s code. The video shows the debug stream that OpenHMD can generate via Pipewire, showing the camera feed plus overlays of device predictions, LED device assignments and tracked device positions. Red is the headset, Green is the right controller, Blue is the left controller.
Initial tracking is completely wrong – I see some things to fix there. For example, when the controllers go offline due to inactivity, the code keeps trying to match LEDs to them, and there are some problems with how it’s relabelling LEDs when they get incorrect assignments.
After that, there are periods of good tracking with random tracking losses on the controllers – those show the problem cases to concentrate on.
This lack of updates is also likely because I’ve been quite caught up with stuff.
Monday I had a steak from Bay Leaf Steakhouse for dinner. It was kind of weird eating it from packs, but then I’m reminded you could do this in economy class. Tuesday I wanted to attempt to go vegetarian and by the time I was done with a workout, the only place was a chap fan shop (Leong Heng) where I had a mixture of Chinese and Indian chap fan. The Indian stall is run by an ex-Hyatt staff member who immediately recognised me! Wednesday, Alice came to visit, so we got to Hanks, got some alcohol, and managed a smorgasbord of food from Pickers/Sate Zul/Lila Wadi. Night ended very late, and on Thursday, visited Hai Tian for their famous salted egg squid and prawns in a coconut shell. Friday was back to being normal, so I grabbed a pizza from Mint Pizza (this time I tried their Aussie variant). Saturday, today, I hit up Rasa Sayang for some matcha latte, but grabbed food from Classic Pilot Cafe, which Faeeza owns! It was the famous salted egg chicken, double portion, half rice.
As for workouts, I did sign up for Mantas but found it pretty hard to do, timezone wise. I did spend a lot of time jogging on the beach (this has been almost a daily affair). Monday I also did 2 MD workouts, Tuesday 1 MD workout, Wednesday half a MD workout, Thursday I did a Ping workout at Pwrhouse (so good!), Friday 1 MD workout, and Saturday an Audrey workout at Pwrhouse and 1 MD workout.
Wednesday I also found out that Rasmus passed away. Frankly, there are no words.
Thursday, my Raspberry Pi 400 arrived. I set it up in under ten minutes, connecting it to the TV here. It “just works”. I made a video, which I should probably figure out how to upload to YouTube after I stitch it together. I have to work on using it a lot more.
COVID-19 cases are through the roof in Malaysia. This weekend we’ve seen two days of record-breaking case numbers, with today being 5,728 (yesterday was something close). Nutty. Singapore suspended the reciprocal green lane (RGL) agreement with Malaysia for the next 3 months.
I’ve managed to finish Bridgerton. I like the score. Finding something on Netflix is proving to be more difficult, regardless of having a VPN. Honestly, this is why Cable TV wins… linear programming that you’re just fed.
Stock market wise, I’ve been following the GameStop short squeeze, and even funnier is the Top Glove one, that they’re trying to repeat in Malaysia. Bitcoin seems to be doing “reasonably well” and I have to say, I think people are starting to realise decentralised services have a future. How do we get there?
What an interesting week, I look forward to more productive time. I’m still writing in my Hobonichi Techo, so at least that’s where most personal stuff ends up, I guess?
I hit an important OpenHMD milestone tonight – I completed a Beat Saber level using my Oculus Rift CV1!
I’ve been continuing to work on integrating Kalman filtering into OpenHMD, and on improving the computer vision that matches and tracks device LEDs. While I suspect no one will be completing Expert levels just yet, it’s working well enough that I was able to play through a complete level of Beat Saber. For a long time this has been my mental benchmark for tracking performance, and I’m really happy.
Check it out:
I should admit at this point that completing this level took me multiple attempts. The tracking still has quite a tendency to lose track of controllers, or to get them confused and swap hands suddenly.
I have a list of more things to work on. See you at the next update!
What an unplanned day. I woke up in time to do an MD workout, despite feeling a little sore. So maybe I was about 10 minutes late and I missed the first set, but his workouts are so long, and I think there were seven sets anyway. Had a good brunch shortly thereafter.
Did a bit of reading, and then I decided to do a beach boardwalk walk… turns out they were policing the place, and you can’t hit the boardwalk. But the beach is fair game? So I went back to the hotel, dropped off my slippers, and went for a beach jog. Pretty nutty.
Came back to read a little more and figured I might as well do another MD workout. Then I headed out for dinner, trying out a new place — Mint Pizza. Opened 20.12.2020, and they’re empty, and their pizza is actually pretty good. Lamb and BBQ chicken, they did half-and-half.
Twitter was discussing Raspberry Pi’s, and all I could see is a lot of misinformation, which is truly shocking. The irony is that open source has been running the Internet for so long, and progressive web apps have come such a long way…
Learn concepts, not tools.
One ensures lifelong flexibility, the other has potential for obsolescence.
Your knowledge of doing a mail merge will spread from Word, to Pages, to Writer or Docs.
Back in the day when I did OpenOffice.org or Linux training, we always did say you should learn concepts and not tools. We did this from the time we ran Linux installfests in the late-90s in Sunway Pyramid (back then, yes, Linux was hard, and you had winmodems); I had forgotten that I even did stuff for school teachers and NGOs back in 2002… I won’t forget PC Gemilang either…
Anyway, I placed an order again for another Raspberry Pi 400. I am certain that most people talk so much crap, without realising that Malaysia isn’t a developed nation and most people can’t afford a Mac let alone a PC. Laptops aren’t cheap. And there are so many other issues…. Saying Windows is still required in 2021 is the nuttiest thing I’ve heard in a long time. Easy to tweet, much harder to think about TCO, and realise where in the journey Malaysia is.
Maybe the best thing was that Malaysian Twitter learned about technology. I doubt many realised the difference between a Pi board and the 400, but hey, the fact that they talked about tech is still a win (misinformed, but a win).
Today is the first day that in the state of Pahang, we have to encounter what many Malaysians are referring to as the Movement Control Order 2.0 (MCO 2.0). I think everyone finally agrees with the terminology that this is a lockdown now, because I remember back in the day when I was calling it that, I’d definitely offend a handful of journalists.
This is one interesting change for me compared to when I last wrote Life with Rona — Day 56 of being indoors and not even leaving my household, in Kuala Lumpur. I am now not in the state, I am living in a hotel, and I am obviously moving around a little more since we have access to the beach.
KL/Selangor and several other states have already been under the MCO 2.0 since January 13 2021, and while it was supposed to end on January 26, it seems like they’ve extended and harmonised the dates for Peninsular Malaysia to end on February 4 2021. I guess everyone got the “good news” yesterday. The Prime Minister announced some kind of aid last week, but it is still mostly a joke.
Today was the 2nd day I woke up at around 2.30pm because I went to bed at around 8am. The first day I had a 23.5 hour uptime, and today was less brutal, but working from 1-8am to match the PST timezone is pretty brutal. Consequently, I barely got much done, and had one meal, vegetarian, two packs that included rice. I did get to walk by the beach (between Teluk Cempedak and Teluk Cempedak 2), did quite a bit of exercise there and I think even the monkeys are getting hungry… lots of stray cats and monkeys. Starbucks closes at 7pm, and I rocked up at 7.10pm (this was just like yesterday, when I arrived at 9.55pm and was told they wouldn’t grant me a coffee!).
While writing this entry, I did manage to get into a long video call with some friends and I guess it was good catching up with people in various states. It also is what prevented me from publishing this entry!
Day 2
I did wake up reasonably early today because I had pre-ordered room service to arrive at 9am. There is a fixed menu at the hotel for various cuisines (RM48/pax, thankfully gratis for me) and I told them I prefer not having to waste, so just give me what I want, which is off-menu items anyway. Roti telur double telur (yes, I know it is a roti jantan) with some banjir dhal and sambal and a bit of fruit on the side with two teh tariks. They delivered as requested. I did forget to ask for a jar of honey but that is OK, there is always tomorrow.
I spent most of the day vacillating, and wouldn’t consider it productive by any measure. Just chit chats and napping. It did rain today after a long time, so the day seemed fairly dreary.
When I finally did awaken from my nap, I went for a run on the beach. I did it barefoot. I have no idea if this is how it is supposed to be done, or if you are to run nearer the water or further up above, but I did move around between the two quite often. The beach is still pretty dead, but it is expected since no one is allowed to go unless you’re a hotel guest.
The hotel has closed 3/4 of their villages (blocks) and moved everyone to the village I’m staying in (for long stay guests…). I’m thankful I have a pretty large suite, it is a little over 980sqft, and the ample space, while smaller than my home, is still welcome.
Post beach run, I did a workout with MD via Instagram. It was strength/HIIT based, and I burnt a tonne, because he gave us one of his signature 1.5h classes. It was longer than the 80 minute class he normally charges RM50 for (I still think this is undervaluing his service, but he really does care and does it for the love of seeing his students grow!).
Post-workout I decided to head downtown to find some dinner. Everything at the Teluk Cempedak block of shops was closed, so they’re not even bothered with doing takeaway. Sg. Lembing steakhouse seemed to have cars parked, Vanggey was empty (Crocodile Rock was open, can’t say if there was a crowd, because the shared parking lot was empty), there was a modest queue at Sate Zul, and further down, Lena was closed, Pickers was open for takeaway but looked pretty closed, Tjantek was open surprisingly, and then I thought I’d give Nusantara a try again, this time for food, but their chef had just gone home at about 8pm. Oops. So I drove to LAN burger, initially ordering just one chicken double special; however they looked like they could use the business so I added on a beef double special. They now accept Boost payments so have joined the e-wallet era. One less place to use cash, which is also why I really like Kuantan. On the drive back, Classic Pilot Cafe was also open and I guess I’ll be heading there too during this lockdown.
Came back to the room to finish both burgers in probably under 15 minutes. While watching the first episode of Bridgerton on Netflix. I’m not sure what really captivates, but I will continue on (I still haven’t finished the first episode). I need to figure out how to use the 2 TVs that I have in this room — HDMI cable? Apple TV? Not normally using a TV, all this is clearly more complex than I care to admit.
I soaked longer than expected, ended up a prune, but I’m sure it will give me good rest!
One thought to leave with:
“Learn to enjoy every minute of your life. Be happy now. Don’t wait for something outside of yourself to make you happy in the future.” — Earl Nightingale
In my experience, the C programming language is still hard to beat, even 50 years after it was first developed (and I feel the same way about UNIX). When it comes to general-purpose utility, low-level systems programming, performance, and portability (even to tiny embedded systems), I would choose C over most modern or fashionable alternatives. In some cases, it is almost the only choice.
Many developers believe that it is difficult to write secure and reliable software in C, due to its free pointers, the lack of enforced memory integrity, and the lack of automatic memory management; however in my opinion it is possible to overcome these risks with discipline and a more secure system of libraries constructed on top of C and libc. Daniel J. Bernstein and Wietse Venema are two developers who have been able to write highly secure, stable, reliable software in C.
My other favourite language is Python. Although Python has numerous desirable features, my favourite is the light-weight syntax: in Python, block structure is indicated by indentation, and braces and semicolons are not required. Apart from the pleasure and relief of reading and writing such light and clear code, which almost appears to be executable pseudo-code, there are many other benefits. In C or JavaScript, if you omit a trailing brace somewhere in the code, or insert an extra brace somewhere, the compiler may tell you that there is a syntax error at the end of the file. These errors can be annoying to track down, and cannot occur in Python. Python not only looks better, the clear syntax helps to avoid errors.
The obvious disadvantage of Python, and other dynamic interpreted languages, is that most programs run much more slowly than C programs. This limits the scope and generality of Python. No AAA or performance-oriented video game engines are programmed in Python. The language is not suitable for low-level systems programming, such as operating system development, device drivers, filesystems, performance-critical networking servers, or real-time systems.
C is a great all-purpose language, but the code is uglier than Python code. Once upon a time, when I was experimenting with the Plan 9 operating system (which is built on C, but lacks Python), I missed Python’s syntax, so I decided to do something about it and write a little preprocessor for C. This converts from a “Pythonesque” indented syntax to regular C with the braces and semicolons. Having forked a little dialect of my own, I continued from there adding other modules and features (which might have been a mistake, but it has been fun and rewarding).
At first I called this translator Brace, because it added in the braces for me. I now call the language CZ. It sounds like “C-easy”. Ease-of-use for developers (DX) is the primary goal. CZ has all of the features of C, and translates cleanly into C, which is then compiled to machine code as normal (using any C compiler; I didn’t write one); and so CZ has the same features and performance as C, but enjoys a more pleasing syntax.
CZ is now self-hosted, in that the translator is written in the language CZ. I confess that originally I wrote most of it in Perl; I’m proficient at Perl, but I consider it to be a fairly ugly language, and overly complicated.
I intend for CZ’s new syntax to be “optional”, ideally a developer will be able to choose to use the normal C syntax when editing CZ, if they prefer it. For this, I need a tool to convert C back to CZ, which I have not fully implemented yet. I am aware that, in addition to traditionalists, some vision-impaired developers prefer to use braces and semicolons, as screen readers might not clearly indicate indentation. A C to CZ translator would of course also be valuable when porting an existing C program to CZ.
CZ has a number of useful features that are not found in standard C, but I did not go so far as C++, a language which has been described as “an octopus made by nailing extra legs onto a dog”. I do not consider C to be a dog, at least not in a negative sense; but I think that C++ is not an improvement over plain C. I am creating CZ because I think that it is possible to improve on C, without losing any of its advantages or making it too complex.
One of the most interesting features I added is a simple syntax for fast, light coroutines. I based this on Simon Tatham’s approach to Coroutines in C, which may seem hacky at first glance, but is very efficient and can work very well in practice. I implemented a very fast web server with very clean code using these coroutines. The cost of switching coroutines with this method is little more than the cost of a function call.
CZ has hygienic macros. The regular cpp (C preprocessor) macros are not hygienic and many people consider them hacky and unsafe to use. My CZ macros are safe, and somewhat more powerful than standard C macros. They can be used to neatly add new program control structures. I have plans to further develop the macro system in interesting ways.
I added automatic prototype and header generation, as I do not like having to repeat myself when copying prototypes to separate header files. I added support for the UNIX #! scripting syntax, and for cached executables, which means that CZ can be used like a scripting language without having to use a separate compile or make command, but the programs are only recompiled when something has been changed.
For CZ, I invented a neat approach to portability without conditional compilation directives. Platform-specific library fragments are automatically included from directories having the name of that platform or platform-category. This can work very well in practice, and helps to avoid the nightmare of conditional compilation, feature detection, and Autotools. Using this method, I was able easily to implement portable interfaces to features such as asynchronous IO multiplexing (aka select / poll).
The CZ library includes flexible error handling wrappers, inspired by W. Richard Stevens’ wrappers in his books on Unix Network Programming. If these wrappers are used, there is no need to check return values for error codes, and this makes the code much safer, as an error cannot accidentally be ignored.
CZ has several major faults, which I intend to correct at some point. Some of the syntax is poorly thought out, and I need to revisit it. I developed a fairly rich library to go with the language, including safer data structures, IO, networking, graphics, and sound. There are many nice features, but my CZ library is more a prototype than a finished product; there are major omissions, and some features are misconceived or poorly implemented. The misfeatures should be weeded out for the time being, or moved to an experimental section of the library.
I think that a good software library should come in two parts, the essential low-level APIs with the minimum necessary functionality, and a rich set of high-level convenience functions built on top of the minimal API. I need to clearly separate these two parts in order to avoid polluting the namespaces with all sorts of nonsense!
CZ is lacking a good modern system of symbol namespaces. I can look to Python for a great example. I need to maintain compatibility with C, and avoid ugly symbol encodings. I think I can come up with something that will alleviate the need to type anything like gtk_window_set_default_size, and yet maintain compatibility with the library in question. I want all the power of C, but it should be easy to use, even for children. It should be as easy as BASIC or Processing, a child should be able to write short graphical demos and the like, without stumbling over tricky syntax or obscure compile errors.
Here is an example of a simple CZ program which plots the Mandelbrot set fractal. I think that the program is fairly clear and easy to understand, although there is still some potential to improve and clarify the code.
#!/usr/local/bin/cz --
use b
use ccomplex
Main:
    num outside = 16, ox = -0.5, oy = 0, r = 1.5
    long i, max_i = 50, rb_i = 30
    space()
    uint32_t *px = pixel() # CONFIGURE!
    num d = 2*r/h, x0 = ox-d*w_2, y0 = oy+d*h_2
    for(y, 0, h):
        cmplx c = x0 + (y0-d*y)*I
        repeat(w):
            cmplx w = c
            for i=0; i < max_i && cabs(w) < outside; ++i
                w = w*w + c
            *px++ = i < max_i ? rainbow(i*359 / rb_i % 360) : black
            c += d
I wrote a more elaborate variant of this program, which generates images like the one shown below. There are a few tricks used: continuous colouring, rainbow colours, and plotting the logarithm of the iteration count, which makes the plot appear less busy close to the black fractal proper. I sell some T-shirts and other products with these fractal designs online.
An image from the Mandelbrot set, generated by a fairly simple CZ program.
I am interested in graph programming, and have been for three decades since I was a teenager. By graph programming, I mean programming and modelling based on mathematical graphs or diagrams. I avoid the term visual programming, because there is no necessary reason that vision impaired folks could not use a graph programming language; a graph or diagram may be perceived, understood, and manipulated without having to see it.
Mathematics is something that naturally exists, outside time and independent of our universe. We humans discover mathematics, we do not invent or create it. One of my main ideas for graph programming is to represent a mathematical (or software) model in the simplest and most natural way, using relational operators. Elementary mathematics can be reduced to just a few such operators:
+: add, subtract, disjoint union, zero
×: multiply, divide, cartesian product, one
^: power, root, logarithm
◢: sin, cos, sin⁻¹, cos⁻¹, hypot, atan2
δ: differential, integral
a set of minimal relational operators for elementary math
I think that a language and notation based on these few operators (and similar) can be considerably simpler and more expressive than conventional math or programming languages.
CZ is for me a stepping-stone toward this goal of an expressive relational graph language. It is more pleasant for me to develop software tools in CZ than in C or another language.
My CZ project has been stalled for quite some time. I foolishly became discouraged after receiving some negative feedback. I now know that honest negative feedback should be valued as an opportunity to improve, and I intend to continue the project until it lacks glaring faults, and is useful for other people. If this project or this article interests you, please contact me and let me know. It is much more enjoyable to work on a project when other people are actively interested in it!
The uBITX uses an Arduino internally. This article describes how to update its software.
Required hardware
The connector on the back is a Mini-B USB connector, so you'll need a "Mini-B to A" USB cable. This is not the same cable as used with older Android smartphones. The Mini-B connector was used with a lot of cameras a decade ago.
You'll also need a computer. I use a laptop with Fedora Linux installed.
Required software for software development
In Fedora all the required software is installed with sudo dnf install arduino git. Add yourself to the users and lock groups with sudo usermod -a -G users,lock $USER (on Debian-style systems use sudo usermod -a -G dialout,lock $USER). You'll need to log out and log in again for that to have an effect (if you want to see which groups you are already in, then use the id command).
Run arduino as your ordinary non-root user to create the directories used by the Arduino IDE. You can quit the IDE once it starts.
Obtain the uBITX software
$ cd ~/Arduino
$ git clone https://github.com/afarhan/ubitxv6.git ubitx_v6.1_code
Connect the uBITX to your computer
Plug in the USB cable and turn on the radio. Running dmesg will show the Arduino appearing as a "USB serial" device:
usb 1-1: new full-speed USB device number 6 using xhci_hcd
usb 1-1: New USB device found, idVendor=1a86, idProduct=7523, bcdDevice= 2.64
usb 1-1: New USB device strings: Mfr=0, Product=2, SerialNumber=0
usb 1-1: Product: USB Serial
usbcore: registered new interface driver ch341
usbserial: USB Serial support registered for ch341-uart
ch341 1-1:1.0: ch341-uart converter detected
usb 1-1: ch341-uart converter now attached to ttyUSB1
If you want more information about the USB device then use:
$ lsusb -d 1a86:7523
Bus 001 Device 006: ID 1a86:7523 QinHeng Electronics CH340 serial converter
In the last post I had started implementing an Unscented Kalman Filter for position and orientation tracking in OpenHMD. Over the Christmas break, I continued that work.
A Quick Recap
When reading below, keep in mind that the goal of the filtering code I’m writing is to combine 2 sources of information for tracking the headset and controllers.
The first piece of information is acceleration and rotation data from the IMU on each device, and the second is observations of the device position and orientation from 1 or more camera sensors.
The IMU motion data drifts quickly (at least for position tracking) and can’t tell which way the device is facing (yaw), although it can detect gravity and so recover pitch/roll.
The camera observations can tell exactly where each device is, but they arrive at a much lower rate (52Hz vs 500/1000Hz) and can take a long time (hundreds of milliseconds) to analyse when acquiring or re-acquiring a lock on the tracked device(s).
The goal is to acquire tracking lock, then use the motion data to predict the motion closely enough that we always hit the ‘fast path’ of vision analysis. The key here is closely enough – the more closely the filter can track and predict the motion of devices between camera frames, the better.
Integration in OpenHMD
When I wrote the last post, I had the filter running as a standalone application, processing motion trace data collected by instrumenting a running OpenHMD app and moving my headset and controllers around. That’s a really good way to work, because it lets me run modifications on the same data set and see what changed.
However, the motion traces were captured using the current fusion/prediction code, which frequently loses tracking lock when the devices move – leading to big gaps in the camera observations and more interpolation for the filter.
By integrating the Kalman filter into OpenHMD, the predictions are improved, leading to generally much better results. Here’s one trace of me moving the headset around reasonably vigorously with no tracking loss at all.
Headset motion capture trace
If it worked this well all the time, I’d be ecstatic! The predicted position matched the observed position closely enough for every frame for the computer vision to match poses and track perfectly. Unfortunately, this doesn’t happen every time yet, and definitely not with the controllers – although I think the latter largely comes down to the current computer vision having more trouble matching controller poses. They have fewer LEDs to match against compared to the headset, and the LEDs are generally more side-on to a front-facing camera.
Taking a closer look at a portion of that trace, the drift between camera frames when the position is interpolated using the IMU readings is clear.
Headset motion capture – zoomed in view
This is really good. Most of the time, the drift between frames is within 1-2mm. The computer vision can only match the pose of the devices to within a pixel or two – so the observed jitter can also come from the pose extraction, not the filtering.
The worst tracking is again on the Z axis – distance from the camera in this case. Again, that makes sense – with a single camera matching LED blobs, distance is the most uncertain part of the extracted pose.
Losing Track
The trace above is good – the computer vision spots the headset and then the filtering + computer vision track it at all times. That isn’t always the case – the prediction goes wrong, or the computer vision fails to match (it’s definitely still far from perfect). When that happens, it needs to do a full pose search to reacquire the device, and there’s a big gap until the next pose report is available.
That looks more like this:
Headset motion capture trace with tracking errors
This trace has 2 kinds of errors – gaps in the observed position timeline during full pose searches and erroneous position reports where the computer vision matched things incorrectly.
Fixing the errors in position reports will require improving the computer vision algorithm and would fix most of the plot above. Outlier rejection is one approach to investigate on that front.
Latency Compensation
There is inherent delay involved in processing the camera observations. Every 19.2ms, the headset emits a radio signal that triggers each camera to capture a frame. At the same time, the headset and controller IR LEDs light up brightly to create the light constellation being tracked. After the frame is captured, it is delivered over USB over the next 18ms or so and then submitted for vision analysis. In the fast case, where we’re already tracking the device, the computer vision is complete in a millisecond or so. In the slow case, it’s much longer.
Overall, that means that there’s at least a 20ms offset between when the devices are observed and when the position information is available for use. In the plot above, this delay is ignored and position reports are fed into the filter when they are available. In the worst case, that means the filter is being told where the headset was hundreds of milliseconds earlier.
To compensate for that delay, I implemented a mechanism in the filter where it keeps extra position and orientation entries in the state that can be used to retroactively apply the position observations.
The way that works is to make a prediction of the position and orientation of the device at the moment the camera frame is captured and copy that prediction into the extra state variable. After that, it continues integrating IMU data as it becomes available while keeping the auxiliary state constant.
When the camera frame analysis is complete, that delayed measurement is matched against the stored position and orientation prediction in the state, and the error is used to correct the overall filter. The cool thing is that in the intervening time, the filter covariance matrix has been building up the right correction terms to adjust the current position and orientation.
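To make that idea concrete, here is a minimal toy sketch of my own (not the OpenHMD code, and all names are invented): a 1-D linear Kalman filter with a single extra "cloned" state entry, showing how a delayed position measurement applied against the clone still corrects the current estimate through the cross-covariance that builds up in the meantime.

import numpy as np

class ToyLatencyFilter:
    """Toy 1-D constant-velocity Kalman filter with one delayed-measurement slot.
    State x = [position, velocity, cloned_position]; P is its covariance."""

    def __init__(self):
        self.x = np.zeros(3)
        self.P = np.eye(3)

    def predict(self, dt, accel_noise=1.0):
        # Integrate the motion model forward; the cloned entry stays constant.
        F = np.array([[1.0, dt, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
        # Rough diagonal process noise for the toy model.
        Q = np.diag([0.25 * dt ** 4, dt ** 2, 0.0]) * accel_noise
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + Q

    def clone_at_capture(self):
        # Copy the current position estimate (and its covariance) into the slot
        # at the moment the camera frame is captured.
        A = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [1.0, 0.0, 0.0]])  # cloned_position := position
        self.x = A @ self.x
        self.P = A @ self.P @ A.T

    def correct_delayed(self, z, meas_noise=1e-4):
        # Apply the late camera measurement against the cloned state; the
        # cross-covariance accumulated since the clone also corrects the
        # current position and velocity.
        H = np.array([[0.0, 0.0, 1.0]])   # the measurement observes the clone
        S = H @ self.P @ H.T + meas_noise
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - H @ self.x)
        self.P = (np.eye(3) - K @ H) @ self.P

# Clone when the frame is captured, keep integrating IMU data while the frame
# is transferred and analysed, then correct with the late pose report.
f = ToyLatencyFilter()
f.clone_at_capture()
for _ in range(20):                      # ~20ms of IMU updates at 1kHz
    f.predict(dt=0.001)
f.correct_delayed(z=np.array([0.05]))    # camera saw us at 5cm at capture time
print(f.x)

The real filter presumably does the equivalent with full position and orientation blocks per slot, and with the UKF machinery rather than this linear toy model.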
Here’s a good example of the difference:
Before: Position filtering with no latency compensation
After: Latency-compensated position reports
Notice how most of the disconnected segments have now slotted back into position in the timeline. The ones that haven’t can either be attributed to incorrect pose extraction in the computer vision, or to not having enough auxiliary state slots for all the concurrent frames.
At any given moment, there can be a camera frame being analysed, one arriving over USB, and one awaiting “long term” analysis. The filter needs to track an auxiliary state variable for each frame that we expect to get pose information from later, so I implemented a slot allocation system and multiple slots.
The downside is that each slot adds 6 variables (3 position and 3 orientation) to the covariance matrix on top of the 18 base variables. Because the covariance matrix is square, the size grows quadratically with new variables. 5 new slots means 30 new variables – leading to a 48 x 48 covariance matrix instead of 18 x 18. That is a 7-fold increase in the size of the matrix (48 x 48 = 2304 vs 18 x 18 = 324) and unfortunately about a 10x slow-down in the filter run-time.
At that point, even after some optimisation and vectorisation on the matrix operations, the filter can only run about 3x real-time, which is too slow. Using fewer slots is quicker, but allows for fewer outstanding frames. With 3 slots, the slow-down is only about 2x.
There are some other possible approaches to this problem:
Running the filtering delayed, only integrating IMU reports once the camera report is available. This has the disadvantage of not reporting the most up-to-date estimate of the user pose, which isn’t great for an interactive VR system.
Keeping around IMU reports and rewinding / replaying the filter for late camera observations. This limits the overall increase in filter CPU usage to double (since we at most replay every observation twice), but potentially with large bursts when hundreds of IMU readings need replaying.
It might be possible to only keep 2 “full” delayed measurement slots with both position and orientation, and to keep some position-only slots for others. The orientation of the headset tends to drift much more slowly than position does, so when there’s a big gap in the tracking it would be more important to be able to correct the position estimate. Orientation is likely to still be close to correct.
Further optimisation in the filter implementation. I was hoping to keep everything dependency-free, so the filter implementation uses my own naive 2D matrix code, which only implements the features needed for the filter. A more sophisticated matrix library might perform better – but it’s hard to say without doing some testing on that front.
Controllers
So far in this post, I’ve only talked about the headset tracking and not mentioned controllers. The controllers are considerably harder to track right now, but most of the blame for that is in the computer vision part. Each controller has fewer LEDs than the headset, fewer are visible at any given moment, and they often aren’t pointing at the camera front-on.
Oculus Camera view of headset and left controller.
This screenshot is a prime example. The controller is the cluster of lights at the top of the image, and the headset is lower left. The computer vision has gotten confused and thinks the controller is the ring of random blue crosses near the headset. It corrected itself a moment later, but those false readings make life very hard for the filtering.
Position tracking of left controller with lots of tracking loss.
Here’s a typical example of the controller tracking right now. There are some very promising portions of good tracking, but they are interspersed with bursts of tracking losses, and wild drifting from the computer vision giving wrong poses – leading to the filter predicting incorrect acceleration and hence cascaded tracking losses. Particularly (again) on the Z axis.
Timing Improvements
One of the problems I was looking at in my last post is variability in the arrival timing of the various USB streams (Headset reports, Controller reports, camera frames). I improved things in OpenHMD on that front, to use timestamps from the devices everywhere (removing USB timing jitter from the inter-sample time).
There are still potential problems with when IMU reports from the controllers get applied in the filter relative to the camera frames. That can be on the order of 2-4ms of jitter. Time will tell how big a problem that will be – after the other, bigger tracking problems are resolved.
Sponsorships
All the work that I’m doing implementing this positional tracking is a combination of my free time, hours contributed by my employer Centricular and contributions from people via Github Sponsorships. If you’d like to help me spend more hours on this and fewer on other paying work, I appreciate any contributions immensely!
Next Steps
The next things on my todo list are:
Integrate the delayed-observation processing into OpenHMD (at the moment it is only in my standalone simulator).
Improve the filter code structure – this is my first Kalman filter and there are some implementation decisions I’d like to revisit.
Publish the UKF branch for other people to try.
Circle back to the computer vision and look at ways to improve the pose extraction and better reject outlying / erroneous poses, especially for the controllers.
Think more about how to best handle / schedule analysis of frames from multiple cameras. At the moment each camera operates as a separate entity, capturing frames and analysing them in threads without considering what is happening in other cameras. That means any camera that can’t see a particular device starts doing full pose searches – which might be unnecessary if another camera still has a good view of the device. Coordinating those analyses across cameras could yield better CPU consumption, and let the filter retain fewer delayed observation slots.
udev can be used to block a USB device (or even an entire class of devices, such as USB storage). Add a file /etc/udev/rules.d/99-local-blacklist.rules containing:
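As an illustrative sketch of one common approach only (the vendor and product IDs below are placeholders, not the values from the original rule), a rule that de-authorises a matching device via the kernel's authorized attribute looks something like:

SUBSYSTEM=="usb", ATTRS{idVendor}=="1234", ATTRS{idProduct}=="abcd", ATTR{authorized}="0"

Then reload the rules with udevadm control --reload-rules and re-plug the device for it to take effect.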
While I hope to update this site again soon, here’s a photo I captured over the weekend in my back yard. The red flowering plant is attracting wattlebirds and honey-eaters. This wattlebird stayed still long enough for me to take this shot. After a little bit of editing, I think it has turned out rather well.
Photo taken with: Canon 7D Mark II & Canon 55-250mm lens.
Edited in Lightroom and Photoshop (to remove a sun glare spot off the eye).
Digital TV uses MPEG Transport Stream (MPEG-TS), a container for video designed for lossy transmission channels, such as radio. To save CPU cycles, Personal Video Recorders often save the MPEG-TS stream directly to disk. The more usual MPEG file is technically MPEG Program Stream (MPEG-PS), which is designed for lossless media, such as storage on a disk.
Since these are both container formats, it should be possible to losslessly and quickly re-code from MPEG-TS to MPEG-PS.
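As a hedged sketch (the filenames here are placeholders), ffmpeg can do this kind of remux by copying the elementary streams straight into a Program Stream container rather than re-encoding:

$ ffmpeg -i recording.ts -c copy recording.mpg

The .mpg extension selects the MPEG-PS muxer, and -c copy passes the video and audio bitstreams through untouched, so the operation is fast and lossless.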
I gave the talk Practicality Beats Purity: The Zen of Python’s Escape Hatch as part of PyConline AU 2020, the very online replacement for PyCon AU this year. In that talk, I included a few interesting links and code samples which you may be interested in:
__NOT_A_MATCHER__ = object()
__MATCHER_SORT_KEY__ = 0

def switch(cls):
    inst = cls()
    methods = []
    for attr in dir(inst):
        method = getattr(inst, attr)
        matcher = getattr(method, "__matcher__", __NOT_A_MATCHER__)
        if matcher == __NOT_A_MATCHER__:
            continue
        methods.append(method)
    methods.sort(key=lambda i: i.__matcher_sort_key__)
    for method in methods:
        matches = method.__matcher__()
        if matches:
            return method()
    raise ValueError(f"No matcher matches value {test_value}")

def case(matcher):
    def __decorator__(f):
        global __MATCHER_SORT_KEY__
        f.__matcher__ = matcher
        f.__matcher_sort_key__ = __MATCHER_SORT_KEY__
        __MATCHER_SORT_KEY__ += 1
        return f
    return __decorator__

if __name__ == "__main__":
    for i in range(100):
        @switch
        class FizzBuzz:
            @case(lambda: i % 15 == 0)
            def fizzbuzz(self):
                return "fizzbuzz"

            @case(lambda: i % 3 == 0)
            def fizz(self):
                return "fizz"

            @case(lambda: i % 5 == 0)
            def buzz(self):
                return "buzz"

            @case(lambda: True)
            def default(self):
                return "-"

        print(f"{i} {FizzBuzz}")
fuck grey text on white backgrounds
fuck grey text on black backgrounds
fuck thin, spindly fonts
fuck 10px text
fuck any size of anything in px
fuck font-weight 300
fuck unreadable web pages
fuck themes that implement this unreadable idiocy
fuck sites that don’t work without javascript
fuck reactjs and everything like it
thank fuck for Stylus. and uBlock Origin. and uMatrix.
Earlier today I launched this site. It is the result of a lot of work over the past few weeks. It began as an idea to publicise some of my photos, and morphed into the site you see now, including a store and blog that I’ve named “Photekgraddft”.
In the weirdly named blog, I want to talk about photography, the stories behind some of my more interesting shots, the gear and software I use, my technology career, my recent ADHD diagnosis and many other things.
This scares me quite a lot. I’ve never really put myself out onto the internet before. If you Google me, you’re not going to find anything much. Google Images has no photos of me. I’ve always liked it that way. Until now.
ADHD’ers are sometimes known for “oversharing”, one of the side-effects of the inability to regulate emotions well. I’ve always been the opposite, hiding, because I knew I was different, but didn’t understand why.
The combination of the COVID-19 pandemic and my recent ADHD diagnosis have given me a different perspective. I now know why I hid. And now I want to engage, and be engaged, in the world.
If I can be a force for positive change, around people’s knowledge and opinion of ADHD, then I will.
If talking about Business Analysis (my day job), and sharing my ideas for optimising organisations helps anyone at all, then I will.
If I can show my photos and brighten someone’s day by allowing them to enjoy a sunset, or a flying bird, then I will.
And if anyone buys any of my photos, then I will be shocked!
So welcome to my little vanity project. I hope it can be something positive, for me, if for no one else, in this new, odd world in which we now find ourselves living together.
Some time ago, I wrote “floats, bits, and constant expressions” about converting a floating point number into its representative ones and zeros as a C++ constant expression – constructing the IEEE 754 representation without being able to examine the bits directly.
I’ve been playing around with Rust recently, and rewrote that conversion code as a bit of a learning exercise for myself, with a thoroughly contrived set of constraints: using integer and single-precision floating point math, at compile time, without unsafe blocks, while using as few unstable features as possible.
I’ve included the listing below, for your bemusement and/or head-shaking, and you can play with the code in the Rust Playground and rust.godbolt.org
// Jonathan Adamczewski 2020-05-12
//
// Constructing the bit-representation of an IEEE 754 single precision floating
// point number, using integer and single-precision floating point math, at
// compile time, in rust, without unsafe blocks, while using as few unstable
// features as I can.
//
// or "What if this silly C++ thing https://brnz.org/hbr/?p=1518 but in Rust?"
// Q. Why? What is this good for?
// A. To the best of my knowledge, this code serves no useful purpose.
// But I did learn a thing or two while writing it :)
// This is needed to be able to perform floating point operations in a const
// function:
#![feature(const_fn)]
// bits_transmute(): Returns the bits representing a floating point value, by
// way of std::mem::transmute()
//
// For completeness (and validation), and to make it clear the fundamentally
// unnecessary nature of the exercise :D - here's a short, straightforward,
// library-based version. But it needs the const_transmute flag and an unsafe
// block.
#![feature(const_transmute)]
const fn bits_transmute(f: f32) -> u32 {
unsafe { std::mem::transmute::<f32, u32>(f) }
}
// get_if_u32(predicate:bool, if_true: u32, if_false: u32):
// Returns if_true if predicate is true, else if_false
//
// If and match are not able to be used in const functions (at least, not
// without #![feature(const_if_match)] - so here's a branch-free select function
// for u32s
const fn get_if_u32(predicate: bool, if_true: u32, if_false: u32) -> u32 {
let pred_mask = (-1 * (predicate as i32)) as u32;
let true_val = if_true & pred_mask;
let false_val = if_false & !pred_mask;
true_val | false_val
}
// get_if_f32(predicate, if_true, if_false):
// Returns if_true if predicate is true, else if_false
//
// A branch-free select function for f32s.
//
// If either if_true or if_false is NaN or an infinity, the result will be NaN,
// which is not ideal. I don't know of a better way to implement this function
// within the arbitrary limitations of this silly little side quest.
const fn get_if_f32(predicate: bool, if_true: f32, if_false: f32) -> f32 {
// can't convert bool to f32 - but can convert bool to i32 to f32
let pred_sel = (predicate as i32) as f32;
let pred_not_sel = ((!predicate) as i32) as f32;
let true_val = if_true * pred_sel;
let false_val = if_false * pred_not_sel;
true_val + false_val
}
// bits(): Returns the bits representing a floating point value.
const fn bits(f: f32) -> u32 {
// the result value, initialized to a NaN value that will otherwise not be
// produced by this function.
let mut r = 0xffff_ffff;
// These floating point operations (and others) cause the following error:
// only int, `bool` and `char` operations are stable in const fn
// hence #![feature(const_fn)] at the top of the file
// Identify special cases
let is_zero = f == 0_f32;
let is_inf = f == f32::INFINITY;
let is_neg_inf = f == f32::NEG_INFINITY;
let is_nan = f != f;
// Writing this as !(is_zero || is_inf || ...) causes the following error:
// Loops and conditional expressions are not stable in const fn
// so instead write this as type conversions, and bitwise operations
//
// "normalish" here means that f is a normal or subnormal value
let is_normalish = 0 == ((is_zero as u32) | (is_inf as u32) |
(is_neg_inf as u32) | (is_nan as u32));
// set the result value for each of the special cases
r = get_if_u32(is_zero, 0, r); // if (is_zero) { r = 0; }
r = get_if_u32(is_inf, 0x7f80_0000, r); // if (is_inf) { r = 0x7f80_0000; }
r = get_if_u32(is_neg_inf, 0xff80_0000, r); // if (is_neg_inf) { r = 0xff80_0000; }
r = get_if_u32(is_nan, 0x7fc0_0000, r); // if (is_nan) { r = 0x7fc0_0000; }
// It was tempting at this point to try setting f to a "normalish" placeholder
// value so that special cases do not have to be handled in the code that
// follows, like so:
// f = get_if_f32(is_normalish, f, 1_f32);
//
// Unfortunately, get_if_f32() returns NaN if either input is NaN or infinite.
// Instead of switching the value, we work around the non-normalish cases
// later.
//
// (This whole function is branch-free, so all of it is executed regardless of
// the input value)
// extract the sign bit
let sign_bit = get_if_u32(f < 0_f32, 1, 0);
// compute the absolute value of f
let mut abs_f = get_if_f32(f < 0_f32, -f, f);
// This part is a little complicated. The algorithm is functionally the same
// as the C++ version linked from the top of the file.
//
// Because of the various contrived constraints on this problem, we compute
// the exponent and significand, rather than extract the bits directly.
//
// The idea is this:
// Every finite single precision floating point number can be represented as a
// series of (at most) 24 significant digits as a 128.149 fixed point number
// (128: 126 exponent values >= 0, plus one for the implicit leading 1, plus
// one more so that the decimal point falls on a power-of-two boundary :)
// 149: 126 negative exponent values, plus 23 for the bits of precision in the
// significand.)
//
// If we are able to scale the number such that all of the precision bits fall
// in the upper-most 64 bits of that fixed-point representation (while
// tracking our effective manipulation of the exponent), we can then
// predictably and simply scale that computed value back to a range that can
// be converted safely to a u64, count the leading zeros to determine the
// exact exponent, and then shift the result into position for the final u32
// representation.
// Start with the largest possible exponent - subsequent steps will reduce
// this number as appropriate
let mut exponent: u32 = 254;
{
// Hex float literals are really nice. I miss them.
// The threshold is 2^87 (think: 64+23 bits) to ensure that the number will
// be large enough that, when scaled down by 2^64, all the precision will
// fit nicely in a u64
const THRESHOLD: f32 = 154742504910672534362390528_f32; // 0x1p87f == 2^87
// The scaling factor is 2^41 (think: 64-23 bits) to ensure that a number
// between 2^87 and 2^64 will not overflow in a single scaling step.
const SCALE_UP: f32 = 2199023255552_f32; // 0x1p41f == 2^41
// Because loops are not available (no #![feature(const_loops)]), and 'if' is
// not available (no #![feature(const_if_match)]), perform repeated branch-
// free conditional multiplication of abs_f.
// use a macro, because why not :D It's the most compact, simplest option I
// could find.
macro_rules! maybe_scale {
() => {{
// care is needed: if abs_f is above the threshold, multiplying by 2^41
// will cause it to overflow (INFINITY) which will cause get_if_f32() to
// return NaN, which will destroy the value in abs_f. So compute a safe
// scaling factor for each iteration.
//
// Roughly equivalent to :
// if (abs_f < THRESHOLD) {
// exponent -= 41;
// abs_f *= SCALE_UP;
// }
let scale = get_if_f32(abs_f < THRESHOLD, SCALE_UP, 1_f32);
exponent = get_if_u32(abs_f < THRESHOLD, exponent - 41, exponent);
abs_f = get_if_f32(abs_f < THRESHOLD, abs_f * scale, abs_f);
}}
}
// 41 bits per iteration means up to 246 bits shifted.
// Even the smallest subnormal value will end up in the desired range.
maybe_scale!(); maybe_scale!(); maybe_scale!();
maybe_scale!(); maybe_scale!(); maybe_scale!();
}
// Now that we know that abs_f is in the desired range (2^87 <= abs_f < 2^128)
// scale it down to be in the range (2^23 <= _ < 2^64), and convert without
// loss of precision to u64.
const INV_2_64: f32 = 5.42101086242752217003726400434970855712890625e-20_f32; // 0x1p-64f == 2^-64
let a = (abs_f * INV_2_64) as u64;
// Count the leading zeros.
// (C++ doesn't provide a compile-time constant function for this. It's nice
// that rust does :)
let mut lz = a.leading_zeros();
// if the number isn't normalish, lz is meaningless: we stomp it with
// something that will not cause problems in the computation that follows -
// the result of which is meaningless, and will be ignored in the end for
// non-normalish values.
lz = get_if_u32(!is_normalish, 0, lz); // if (!is_normalish) { lz = 0; }
{
// This step accounts for subnormal numbers, where there are more leading
// zeros than can be accounted for in a valid exponent value, and leading
// zeros that must remain in the final significand.
//
// If lz < exponent, reduce exponent to its final correct value - lz will be
// used to remove all of the leading zeros.
//
// Otherwise, clamp exponent to zero, and adjust lz to ensure that the
// correct number of bits will remain (after multiplying by 2^41 six times -
// 2^246 - there are 7 leading zeros ahead of the original subnormal's
// computed significand of 0.sss...)
//
// The following is roughly equivalent to:
// if (lz < exponent) {
// exponent = exponent - lz;
// } else {
// exponent = 0;
// lz = 7;
// }
// we're about to mess with lz and exponent - compute and store the relative
// value of the two
let lz_is_less_than_exponent = lz < exponent;
lz = get_if_u32(!lz_is_less_than_exponent, 7, lz);
exponent = get_if_u32( lz_is_less_than_exponent, exponent - lz, 0);
}
// compute the final significand.
// + 1 shifts away a leading 1-bit for normal, and 0-bit for subnormal values
// Shifts are done in u64 (that leading bit is shifted into the void), then
// the resulting bits are shifted back to their final resting place.
let significand = ((a << (lz + 1)) >> (64 - 23)) as u32;
// combine the bits
let computed_bits = (sign_bit << 31) | (exponent << 23) | significand;
// return the normalish result, or the non-normalish result, as appropriate
get_if_u32(is_normalish, computed_bits, r)
}
// Compile-time validation - able to be examined in rust.godbolt.org output
pub static BITS_BIGNUM: u32 = bits(std::f32::MAX);
pub static TBITS_BIGNUM: u32 = bits_transmute(std::f32::MAX);
pub static BITS_LOWER_THAN_MIN: u32 = bits(7.0064923217e-46_f32);
pub static TBITS_LOWER_THAN_MIN: u32 = bits_transmute(7.0064923217e-46_f32);
pub static BITS_ZERO: u32 = bits(0.0f32);
pub static TBITS_ZERO: u32 = bits_transmute(0.0f32);
pub static BITS_ONE: u32 = bits(1.0f32);
pub static TBITS_ONE: u32 = bits_transmute(1.0f32);
pub static BITS_NEG_ONE: u32 = bits(-1.0f32);
pub static TBITS_NEG_ONE: u32 = bits_transmute(-1.0f32);
pub static BITS_INF: u32 = bits(std::f32::INFINITY);
pub static TBITS_INF: u32 = bits_transmute(std::f32::INFINITY);
pub static BITS_NEG_INF: u32 = bits(std::f32::NEG_INFINITY);
pub static TBITS_NEG_INF: u32 = bits_transmute(std::f32::NEG_INFINITY);
pub static BITS_NAN: u32 = bits(std::f32::NAN);
pub static TBITS_NAN: u32 = bits_transmute(std::f32::NAN);
pub static BITS_COMPUTED_NAN: u32 = bits(std::f32::INFINITY/std::f32::INFINITY);
pub static TBITS_COMPUTED_NAN: u32 = bits_transmute(std::f32::INFINITY/std::f32::INFINITY);
// Run-time validation of many more values
fn main() {
let end: usize = 0xffff_ffff;
let count = 9_876_543; // number of values to test
let step = end / count;
for u in (0..=end).step_by(step) {
let v = u as u32;
// reference
let f = unsafe { std::mem::transmute::<u32, f32>(v) };
// compute
let c = bits(f);
// validation
if c != v &&
!(f.is_nan() && c == 0x7fc0_0000) && // nans
!(v == 0x8000_0000 && c == 0) { // negative 0
println!("{:x?} {:x?}", v, c);
}
}
}
Over the weekend, the boredom of COVID-19 isolation motivated me to move my personal website from WordPress on a self-managed 10-year-old virtual private server to a generated static site on a static site hosting platform with a content delivery network.
This decision was overdue. WordPress never fit my brain particularly well, and it was definitely getting to a point where I wasn’t updating my website at all (my last post was two weeks before I moved from Hobart; I’ve been living in Petaluma for more than three years now).
Settling on a website framework wasn’t a terribly difficult choice (I chose Jekyll; everyone else seems to be using it), and I’ve had friends who’ve had success moving their blogs over. The difficulty I ended up facing was that the standard exporter that everyone uses to move from WordPress to Jekyll does not expect Debian’s package layout.
Backing up a bit: I made a choice, 10 years ago, to deploy WordPress on a machine that I ran myself, using the Debian system wordpress package, a simple aptitude install wordpress away. That decision was not particularly consequential then, but it chewed up 3 hours of my time on Saturday.
Why? The exporter plugin assumes that it will be able to find all of the standard WordPress files in the usual WordPress places, and when it didn’t find that, it broke in unexpected ways. And why couldn’t it find it?
Debian makes packaging choices that prioritise all the software on a system living side-by-side with minimal difficulty. It sets strict permissions. It separates application code from configuration from user data (which in the case of WordPress, includes plugins), in a way that is consistent between applications. This choice makes it easy for Debian admins to understand how to find bits of an application. It also minimises the chance of one PHP application clobbering another.
10 years later, the install that I had set up was still working, having survived 3-4 Debian versions, and so 3-4 new WordPress versions. I don’t recall the last time I had to think about keeping my WordPress instance secure and updated. That’s quite a good run. I’ve had a working website despite not caring about keeping it updated for at least three years.
The same decisions that meant I spent 3 hours on Saturday doing a simple WordPress export saved me a bunch of time that I didn’t incrementally spend over the course of a decade. Am I even? I have no idea.
Anyway, the least I can do is provide some help to people who might run into this same problem, so here’s a 5-step howto.
How to migrate a Debian WordPress site to Jekyll
Should you find the Jekyll exporter not working on your Debian WordPress install:
Use the standard WordPress export to export an XML feed of your site.
Spin up a new instance of WordPress (using WordPress.com, or on a new Virtual Private Server, whatever, really).
Import the exported XML feed.
Install the Jekyll exporter plugin.
Follow the documentation and receive a Jekyll export of your site.
Basically, the plugin works with a stock WordPress install. If you don’t have one of those, it’s easy to move it over.
I've spent the last couple of days trying to deploy Fedora CoreOS to some physical hardware/bare metal for a colleague using the official PXE installer from Fedora CoreOS. It wasn't very pleasant, and just wouldn't work reliably.
Maybe my expectations were too high, in that I thought I could use Ignition to prepare more of the system for me, as my colleague has been able to do bare metal installs correctly. I just tried to use Ignition as documented.
A few interesting aspects I encountered:
The PXE installer for it has a 618MB initrd file. This takes quite a while to transfer via tftp!
It can't build software RAID for the main install device (and the developers have no intention of adding this), and it seems very finicky to build other RAID sets for other partitions.
And, well, I just kept having problems where the built systems would hang during boot for no obvious reason.
The time to do an installation was incredibly long.
The initrd image is really just running coreos-installer against the nominated device.
During the night I got fed up with that process and wrote a Fully Automatic Installer (FAI) profile that'd install CoreOS instead. I can now use setup-storage from FAI using its standard disk_config files. This allows me to build complicated disk configurations with software RAID and LVM easily.
A big bonus is that a rebuild is a lot faster: timed from typing reboot to a fresh login prompt, it takes 10 minutes - and this is on physical hardware, so it includes BIOS POST and RAID controller set up, twice each.
FAI was initially developed to deploy Debian systems; it has since been extended to install a number of other operating systems. I think this is a good example of how easy it is to deploy non-Debian derived operating systems using FAI without having to modify FAI itself.
For the last year I’ve been incrementally moving away from lifting static weights and towards body weight based exercises, or callisthenics. I’ve been doing this for a number of reasons, including better avoidance of injury (if I collapse, the entire stack is dynamic; if a bar held above my head drops on me, most of the weight is just dead weight – ouch), accessibility during travel (most hotel gyms are very poor), and functional relevance – I literally never need to put 100 kg on my back, but I do climb stairs, for instance.
Covid-19 shutting down the gym where I train is a mild inconvenience for me as a result, because even though I don’t do it, I am able to do nearly all my workouts entirely from home. And I thought a post about this approach might be of interest to other folk newly separated from their training facilities.
I’ve gotten most of my information from a few different YouTube channels:
There are many more channels out there, and I encourage you to go and look and read and find out what works for you. Those 5 are my greatest hits, if you will. I’ve bought the FitnessFAQs exercise programs to help me with my training, and they are indeed very effective.
While you don’t need a gymnasium, you do need some equipment, particularly if you can’t go and use a local park. Exactly what you need will depend on what you choose to do – for instance, doing dips on the edge of a chair can avoid needing any equipment, but doing them with some portable parallel bars can be much easier. Similarly, doing pull ups on the edge of a door frame is doable, but doing them with a pull-up bar is much nicer on your fingers.
Depending on your existing strength you may not need bands, but I certainly did. Buying rings is optional – I love them, but they aren’t needed to have a good solid workout.
I bought parallettes for working on the planche. Parallel bars for dips and rows. A pull-up bar for pull-ups and chin-ups, though with the rings you can add flys, rows, face-pulls, unstable push-ups and more. The rings. And a set of 3 bands that combine for 7 different support amounts.
In terms of routine, I do an upper/lower split, with 3 days on upper body, one day off, one day on lower, and the weekends off entirely. I was doing 2 days on lower body, but found I was over-training with Aikido later that same day.
On upper body days I’ll do (roughly) chin ups or pull ups, push ups, rows, dips, hollow body and arch body holds, handstands and some grip work. Today, as I write this on Sunday evening, 2 days after my last training day on Friday, I can still feel my lats and biceps from training Friday afternoon. Zero issue keeping the intensity up.
For lower body, I’ll do pistol squats, nordic drops, quad extensions, wall sits, single leg calf raises, bent leg calf raises. Again, zero issues hitting enough intensity to achieve growth / strength increases. The only issue at home is having a stable enough step to get a good heel drop for the calf raises.
If you haven’t done bodyweight training at all before, when starting, don’t assume it will be easy – even if you’re a gym junkie, our bodies are surprisingly heavy, and there’s a lot of resistance just moving them around.
The OpenSTEM® materials are ideally suited to online teaching. In these times of new challenges and requirements, there are a lot of technological possibilities. Schools and teachers are increasingly being asked to deliver material online to students. Our materials can assist with that process, especially for Humanities and Science subjects from Prep/Kindy/Foundation to Year 6. […]
If a person tests positive to the virus today, that means they were infected at some time in the past. So, what is the lag between infection and a positive test result?
Incubation Lag – about 5 days
When you are infected you don’t show symptoms immediately. Rather, there’s an incubation period before symptoms become apparent. The time between being infected and developing symptoms varies from person to person, but most of the time a person shows symptoms after about 5 days (I recall seeing somewhere that 1 in 1000 cases will develop symptoms after 14 days).
Presentation Lag – about 2 days
I think it’s fair to also assume that people are not presenting at testing immediately they become ill. It is probably taking them a couple of days from developing symptoms to actually get to the doctor – I read a story somewhere (have since lost the reference) about a young man who went to a party, then felt bad for days but didn’t go for a test until someone else from the party had returned a positive test. Let’s assume there’s a mix of worried well and stoic types and call it 2 days from becoming symptomatic to seeking a test.
Referral Lag – about a day
Assuming that a GP is available straight away and recommends a test immediately, logistically there will still be most of a day taken up between deciding to see a doctor and having a test carried out.
Testing lag – about 2 days
The graph of infections (the “epi graph”) today looks like this:
One thing you notice about the graph is that the new cases bars seem to increase for a couple of days, then decrease – so about 100 new cases in the last 24 hours, but almost 200 in the 24 hours before that. From the graph, the last 3 “dips” have been today (Sunday), last Thursday and last Sunday. This seems to be happening every 3 to 4 days. I initially thought that the dips might mean fewer (or more) people presenting over weekends, but the period is inconsistent with that. I suspect, instead, that this actually means that testing is being batched.
That would mean that neither the peaks nor the troughs are representative of infection surges/retreats; they simply reflect when tests are being processed. This seems to be a 4 day cycle, so, on average, it would be about 2 days between having the test conducted and receiving a result. So a confirmed case count published today is actually showing confirmed cases as at about 2 days earlier.
Total lag
From the date someone is infected to the time that they receive a positive confirmation is about:
lag = time for symptoms to show + time to seek a test + referral time + time for the test to return a result
So, the published figures on confirmed infections are probably lagging actual infections in the community by about 10 days (5+2+1+2).
If there’s about a 10 day lag between infection and confirmation, then what a figure published today says is that about a week and a half ago there were about this many cases in the community. So, the 22 March figure of 1098 infections is actually really a 12 March figure.
What the lag means for Physical (ie Social) Distancing
The main thing that the lag means is that if we were able to wave a magic wand today and stop all further infections, we would continue to record new infections for about 10 days (and the tail for longer). In practical terms, implementing physical distancing measures will not show any effect on new cases for about a week and a half. That’s because today there are infected people who are yet to be tested.
The silver lining to that is that the physical distancing measures that have been gaining prominence since 15 March should start to show up in the daily case numbers from the middle of the coming week, possibly offset by overseas entrants rushing to make the 20 March entry deadline.
Estimating Actual Infections as at Today
How many people are infected, but unconfirmed as at today? To estimate actual infections you’d need to have some idea of the rate at which infections are increasing. For example, if infections increased by 10% per day for 10 days, then you’d multiply the most recent figure by 1.1 raised to the power of 10 (ie about 2.5). Unfortunately, the daily rate of increase (see table on the wiki page) has varied a fair bit (from 20% to 27%) over the most recent 10 days of data (that is, over the 10 days prior to 12 March, since the 22 March figures roughly correspond to 12 March infections) and there’s no guarantee that since that time the daily increase in infections will have remained stable, particularly in light of the implementation of physical distancing measures. At 23.5% per day, the factor is about 8.
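As a quick sanity check on that compounding arithmetic (a throwaway Python snippet of my own, not from the original post):

# Compound an assumed constant daily growth rate over the ~10 day lag period.
for daily_rate in (0.10, 0.20, 0.235, 0.27):
    factor = (1 + daily_rate) ** 10
    print(f"{daily_rate:.1%} per day for 10 days -> multiply by about {factor:.1f}")

This prints multipliers of roughly 2.6, 6.2, 8.3 and 10.9 respectively, consistent with the figures above.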
There aren’t any reliable figures we can use to estimate the rate of infection during the current lag period (ie from 12 March to 22 March). This is because the vast majority of cases have not been from unexplained community transmission. Most of the cases are from people who have been overseas in the previous fortnight and they’re the cohort that has been most significantly impacted by recent physical distancing measures. From 15 March, they have been required to self isolate and from 20 March most of their entry into the country has stopped. So I’d expect a surge in numbers up to about 30 March – ie reflecting infections in the cohort of people rushing to get into the country before the borders closed followed by a flattening. With the lag factor above, you’ll need to wait until 1 April or thereabouts to know for sure.
Note:
This post is just about accounting for the time lag between becoming infected and receiving a positive test result. It assumes, for example, that everyone who is infected seeks a test, and that everyone who is infected and seeks a test is, in fact, tested. As at today, neither of these things is true.
I was always of the belief that secrets within Kubernetes (k8s) are secure. How wrong I was! After a recent meetup featuring a Google security expert, I discovered that the secrets I have in our k8s cluster are no more secure than writing them down on paper and leaving them on a park bench.
Google Cloud Platform is a fantastic offering, and yes, I am biased. I have used every other cloud platform from every major vendor over the last 10 odd years of my career.
As I was an organiser of the conference this year, I didn’t get to see many talks; fortunately many of the talks were recorded, so I get to watch the conference well after the fact.
Conference Opening
That white balance on the lectern slides is indeed bad; I really should get around to adding this as a suggestion on the logos documentation. (With some help, I put up all the lectern covers; it was therapeutic and rush free.)
I actually think there was a lot of information in this introduction. Perhaps too much?
OpenZFS and Linux
A nice update on where zfs is these days.
Dev/Ops relationships, status: It’s Complicated
A bit of a war story about production systems, leading to a moment of empathy.
Samba 2020: Why are we still in the 1980s for authentication?
There are a lot of old security standards that are showing their age, and a lot of modern security standards – but which to choose?
Tyranny of the Clock
A very interesting problem solving adventure, with a few nuggets of interesting information about tools and techniques.
Configuration Is (riskier than?) Code
Because configuration files are parsed by a program, and the program changes how it runs depending on the contents of that configuration file, every program that parses configuration files is basically an interpreter, and thus every configuration file is basically a program. So, configuration is code, and we should be treating configuration like we do code, e.g. revision control, commenting, testing, review.
Easy Geo-Redundant Handover + Failover with MARS + systemd
Using a local process organiser to handle a cluster is interesting, but not something I’d really promote. The video cutting isn’t the best in this one, with lots of time spent on the speaker pointing to his slides offscreen.
2019 was a very busy year for us. I hadn’t realised how busy it was until I sat down to write this post. There’s also some moderately heavy stuff in here – if you have topics that trigger you, perhaps make sure you have spoons before reading.
We had all the usual stuff. Movies – my top two were Alita and Abominable though the Laundromat and Ford v Ferrari were both excellent and moving pieces. I introduced Cynthia to Teppanyaki and she fell in love with having egg roll thrown at her face hole.
When Cynthia started school we dropped gymnastics due to the time overload – we wanted some downtime for her to process after school, and with violin having started that year she was just looking so tired after a full day of school we felt it was best not to have anything on. Then last year we added in a specific learning tutor to help with the things that she approaches differently to the other kids in her class, giving 2 days a week of extra curricular activity after we moved swimming to the weekends.
At the end of last year she was finally chipper and with it most days after school, and she had been begging to get into more stuff, so we all got together and negotiated drama class and Aikido.
The drama school we picked, HSPA, is pretty amazing. Cynthia adored her first teacher there, and while upset at a change when they rearranged classes slightly, is again fully engaged and thrilled with her time there. Part of the class is putting on a full scale production – they did a version of the Happy Prince near the end of term 3 – and every student gets a part, with the ability for the older students to audition for more parts. On the other hand she tells me tonight that she wants to quit. So shrug, who knows :).
I last did martial arts when I took Aikido with sensei Darren Friend at Aikido Yoshinkai NSW back in Sydney, in the late 2000’s. And there was quite a bit less of me then. Cynthia had been begging to take a martial art for about 4 years, and we’d said that when she was old enough, we’d sign her up, so this year we both signed up for Aikido at the Rangiora Aikido Dojo. The Rangiora dojo is part of the NZ organisation Aikido Shinryukan which is part of the larger Aikikai style, which is quite different, yet the same, as the Yoshinkai Aikido that I had been learning. There have been quite a few moments where I have had to go back to something core – such as my stance – and unlearn it, to learn the Aikikai technique. Cynthia has found the group learning dynamic a bit challenging – she finds the explanations – needed when there are twenty kids of a range of ages and a range of experience – from new intakes each term through to ones that have been doing it for 5 or so years – get boring, and I can see her just switch off. Then she misses the actual new bit of information she didn’t have previously :(. Which then frustrates her. But she absolutely loves doing it, and she’s made a couple of friends there (everyone is positive and friendly, but there are some girls that like to play with her after the kids lesson). I have gotten over the body disconnect and awkwardness and things are starting to flow, I’m starting to be able to reason about things without just freezing in overload all the time, so that’s not bad after a year. However, the extra weight is making my forward rolls super super awkward. I can backward roll easily, with moderately good form; forward rolls though my upper body strength is far from what’s needed to support my weight through the start of the roll – my arm just collapses – so I’m in a sort of limbo – if I get the moment just right I can just start the contact on the shoulder; but if I get the moment slightly wrong, it hurts quite badly. And since I don’t want large scale injuries, doing the higher rolls is very unnerving for me. I suspect its 90% psychological, but am not sure how to get from where I am to having confidence in my technique, other than rinse-and-repeat. My hip isn’t affecting training much, and sensei Chris seems to genuinely like training with Cynthia and I, which is very nice: we feel welcomed and included in the community.
Speaking of my hip – earlier this year something ripped cartilage in my right hip – ended up having to have an MRI scan – and those machines sound exactly like a dot matrix printer – to diagnose it. Interestingly, having the MRI improved my symptoms, but we are sadly in hurry-up-and-wait mode. Before the MRI, I’d wake up at night with some soreness, and my right knee bent, foot on the bed, then sleepily let my leg collapse sideways to the right – and suddenly be awake in screaming agony as the joint opened up with every nerve at its disposal. When the MRI was done, they pumped the joint full of local anaesthetic for two purposes – one is to get a clean read on the joint, and the second is so that they can distinguish between referred surrounding pain, vs pain from the joint itself. It is to be expected with a joint issue that the local will make things feel better (duh), for up to a day or so while the local dissipates. The expression on the specialists face when I told him that I had had a permanent improvement trackable to the MRI date was priceless. Now, when I wake up with joint pain, and my leg sleepily falls back to the side, its only mildly uncomfortable, and I readjust without being brought to screaming awakeness. Similarly, early in Aikido training many activities would trigger pain, and now there’s only a couple of things that do. In another 12 or so months if the joint hasn’t fully healed, I’ll need to investigate options such as stem cells (which the specialist was negative about) or steroids (which he was more negative about) or surgery (which he was even more negative about). My theory about the improvement is that the cartilage that was ripped was sitting badly and the inflation for the MRI allowed it to settle back into the appropriate place (and perhaps start healing better). I’m told that reducing inflammation systematically is a good option. Turmeric time.
Sadly Cynthia has had some issues at school – she doesn’t fit the average mould and while widespread bullying doesn’t seem to be a thing, there is enough of it, and she receives enough of it, that it’s impacted her happiness more than a little – this blows up in school and at home as well. We’ve been trying a few things to improve this – helping her understand why folk behave badly, what to do in the moment (e.g. this video), but also that anything that goes beyond speech is assault and she needs to report that to us or teachers no matter what.
We’ve also had some remarkably awful interactions with another family at the school. We thought we had a friendly relationship, but I managed to trigger a complete meltdown of the relationship – not by doing anything objectively wrong, but because we had (unknown to me) different folkways, and some perfectly routine and normal behaviour turned out to be stressful and upsetting to them, and then they didn’t discuss it with us at all until it had brewed up in their heads into a big mess… and it’s still not resolved (and may not ever be: they are avoiding us both).
I weighed in at 110kg this morning. Jan the 4th 2019 I was 130.7kg. Feb 1 2018 I was 115.2kg. This year I peaked at 135.4kg, and got down to 108.7kg before Christmas food set in. That’s pretty happy making all things considered. Last year I was diagnosed with Coitus headaches and though I didn’t know it the medicine I was put on has a known side effect of weight gain. And it did – I had put it down to ongoing failure to manage my diet properly, but once my weight loss doctor gave me an alternative prescription for the headaches, I was able to start losing weight immediately. Sadly, though the weight gain through 2018 was effortless, losing the weight through 2019 was not. Doable, but not effortless. I saw a neurologist for the headaches when they recurred in 2019, and got a much more informative readout on them, how to treat and so on – basically the headaches can be thought of as an instability in the system, and the medicines goal is to stabilise things, and once stable for a decent period, we can attempt to remove the crutch. Often that’s successful, sometimes not, sometimes its successful on a second or third time. Sometimes you’re stuck with it forever. I’ve been eating a keto / LCHF diet – not super strict keto, though Jonie would like me to be on that, I don’t have the will power most of the time – there’s a local truck stop that sells killer hotdogs. And I simply adore them.
I started this year working for one of the largest companies on the planet – VMware. I left there in February and wrote a separate post about that. I followed that job with nearly the polar opposite – a startup working on a blockchain content distribution system. I wrote about that too. Changing jobs is hard in lots of ways – for instance I usually make friendships at my jobs, and those suffer some when you disappear to a new context – not everyone makes connections with you outside of the job context. Then there’s the somewhat non-rational emotional impact of not being in paid employment. The puritans have a lot to answer for. I’m there again, looking for work (and hey, if you’re going to be at Linux.conf.au (Gold Coast, Australia, January 13-17) I’ll be giving a presentation about some of the interesting things I got up to in the last job interregnum I had).
My feet have been giving me trouble for a couple of years now. My podiatrist is reasonably happy with my progress – and I can certainly walk further than I could – I even did some running earlier in the year, until I got shin splints. However, I seem to have hypersensitive soles, so she can’t correct my pronation until we fix that, which at least for now means a five-minute ‘sensory massage’ session where I touch my feet, then someone else does, then something smooth, then something rough.
In 2017 and 2018 I injured myself at the gym, and in 2019 I wanted to avoid that, so I sought out ways to reduce injury. Moving away from machines was a big part of that; more focus on technique another part. But perhaps the largest part was moving from lifting dead weight to focusing on body weight exercises – callisthenics. This shifts from a dead weight you have to control when things go wrong to an active body weight that can help deal with whatever has happened. So far at least, this has been pretty successful. Although I’ve had minor issues (I managed to inflame the fatty pad the olecranon displaces when your elbow locks out), I’m nearly entirely transitioned to a weights-free program – handstands, pistol squats, push-ups, dead hangs and so on. My upper body strength needs to come along some before we can really go places though… and we’re probably going to max out the hamstring curl machine (at least for regular two-leg curls) before my core is strong enough to do a Nordic drop.
Lynne has been worried about injuring herself with weight lifting at the gym for some time now, but recently saw my physio – Ben Cameron at Pegasus PhysioSouth – who is excellent, and he suggested that she could have less chronic back pain if she took weights back up again. She’s recently told me that I’m allowed one ‘told you so’ about this, since she found herself in a spot where previously she would have put herself in a poor lifting position, but the weight training gave her a better option and she intuitively used it, avoiding pain. So that’s a good thing – complicated by her body’s complicated history, but an excellent trainer and physio team are making progress.
Earlier this year she had a hell of a fright, with a regular eye checkup getting referred into a ‘you are going blind; maybe tomorrow, maybe within 10 years’ nightmare scenario. Fortunately a second opinion got a specialist who probably knows the same amount but was willing to communicate it with actual words… Lynne has a condition which diabetes (type I or II) can affect, and she has a vein that can alter state somewhat arbitrarily but will probably only degrade slowly, particularly if Lynne’s diet is managed as she has been doing.
Diet wise, Lynne also has been losing some weight but this is complicated by her chronic idiopathic pancreatitis. That’s code for ‘it keeps happening and we don’t know why’ pancreatitis. We’ve consulted a specialist in the North Island who comes highly recommended by Lynne’s GP, who said that rapid weight loss is a little-known but possible cause of pancreatitis – and that fits the timelines involved. So Lynne needs to lose weight to manage the onset of type II diabetes. But not too fast, to avoid pancreatitis, which would hasten the onset of type II diabetes. Aiee. Slow but steady – she’s working with the same doctor I am for that, and a similar diet, though lower on the fats as she has no gall… bladder.
In April our kitchen waste pipe started chronically blocking, and investigation with a drain robot revealed a slump in the pipe. Ground-penetrating radar revealed an anomaly under the garage… and this escalated. We’re going to have to move out of the house for a week while half the house’s carpets are lifted, grout is pumped into the foundations to tighten it all back up again – and hopefully they don’t over-pump it – and then it all gets replaced. Oh, and it looks like the drive will be replaced again, to fix the slumped pipe permanently. It will be lovely when done but right now we’re facing a wall of disruption and argh.
When we moved to Rangiora I was travelling a lot more, Christchurch itself had poorer air quality than Rangiora, and our financial base was a lot smaller. Now, Rangiora’s population has grown substantially (13k to 19k conservatively – and that’s ignoring the surrounds that use Rangiora as a base), we have more to work with, the air situation in Christchurch has improved massively, and even a busy year’s travel is less than I was doing before Cynthia came along. We’re looking at moving – we’re not sure where yet; maybe more country, maybe more city.
One lovely bright spot over the last few years has been reconnecting with friends from school, largely on Facebook – some of whom I had forgotten that I knew back at school – I had a little clique but was not very aware of the wider school population in hindsight (this was more than a little embarrassing to me, as I didn’t want to blurt out “who are you?!”) – and others whom I had not :). Some of these reconnections are just light touch person-X exists and cares somewhat – and that’s cool. One in particular has grown into a deeper friendship than we had back as schoolkids, and I am happy and grateful that that has happened.
Our cats are fat and happy. Well mostly. Baggy is fat and stressed and spraying his displeasure everywhere whenever the stress gets too much :(. Cynthia calls him Mr Widdlepants. The rest of the time he cuddles and purrs and is generally happy with life. Dibbler and Kitten-of-the-wild are relatively fine with everything.
Cynthia’s violin is coming along well. She did a small performance for her classroom (with her teacher) and wowed them. I’ve been inspired to start practising trumpet again. After 27 years of decay my skills are decidedly rusty, but they are coming along. Finding arrangements for violin + trumpet is a bit challenging, and my sight-reading-with-transposition struggles to cope, but we make do. Lynne is muttering about getting a clarinet or drum-kit and joining in.
So, 2019. Whew. I hope yours was less stressful and had as many or more bright points than ours. Onwards to 2020.
BlueHackers has in the past arranged for a free counsellor/psychologist at several conferences (LCA, OSDC). Given the popularity and great reception of this service, we want to make this a regular thing and try to get this service available at every conference possible – well, at least Australian open source and related events.
Right now we’re trying to arrange for the service to be available at LCA2020 at the Gold Coast, we have excellent local psychologists already, and the LCA organisers are working on some of the logistical aspects.
Meanwhile, we need to get the funds organised. Fortunately this has never been a problem with BlueHackers, people know this is important stuff. We can make a real difference.
Unfortunately BlueHackers hasn’t yet completed its transition from OSDClub project to Linux Australia subcommittee, so this fundraiser is running in my personal name. Well, you know who I (Arjen) am, so I hope you’re all ok with that.
We have a little over a week until LCA2020 starts, let’s make this happen! Thanks. You can donate via MyCause.
In June 2019 I started a new role as a software engineer at a startup called Cachecash. Today is probably the last day of payroll there, and as is my usual practice, I’m going to reflect back on my time there. Less commonly, I’m going to do so in public, as we’re about to open the code (yay), and it’s not a mega-corporation with everything shuttered up (also yay).
Framing
This is intended to be a blameless reflection on what has transpired. Blameless doesn’t mean inaccurate; but it means placing the focus on the process and system, not on the particular actor that happened to be wearing the hat at the time a particular event happened. Sometimes the system is defined by the actors, and in that case – well, I’ll let you draw your own conclusions if you encounter that case.
A retrospective that we can’t learn from is useless. Worse than useless, because it takes time to write and time to read and that time is lost to us forever. So if a thing is a particular way, it is going to get said. Not to be mean, but because false niceness will waste everyone’s time. Mine, and my ex-colleagues’, whose time I respect. And yours, if you are still reading this.
What was Cachecash
Cachecash was a startup – still is in a very technical sense, corporation law being what it is. But it is still a couple of code bases – and a nascent open source project (which will hopefully continue) – built to operationalise and productise this research paper that the Cachecash founders wrote.
What it isn’t anymore is a company investing significant amounts of time and money in the form of engineering in making code, to make those code bases better.
Cachecash was also a team of people. That obviously changed over time, but at the time I write this it is:
Ghada
Justin
Kevin
Marcus
Petar
Robert
Scott
And we’re all pretty fantastic, if you ask me :).
Technical overview
The CAPNet paper that I linked above doesn’t describe a product. What it describes is a system that permits paying caches (think squid/varnish etc) for transmitting content to clients, while also detecting attempts by such caches to claim payment when they haven’t transmitted, or to collude with a client to claim more transmission than actually happened and get paid for it. A classic incentives-aligned scheme.
Note that there is no blockchain involved at this layer.
The blockchain was added into this core system as a way to build a federated marketplace – the idea was that the blockchain provided a suitable substrate for negotiating the purchase and sale of contracts that would be audited using the CAPNet accounting system, the payments could be micropayments back onto the blockchain, and so on – we’d avoid the regular financial system, and we wouldn’t be building a fragile central system that would prevent other companies also participating.
Miners would mine coins, publishers would buy coins then place them in escrow as a promise to pay caches to deliver content to clients, and a client would deliver proof of delivery back to the cache which would then claim payment from the publisher.
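As a rough sketch of that lifecycle in Go – with type and function names that are entirely mine for illustration, not the actual Cachecash APIs – the escrow/claim half might look like:

```go
// Hypothetical sketch of the escrow flow described above; names, shapes and
// pricing are illustrative only, not the real Cachecash code.
package main

import "fmt"

type Coin uint64

// Escrow holds coins a publisher has locked up as a promise to pay caches
// for delivering its content.
type Escrow struct {
	Publisher string
	Locked    Coin
	Paid      Coin
}

// ProofOfDelivery is what a client hands back after it has received and
// verified a group of chunks from a particular cache.
type ProofOfDelivery struct {
	CacheID string
	Bytes   uint64
}

// Claim lets a cache redeem a proof of delivery against the escrow.
func (e *Escrow) Claim(p ProofOfDelivery, pricePerMB Coin) (Coin, error) {
	owed := Coin(p.Bytes/1_000_000) * pricePerMB
	if owed > e.Locked-e.Paid {
		return 0, fmt.Errorf("escrow exhausted")
	}
	e.Paid += owed
	return owed, nil
}

func main() {
	e := &Escrow{Publisher: "example-publisher", Locked: 1000}
	payment, err := e.Claim(ProofOfDelivery{CacheID: "cache-1", Bytes: 50_000_000}, 2)
	fmt.Println(payment, err) // 100 <nil>
}
```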
Technical Challenges
There were a few things that turned up as significant issues. In no particular order:
The protocol
The protocol itself adds additional round trips to multiple peers – in its ‘normal’ configuration the client ends up running (web-, for browsers) gRPC connections to 5 endpoints (with all the normal windowing concerns, but potentially over QUIC), then gets chunks of content in batches (concurrently) from 4 of the peers, runs a small crypto brute-force operation on the combined result, and then moves on to the next group of content. This should sound suspiciously like TCP – it is basically a window management problem, and it has exactly the same performance management problems: how fast to start, the maximum window size, how far to back off when problems are hit. But accentuated: those 4 cache peers can all suffer their own independent noise problems, or be hostile. They can also suffer correlated problems: they might all be in the same datacentre, or all be run by a hostile actor, or the client might be on a hostile WiFi link, or the client’s OS/browser might be hostile. Let’s just say there is a long, rich road to optimising this new protocol to make it fast, robust and reliable. Much as it has taken many years to turn HTTP into QUIC, drawing on techniques like forward error correction rather than retries, similar techniques will need to be applied to give this protocol similar performance characteristics. And evolving the protocol while maintaining the security properties is a complicated task, with three actors involved who may collude in various ways.
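To make the window-management shape concrete, here is a toy sketch in Go; the fetch functions are stand-ins for the real gRPC client code, and the window size and group count are just example numbers:

```go
// Illustrative only: a toy window over chunk groups, fetching each group's
// four pieces concurrently from four caches.
package main

import (
	"fmt"
	"sync"
)

// fetchPiece stands in for a (web)gRPC request to one cache endpoint.
func fetchPiece(cache string, group int) []byte {
	return []byte(fmt.Sprintf("%s/%d", cache, group))
}

// fetchGroup pulls one group's pieces from all caches concurrently; each
// cache can fail or stall independently, which is where the pain lives.
func fetchGroup(caches []string, group int) [][]byte {
	pieces := make([][]byte, len(caches))
	var wg sync.WaitGroup
	for i, c := range caches {
		wg.Add(1)
		go func(i int, c string) {
			defer wg.Done()
			pieces[i] = fetchPiece(c, group)
		}(i, c)
	}
	wg.Wait()
	return pieces
}

func main() {
	caches := []string{"cache-a", "cache-b", "cache-c", "cache-d"}
	window := make(chan struct{}, 4) // groups in flight: the knob that behaves like a TCP window
	var wg sync.WaitGroup
	for group := 0; group < 16; group++ {
		window <- struct{}{}
		wg.Add(1)
		go func(group int) {
			defer wg.Done()
			pieces := fetchGroup(caches, group)
			_ = pieces // the client would run the crypto puzzle over the combined result here
			<-window
		}(group)
	}
	wg.Wait()
}
```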
An early performance analysis I did on the Go code implementation showed that the brute-forcing work was a bottleneck: while the time per chunk (once optimised) was entirely modest for any small amount of data, the delay added per window element acts as a brake on performance for high-capacity, low-latency links. For a 1Gbps 25ms RTT link I estimated a need for 8 cores doing crypto brute forcing on the client.
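A back-of-envelope version of that estimate, with an assumed chunk-group size and per-group puzzle cost (these two numbers are my illustrative assumptions, not measurements from the Cachecash code), looks like this:

```go
// Rough illustration of the arithmetic: link rate divided by group size
// gives groups per second; multiply by the single-core puzzle time per
// group to get cores of brute-force work needed.
package main

import "fmt"

func main() {
	linkBitsPerSec := 1e9       // 1 Gbps link
	groupBytes := 256.0 * 1024  // assumed bytes delivered per puzzle
	puzzleSecsPerGroup := 0.016 // assumed single-core brute-force time

	groupsPerSec := (linkBitsPerSec / 8) / groupBytes
	cores := groupsPerSec * puzzleSecsPerGroup
	fmt.Printf("%.1f groups/sec -> %.1f cores of puzzle work\n", groupsPerSec, cores)
}
```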
JS
Cachecash is essentially implementing a new network protocol. There are some great hooks these days in browsers, and one can hook in and provide streams to things like video players to let them get one segment of video. However, downloading an entire file – for instance, a full video – is not so easy. This bug, open for 2 years now, is the standards-based way to do it. The non-standards-based way involves buffering the entire content in memory, oh, and reflecting everything through a static GitHub-hosted service worker. (You can of course host such a static page yourself, but then the whole idea of this federated, distributed system breaks down a little.)
Our initial JS implementation was getting under 512KBps with all-local servers – part of that was the bandwidth-delay-product issue mentioned above. Moving to fetching chunks of content from each cache concurrently using futures improved that up to 512KBps, but that’s still shocking for a system we want to be able to compete with the likes of YouTube, Cloudflare and Akamai.
One of the hot spots turned out to be calculating SHA-256 values – the CAPNet algorithm calculates thousands (it’s tunable, but 8k in the set I was analysing) of independent SHAs per chunk of received data. This is a problem: in-browser SHA routines, even the recent native ones, are slow per call, not slow per byte. Most folk want to make a small number of SHA calculations – maybe thousands in total, not tens of thousands per MB of data received. So we wrote an implementation of the core crypto routines in Rust WASM, which took our performance locally up to 2MBps in Firefox and 6MBps in Chromium.
It is also possible we’d look like crypto-mining JS at that point and be blacklisted as malware!
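The shape of that hashing workload, sketched in Go rather than JS – everything here except the 8k figure is invented, and this is not the actual CAPNet computation – is roughly:

```go
// Illustrates the access pattern that hurts in-browser SHA implementations:
// thousands of independent small digests per received chunk, so per-call
// overhead dominates rather than per-byte throughput.
package main

import (
	"crypto/sha256"
	"fmt"
)

func main() {
	chunk := make([]byte, 128*1024) // one received chunk (size is illustrative)
	const digestsPerChunk = 8192    // tunable; 8k was the set being analysed

	var digests [][32]byte
	for i := 0; i < digestsPerChunk; i++ {
		// Each digest covers a small, different slice of the chunk, so none
		// of the work can be amortised into one long streaming hash.
		offset := (i * 64) % (len(chunk) - 64)
		digests = append(digests, sha256.Sum256(chunk[offset:offset+64]))
	}
	fmt.Println("computed", len(digests), "independent SHA-256 digests")
}
```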
Blockchain
Having chosen to involve a blockchain in the stack we had to deal with that complexity. We chose to take bitcoin’s good bits and run with those rather than either running a sidechain, trying to fit new transaction types into bitcoin itself, or trying to shoehorn our particular model into e.g. Ethereum. This turned out to be a fairly large amount of work: not the core chain itself – cloning the parts of bitcoin that we wanted was very quick – but layering on the changes that we needed to start dealing with escrows, negotiating parameters between components, and so forth. Some of the operational challenges below turned up here as well, even just in developer test setups (in particular endpoint discovery).
Operational Challenges
The operational model was pretty interesting. The basic idea was that eventually there would be this big distributed system, a bitcoin-like set of miners and so on, and we’d be one actor in that ecosystem running some subset of the components, but that until then we’d be running:
A centralised ledger
Centralised random number generation for the micropayment system
Centralised deployment and operations for the cache fleet
We had most of this live and running in some fashion for most of the time I was there – we evolved it and improved it a number of times as we iterated on things. Where appropriate we chose open source components like Jaeger, Prometheus and Elasticsearch. We also added policy layers on top of them to provide rate limiting and anti-spoofing facilities. We deployed stuff in AWS, with EKS, and there were glitches and things to work around, but generally only a tiny amount of time went into that part of it. I think I spent a day a month on actual operations, or thereabouts.
Other parties were then expected to bring along additional caches to expand the network, additional publishers to expand the content accessible via the network, and clients to use the network.
Ensuring a process run by a third party is network reachable by a browser over HTTPS is a surprisingly non-simple problem. We partly simplified it by mandating that they run a docker container that we supplied, but there’s still the chance that they are running behind a firewall with asymmetric ingress. And after that we still need a domain name for their endpoint. You can give every cache a CNAME in a dedicated subdomain – say using their public key as the subdomain, so that only that cache can issue requests to update their endpoint information in DNS. It is all solvable, but doing it so that the amount of customer interaction and handholding is reduced to the bare minimum is important: a user with a fleet of 1000 machines doesn’t want to talk to us 1000 times, and we don’t want to talk to them either. But this was another bit of this-isn’t-really-distributed-is-it grit in the distributed ointment.
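The per-cache DNS naming could look something like the sketch below; the zone name and the base32 encoding are my assumptions for illustration, not what the Cachecash code actually did:

```go
// Sketch of deriving a unique, cache-controlled DNS label from the cache's
// public key. Zone and encoding are assumptions for the example.
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/base32"
	"fmt"
	"strings"
)

func cacheHostname(pub ed25519.PublicKey, zone string) string {
	// base32 keeps the label DNS-safe; lower-case it and strip the padding.
	label := strings.ToLower(strings.TrimRight(
		base32.StdEncoding.EncodeToString(pub), "="))
	return label + "." + zone
}

func main() {
	pub, _, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	// Prints something like <52-character label>.caches.example.net
	fmt.Println(cacheHostname(pub, "caches.example.net"))
}
```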
Adoption Challenges
ISPs with large fleets of machines are in principle happy to sell capacity on them in return for money – yay. But we have no revenue stream at the moment, so they aren’t really incentivised to put effort in; it becomes a matter of principle, not a fiscal “this is 10x better for my business” imperative. And right now, it’s 10x slower than HTTP. Or more.
Content owners with large amounts of content being delivered without a CDN would like a radically cheaper CDN. Except – we’re not actually radically cheaper on a cost-structure basis. Current CDNs can charge a premium for their second- and third-generation products because no-one else offers what they offer – seamless in-request edge computing. But that ISP that is contributing a cache to the fleet is going to want the cache paid for, and that’s the same cost structure as existing CDNs – who often have a free entry tier. We might have been able to make our network cheaper eventually, but I’m just not sure about the radically cheaper bit.
Content owners who would like a CDN marketplace where the CDN caches are competing with each other – driving costs down – rather than the CDN operators competing – would absolutely love us. But I rather suspect that those owners want more sophisticated offerings. To be clear, I wasn’t on the customer development team, and didn’t get much in the way of customer development briefings. But things like edge computing workers, where completely custom code can run in the CDN network, adjacent to one’s users, are much more powerful offerings than simple static content shipping, and are offered by all major CDNs. These are trusted services – the CAPNet paper doesn’t solve the problem of running edge code and providing proof that it was run. Enarx might go some way, or even a long way, towards running such code in an untrusted context, but providing a proof that it was run – so that running it can become a mining or mining-like operation – is a whole other question. Without such an answer, an edge computing network starts to depend on trusting the caches’ behaviour a lot more all over again – the network has no proof of execution to depend on.
Rapid adjustment – load spikes – is another possible use case, but the use of the blockchain to negotiate escrows actually seemed to work against our ability to offer that. Akamai defines a load spike in a time frame faster than many blockchains can decide that a transaction has actually been accepted. Off-chain transactions are of course a known thing in the blockchain space, but again that becomes additional engineering.
Our use of a new network protocol – for all that it was layered on standard web technology – made it harder for potential content owners to adopt our technology. Rather than “we have 200 local proxies that will deliver content to your users, just generate a url of the form X.Y.Z”, our solution is “we do not trust the 200 local proxies that we have, so you need to run complicated JS in your browser/phone app etc” to verify that the proxies are actually doing their job. This is better in some ways – precisely because we don’t trust those proxies – but it also increases the runtime cost of using the service, the integration cost of adopting it, and the complexity of debugging issues receiving content through it.
What did we learn?
It is said that “A startup is an organization formed to search for a repeatable and scalable business model.” What did we uncover in our search? What can we take away going forward?
In principle we have a classic two-sided market – people with excess capacity close to users want to sell it, and people with excess demand for their content want to buy delivery capacity.
The baseline market is saturated. The market as a whole is on its third or perhaps fourth (depending on how you define things) major iteration of functionality.
Content delivery purchasers are ok with trusting their suppliers: any supply chain fraud happening in this space at the moment is small enough that no-one I heard from was talking about it.
Some of the things we were doing don’t seem to have been important to the customers we talked to – I don’t have a great read on this, but in particular, the blockchain aspect seems to have been more important to our long-term vision than to the two-sided marketplace that we perceived. It would be fascinating to me to validate that somehow – would cache capacity suppliers be willing to trust us enough to sell capacity to us with just the auditing mechanism, without the blockchain? Would content providers be happy buying credit from us rather than from a neutral exchange?
What did I learn?
I think in hindsight my startup muscles had atrophied – it had been some years since Canonical and it took a few months to start really thinking lean-startup again on a personal basis. That’s ok, because I was hired to build systems. But it’s not great, because I can do better. So number one: think lean-startup and really step up to help with learning and validation.
I levelled up my Go skills. That was really nice – Kevin has deep knowledge there, and though I’ve written Go before I didn’t have a good appreciation for style or aesthetics, or why. I do now. Where before I’d say ‘I’m happy to dive in but it’s not a language I feel I really know’, I am now happy to say that I know Go. More to learn – there always is – but in a good place.
I did a similar thing with my JS skills, but not to the same degree: I climbed fairly deeply into the JS client – which is written in TypeScript – converted its bundling system to webpack to work better with Rust-WASM, and so on. It’s still not my go-to place, but I’m much more comfortable there now.
And of course playing with Rust-WASM was pure delight. Markus and I are both Rust aficionados, and having a genuine reason to write some Rust code for work was delightful. Finding this bug was just a bonus :).
It was also really, really nice being back in a truly individual contributor role for a while. I really enjoyed being able to just fix bugs and get on with things while I got my bearings. I’ve ended up doing a bit more leadership recently – refining requirements, translating between idea and specification, and the like – but still about 80% of my time has been sit-down-and-code, and that really is a pleasant holiday.
What am I going to change?
I’m certainly going to get a new job :). If you’re hiring, hit me up. (If you don’t have my details already, linkedin is probably best).
I think the core thing I need to do is align my day-to-day work more closely with the needs of customer development: I don’t want to take on or take over the customer development role – that will often be done best in person with a customer for startups, and I’m happy remote – but the more I can connect what I’m trying to achieve with what will get customers to pay us, the more successful any business I’m working in will be. This may be a case for non-vanity metrics, or talking more with the customer-development team, or – well, I don’t know exactly what it will look like until I see the context I end up in, but I think more connection will be important.
And I think the second major thing is to find a better balance between individual contribution and leadership. I love individual contribution, it is perhaps the least stressful and most Zen place to be. But it is also the least effective unless the project has exactly one team member. My most impactful and successful roles have been leadership roles, but the pure leadership role with no individual contribution slowly killed me inside. Pure individual contribution has been like I imagine crack to be, and perhaps just as toxic in the long term.
Daniel wrote a lovely blog post about Rust’s ability to be included in distributions, both as a language that you can get via the distribution, and as the language that components of the distribution are being written in.
I think this is a great goal to raise and I have just a few thoughts and quibbles. First I want to acknowledge and agree with him on the Rust community: it’s so very nice, and he is doing a great thing as rustup lead. I wish I had more time to put in; I have more things I want to contribute to rustup. I’ll try to get back to the meetings soon.
On trust
I completely agree about the need for the crates index improvement: without that we cannot have a mirror network, and that’s a significant issue for offline users and users in slow regions.
On curl|sh though
It isn’t the worst possible thing: for all that it is “untrusted bootstrapping”, the actual thing downloaded is HTTPS-secured, and so is the rustup binary itself. Put another way, I think the horror is more perceptual than analysed risk. Someone who trusts Verisign et al. enough to download the Debian installer over HTTPS has exactly the same risk as someone trusting Verisign enough to download rustup at that point in time.
Cross-signing that curl|sh script with per-distro keys or something seems pretty ridiculous to me, since the root of trust is still that first download; unless you’re wandering up to someone who has bootstrapped their compiler by hand (to avoid reflections-on-trust attacks), to get an installer, to build a system, to then do reproducible builds, to check that other systems are actually safe… aieeee.
Perhaps it’s easier to package the curl|sh shell script in Debian itself? apt install get-rustup; then if/when rustup becomes packaged, the user instructions don’t change but the root of trust would, as get-rustup would be updated to not download rustup but to trigger a different package install, and so forth.
I don’t think it’s desirable, though, to have distribution forks of the contents that rustup manages – Debian+Redhat+Suse+… builds of nightly rust with all the things failing or not, and so on – I don’t see who that would help. And if we don’t have that, then the root of trust would still not be shifted under the GPG keychain – it would still be the HTTPS infrastructure for downloading rust toolchains plus the integrity of the rustup toolchain builds themselves. Making rustup, which currently shares that trust, have a different trust root seems pointless.
On duplication of dependencies
I think Debian needs to become more inclusive here, not Rustup. Debian has spent (pauses, counts) yes, DECADES rejecting multiple entire ecosystems because of a prejudiced view about what the Right Way to manage dependencies is. And they are not right in a universal sense. They were right in an engineering sense: given their constraints (builds are expensive, bandwidth is expensive, disk is expensive), they are right. But those are not universal constraints, and seeking to impose those constraints on Java and Node has been an unmitigated disaster. It hasn’t made those upstreams better, or more secure, or systematically fixed problems for users. I have another post on this, so rather than repeating myself I’m going to stop here :).
I think Rust has – like those languages – made the crucial choice, important for maintainer and engineering efficiency, to embrace enabling incremental change across libraries, with the consequence that dependencies don’t shift atomically. And sure, this is basically incompatible with the Debian packaging world view, which says that point and patch releases of libraries are not distinct packages, and thus the shared libs for these things all coexist in the same file on disk. Boom! Crash!
I assert that it is entirely possible to come up with a reasonable design for managing a repository of software that doesn’t make this conflation, would allow actual point and patch releases to exist as they are for the languages that have this characteristic, and be amenable to automation, auditing and reporting for security issues. That is, modernise Debian to cope with this fundamentally different language design decision… which would make Java and Node and Rust work so very much better.
Alternatively, if Debian doesn’t want to make it possible to natively support languages that have made this choice, Debian could:
ship static-but-for-system-libs builds
not include things written in rust
ask things written in rust to converge their dependencies again and again and again (and only update them when the transitive dependencies across the entire distro have converged)
I have a horrible suspicion about which Debian will choose to do :(. The blinkers / echo chamber are so very strong in that community.
For Windows
We got to parity with Linux for IO for non-McAfee users, but I guess there are a lot of them out there; we probably need to keep pushing on tweaking it until it works better for them too – perhaps auto-detect McAfee and switch to the minimal profile? I agree that making Windows users – like I am these days – feel tier one would be nice :). Maybe a survey of user experience would be a good starting point.
Shared libraries
Perhaps generating versioned symbols automatically and building many versions of the crate and then munging them together? But I’d also like to point out here again that the whole focus on shared libraries is a bit of a distribution blind spot; looking at the vast amount of software distribution occurring in app stores, and their model, suggests different ways of dealing with these things. See also the fairly specific suggestion I make about the packaging system in Debian that is, in my entirely humble view, the root of the problem.
Bonus
John Goerzen posted an entirely different thing recently, but in it he discusses programs that don’t properly honour terminfo. Sadly I happen to know that large chunks of the Rust ecosystem assume that everything is ANSI these days, and it certainly sounds like, at least for John, that isn’t true. So that’s another way in which Rust could be more inclusive – use these things that have been built, rather than being modern and new age and reinventing the 95% match.
How well did that work for me? Pretty well. I had a good, satisfying job at VMware for 3 years, met some wonderful people, and achieved some very cool things. And those priorities above were broadly achieved. The one niggle that stands out was this: did the things we were doing matter? Certainly there was no social impact – VMware isn’t a non-profit, being right at the core of capitalism as it is. There was direct connection and impact with the team, the staff we worked with and the users of the products… but it is a bit hard to feel really connected through that: VMware is a very large company and there are many layers between users and developers.
We were quite early adopters of Kubernetes, which allowed me to deepen my Go knowledge and experience some more fun with AWS-scale operations. I had many interesting discussions about the relative strengths of Python, Go, Rust and Java with colleagues there. (Hi Geoffrey).
Company culture is very important to me, and VMware has a fantastically supportive culture. One of the most supportive companies I’ve been in, bar none. It isn’t a truly remote-organised company though: rather it’s a bunch of offices that talk to each other, which I think is sad. True remote-first offers so much more engagement.
I enjoy building things to solve problems. I’ve either directly built, or shaped what is built, in all my most impactful and successful roles. Solving a problem once by hand is fine; solving it for years to come by creating a tool is far more powerful.
I seem to veer into toolmaking very often: giving other people the ability to solve their problems takes the power of a tool and multiplies it even further.
It should be no surprise then that I very much enjoy reading white papers like the original Dapper and MapReduce ones, LinkedIn’s Kafka paper, or, for more recent fodder, the Facebook Akkio paper. Excellent synthesis and toolmaking applied at industrial scale. I read those things and I want to be a part of the creation of those sorts of systems.
I was fortunate enough to take some time to go back to university part-time, which though logistically challenging is something I want to see through.
Thus I think my new roughly ordered (descending) list of priorities needs to be something like this:
Keep living in Rangiora (family)
Up to moderate travel requirements – 4 team-meeting trips a year + 2 conferences
Significant autonomy (not at the expense of doing the right thing for the company, just that I work best with the illusion of free will)
Be doing something that matters
Be working directly on a problem / system that has problems
Since moving down to Melbourne my poor sleep has started up again. It’s really hard to say what the main factor driving this is. My doctor down here has put me onto a drug-free way of trying to improve my sleep, and I think I kind of like it: while it’s no silver bullet, it is something I can go back to if I’m having trouble with my sleep, without having to get a prescription.
The basic idea is to maximise sleep efficiency. If you’re only getting n hours of sleep a night, only spend n hours a night in bed. This forces you to stay up and go to bed rather late for a few nights. Hopefully, being tired will help you sleep through the night in one large segment. Once you’ve successfully slept through the night a few times, relax your bedtime by, say, fifteen minutes, and get used to that. Slowly, over time, you increase the amount of sleep you’re getting while keeping your efficiency high.
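For what it’s worth, the bookkeeping involved is simple enough to sketch; the threshold and increment below are invented for the example, not medical advice:

```go
// Toy illustration of the sleep-restriction bookkeeping described above.
package main

import "fmt"

func main() {
	timeInBed := 6.0  // hours currently allowed in bed
	timeAsleep := 5.7 // hours actually slept (from a sleep diary)

	efficiency := timeAsleep / timeInBed
	fmt.Printf("efficiency: %.0f%%\n", efficiency*100)

	// Only once efficiency stays high for a few nights do you relax the
	// window, e.g. by fifteen minutes, and settle in at the new schedule.
	if efficiency >= 0.9 {
		timeInBed += 0.25
	}
	fmt.Printf("next window: %.2f hours in bed\n", timeInBed)
}
```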