A SWAY session by Joanne of Royal Far West School. http://sway.org.au/ via https://coviu.com/ SWAY is an oral language and literacy program based on Aboriginal knowledge, culture and stories. It has been developed by Educators, Aboriginal Education Officers and Speech Pathologists at the Royal Far West School in Manly, NSW.
"This week, a major trial kicked off in Los Angeles in which hundreds of
families sued Meta, TikTok, Snap, and YouTube, accusing the companies of
intentionally designing their products to be addictive (though Snap and TikTok
"The Australian authority that regulates pesticides has finally released its
long-delayed review of the rodenticide poisons used by millions of Australians
to combat rat and mouse infestations.
"On a damp October evening in 2013, a handful of developers in Vancouver's
Gastown gathered around folding tables. It was a wonderful, ideal concept they
discussed as their MacBook screens glowed. What if you didn't need four years
"More than two years on, you’d be forgiven for thinking the story of the
failure of the referendum on an Aboriginal and Torres Strait Islander Voice has
been neatly folded away and filed as a story of inevitable loss. Bipartisanship
"The new owner of one of the most successful battery storage and renewable
energy developers in Australia and the world, says battery storage costs have
fallen 60 per cent and changed the face of the grid as well as the nature of
"A Russian drone attack on a bus carrying mine workers in Ukraine’s
central-eastern Dnipropetrovsk region has killed at least 12 people, officials
said.
A slightly slower week, this one. My regular OSS day was spent instead with the kids on their last Friday of the summer holidays.
Some good movement on the site. Reviewed, tweaked, and merged this new status page from Jane Sandberg. Thank you Jane! Aaron is also back on the job and taking a last pass over our logo and type colours. I think we’ve managed to get past all the outstanding issues, and should hopefully be free to lock in our final site design.
In a few spare moments, I took the chance to bring more Hanami repos under repo-sync management. Now we have it for hanami, cli, router, controller, and view! Porting the latter three was very easy, so I’m confident we should be able to get the whole set moved across soon. I also enabled release-machine for all synced Hanami repos, so we’ll get to enjoy a nice streamlined release process soon, too.
One part of rolling out the repo-sync is switching to our new shared RuboCop config, and while doing this during the week I discovered what I thought was a bug in RuboCop, resulting in downloaded remote config files no longer being hidden files. It turns out this was an intentional change, and TIL about RuboCop’s path relativity feature. We’ll likely need to rename our shared RuboCop config to account for this.
Paweł is continuing his work on the JRuby rollout, making it so JRuby becomes a required version in the CI matrix for the projects where we’ve restored support. And he managed to fix the local tooling for our repo sync at the same time. Thanks Paweł.
Today, in a fit of passion, I fully built out a first cut of a Hanami Minitest gem. I hope to share it in the coming week. I’m not a Minitest user, so I’ll definitely need the community’s help to make sure our config is nice and idiomatic.
I spent a few more hours continuing to refine my Hanami Mailer rebuild. That one will hopefully be ready to share soon too.
While I was poking at things during the week, I discovered that hanami new (and any other outside-of-a-project hanami CLI invocations) were crashing on recent Rubies with Gem::LoadErrors, due to bundled gems already being activated but conflicting with the required versions for Hanami’s own dependencies. I fixed this promptly and our CLI is back in action again.
This fix allowed me to make the first-ever automated release of a Hanami gem! I got hanami-cli activated for our release-machine and it all went very smoothly. I’m excited to finish the rollout to the rest of the Hanami gems and use this for all our releases.
I also released a small fix to hanami-rspec to ensure its command callbacks continue to run smoothly with recent Dry CLI releases. Thanks to Sebastjan Hribar for raising this issue!
With release-machine clearly working smoothly, I enabled it for all the Dry gems, while also reviewing the RubyGems.org ownership for each of those gems to make sure it’s current.
Up next for me is getting the v2 of Hanami Mailer ready to share and helping make sure our website work is heading in the right direction (our other Aussie contributors are now back from their respective summer holidays and ready to help again!).
Everything Open Everywhere All At Once by Steven De Costa
“ChatGPT: Please create an interesting keynote about random philosophical concepts strung together in a vaguely meaningful way and themed around Chickens”
Lightning Talks
End Security by Obscurity
myGov code generator app
enrol + TOTP
is it secure? Is it spyware?
Only available via the app store
Made a Freedom of Information request in 2021 and has gone through multiple appeals/reviews after being denied
Looking for money to appeal further
High Altitude Balloons and ASN.1
Need a protocol with various requirements to help recover the balloon and get data from it.
Existing protocol not ideal
ASN.1 is an old protocol that might be useful
What would it take to run everything Open in New Zealand
Running a conference is hard
A small team makes it harder
Good idea?
What will this actually take
Contact Chelsea if interested.
Open source is not all you need to fight enshittification
No; other freedoms are also needed
Brain Model in your Hand
I’m doing a talk in front of 300 people. My brain thinks I’m being chased by a Lion
Learn an Indigenous Language
How to Eat Fruit
Help is at Hand
Join a Union
My Community
Open Source Institute
My $50 question now costs a trip to fench
PyCon did battle decks
What is the most popular emoji on GitHub?
Ran a big query on BigQuery
Grabbed the software heritage project
Lots of small files. Hard to query or mirror
3 Petabytes. Too much to download
Solid Open Source Package
6 talks about deplatforming and/or self hosting this week
This edition is going out a couple days later than usual, because I had Bonus Weekend of open source work, and I wanted to be able to share everything with you all at once!
Here’s the background: one of our ambitions for 2026 is to begin a twice-yearly release cadence for Hanami. Since our last release was in November, I chose May and November as a starting point. Right now we’re getting close to the end of January. We have a new site to ship sometime in February. I’m taking a family trip in April. And the team overall has been pretty quiet for the last couple of months. All of this had me a bit worried that we wouldn’t have much to share come May.
This weekend the rest of my family went away for a retreat, leaving just me at home. So I took an extra day off work and sat down for two full days of Hanami feature development.
What I got done was the foundation of streamlined i18n support (drop a translations yaml in your project and then Hanami.app["i18n"].t("greeting", name: "Alice") will just work for you), as well as a prototype rebuild of Hanami Mailer to fit in our Hanami 2 ecosystem.
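The translation lookup described above can be sketched in plain Ruby. This is an illustrative mock, not Hanami’s actual implementation; the YAML layout and the %{name} interpolation syntax are assumptions based on common Ruby i18n conventions:

```ruby
require "yaml"

# Hypothetical translations file dropped into a project (name and format assumed):
#
#   en:
#     greeting: "Hello, %{name}!"
#
# A minimal lookup object mimicking the described behaviour of
# Hanami.app["i18n"].t("greeting", name: "Alice").
class I18nSketch
  def initialize(translations, locale: "en")
    @table = translations.fetch(locale)
  end

  # Look up a (possibly dot-separated) key and interpolate %{...} placeholders.
  def t(key, **args)
    template = key.split(".").reduce(@table) { |node, part| node.fetch(part) }
    template.gsub(/%\{(\w+)\}/) { args.fetch(Regexp.last_match(1).to_sym).to_s }
  end
end

translations = YAML.safe_load(<<~YAML)
  en:
    greeting: "Hello, %{name}!"
YAML

i18n = I18nSketch.new(translations)
puts i18n.t("greeting", name: "Alice") # => "Hello, Alice!"
```

The real feature presumably plugs into the app’s component container; the sketch only shows the shape of the lookup-and-interpolate step.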
All up, I’m very happy with that outcome! Neither thing is quite ready to share yet, but I should be able to get that done towards the end of this week. This should allow a good couple of months for feedback and testing from our maintainer team and community, and puts us in good stead for a May release.
Aside from that, I did spend my regular Friday working on the site. The biggest change was making our version selectors link directly to the equivalent page in the other version, if it exists. This is exactly the kind of behaviour users would expect, and it also allowed us to get rid of a per-version guides index page, which is no longer necessary now that we show all guides in our left hand side navigation. One less page to design means one step closer to shipping.
Minutes of Linux Australia Annual General Meeting 2025
Monday, 20th Jan 2025, 6:00pm (ACDT)
Recording is available at https://www.youtube.com/watch?v=4clkwhrXImY
Attendance record is available upon request
Meeting started at 6:00pm ACDT.
1. President’s welcome
MR JOEL ADDISON, President
Acknowledgment of the traditional owners of the lands on which we meet, particularly the Kaurna people who are the traditional owners of the land that Everything Open 2025 was held on.
2. Approval of the minutes from the previous Annual General Meeting 2024
MOTION by JOEL ADDISON that the minutes of the Annual General Meeting 2024 of Linux Australia be accepted as complete and accurate.
Seconded by Sae Ra Germaine
Motion Passed with 4 abstentions, no nays
3. To receive the REPORTS of activities of the preceding year from OFFICE BEARERS
MR JOEL ADDISON – President
The full report is attached to the annual report, so only a few highlights here:
There have been a lot of events over the past year. In fact it has been the busiest Joel has ever been involved with. We had more and more groups come to ask for help in running events. Not every event that was proposed ended up running, but it was great to see so many people come back and run events with us for the benefit of the community.
Special call out for PurpleCon which was a new small event with a relatively inexperienced team who put on a great event, and even managed to make a small profit once GST was properly accounted for.
The WordPress community had some major upheaval during the year, which had a flow-on effect on WordCamp Sydney. Linux Australia is unhappy with some of the actions that were taken, for example taking over the social media accounts of the conference, which Linux Australia would normally control under the Memorandum of Understanding. We will review that to make sure that we are partnered with an organisation we can trust, which we currently have doubts about.
Linux Australia also helped the Python New Zealand user group who faced a significant financial problem caused by their treasurer misappropriating funds. That individual has now been dealt with under the law. Joel is glad to see that chapter behind the organisation. Thanks to both the Linux Australia council and the Python New Zealand committee for their work in resolving the situation.
Finally, a special mention for PyCon AU which is now the largest conference run under the Linux Australia banner. A number of members of the Python Australia steering committee stepped back after serving for 10 or more years. Particular thanks to Katie Mch[FIXME] and Russell Keith-Magee and Richard Jones for their hard work over that time.
We need to do some work on our constitution. This was flagged last year, but needs to be done now due to some changes in the way the ATO is enforcing non profit status rules. Russell Stuart will speak about this in more detail later. There is a lot of work that will need to be done to update our constitution to make sure we can continue as a non profit.
Questions:
One question on grant applications: What type of grant applications were you hoping for which didn’t happen?
We had two categories of grant applications this year, one for community building and the other for the more traditional grants for projects
The community building grants did not get as much take-up as expected, so we need to work on that and improve our connections to all the groups in our community.
For the standard grants we need to have a discussion about how we track the success of the grants. We are keen to support people purchasing hardware or working on software, but we need to do a better job of showing the benefit to the community.
MR NEILL COX – Secretary
Call for people to fill in the attendance sheet.
Again the full report is contained in the Annual Report so just a few highlights here.
It’s been a challenging but rewarding year. Neill is grateful to have been part of the council. We have achieved quite a lot.
As Joel mentioned we ran more conferences than usual. They ran under difficult circumstances and the financial outcomes haven’t been all that we might have hoped for but all in all it’s been a good year.
We have a fair chunk of work ahead of us. Grateful to have been a part of it and looking forward to continuing for the next year.
MR RUSSELL STUART – Treasurer – Includes presentation of the Auditor’s Report
Full report attached to the Annual Report.
Shout out to Sae Ra who is leaving after being part of the council for 10 years.
The bottom line is that we made a loss of $55,000. Those of you who were at the last AGM would remember that a loss was predicted.
The loss almost exactly matches the funds we provided to ensure Python New Zealand continued to operate.
The $55,000 figure is from the auditor. Russell calculates the loss to be $72,000 which he believes is more accurate.
The reason that the figures are different is spelled out in the Annual Report and will not be explained here.
Conference P&L:
The unusual thing this year is that some of our conferences made a loss, mostly due to difficulties with sponsorship.
Everything Open 2024 returned a profit of around $18,000 but the rest didn’t do so well.
The annual report does not include every conference (only four or five are shown) because three of them happened after the end of our financial year. The notable absentees are PyCon, PurpleCon, and DrupalCon Singapore.
DrupalCon Singapore was a new experiment and caused a significant amount of angst, at one stage projecting a $100,000 loss. That loss did not eventuate due to work by the organisers.
The difference between the auditor’s figures and Russell’s is those conferences. It is largely caused by the conferences having recorded their income in the form of sponsorship but not yet having paid all their expenses. Russell will book the actual profits once expenses have been paid.
The budget is substantially the same as previous years as our costs do not vary.
Our loss turns out to be pretty much identical to the funds used to bail out NZPUG.
The reason for that is that we made a fair amount of interest. This was due to investing in a term deposit at a high interest rate for two years. We will receive a similar amount this year before interest rates drop on it.
Our overheads are still higher than the 6% we allow for.
Conferences tend to want to keep their money, but LA feels that we need the funds to allow us to survive years like this last one.
Looking forward all of the conferences have now been bitten by the lack of sponsorship and Russell expects them to adjust course accordingly.
The reality is that the executive of Linux Australia has run the last four Everything Open conferences, which is not sustainable. We don’t have anyone putting their hand up to run Everything Open. It is possible that we will lose the $18,000 that Everything Open brings in unless people do put up their hand.
The final item is that we received a surprise letter from the ATO saying that they are introducing a new thing called a not for profit self assessment. It looks to be aligning the ATO’s definition of a not for profit with the Charities and Not for Profit Commission.
One of the questions we need to answer is: what happens to your assets if you wind up?
Our constitution says that the executive could just take it home and buy a new house.
This is no longer acceptable to the ATO and we now need to have a clause defining what happens to our assets or we will no longer qualify as a not for profit.
We have had two attempts to change our constitution, both of which were rejected. If we do not resolve this we will be taxed on our profits at the rate of 25%.
Instead of attempting this again we will now instead attempt to adopt the new NSW model constitution.
Russell copied the new model constitution but the changes from the one we originally adopted make producing a diff nearly impossible.
Russell has gone through the process of pulling across the things we most care about in our current constitution (free membership, electronic records) and merged them into the new model constitution. This is now up on GitHub as a proposed pull request for our current constitution. We propose to allow everyone to read it and comment on the PR. Sometime before June 20th (the ATO deadline) we will hold a special general meeting to vote on the changes.
Questions
Question from Steve Ellis: The 25% is that on revenue or profits?
Response: Profits. There are also other implications. [This is a summary of the answer Russell provided; see 33:43 of the recording for the full Q&A]
Question from the floor: Of the eight or so what category of non profit seems appropriate or possible?
Response: It turns out we are a scientific institution. [Again a summary; see the recording at 36:01 for the full answer]
Q: You said that if Everything Open didn’t run next year then you would make an $80,000 loss. How could that happen if you don’t run an event?
A[Russell]: No, we wouldn’t make a loss, but we would miss out on the $18,000 profit which is what it made this year. It ranges between that and $40,000 for the last 27 years. [Full answer at 37:18]
Q from Josh Hesketh: What did we provide to Drupal Singapore – was it banks, insurance or other and would we extend that to other conferences in Singapore and further would we consider other countries in the APAC region?
A[Joel]: With that one it was bank accounts and insurance as you mentioned. It’s not certain that we will continue the agreement with the Drupal Association for future conferences. There can also be tax implications. We will assess future conferences as they come up [Full question and answer at 38:48]
Q: Alexar: Is Linux Australia considering having an impact in the APAC region as a strategy or will this just be a case by case approach?
A[Joel]: We’ve always done stuff across Australia and New Zealand and we have supported some other events in the region. We are not ready to commit to a strategic approach without more investigation. For now we are predominantly focussed on Australia and New Zealand but there are opportunities to work with other organisations in the region.
Follow up Question: Can subcommittees pursue similar opportunities?
Follow up Answer: We always say to our subcommittees feel free to bring any ideas to us and we will discuss them with them and go from there.
Q [Cherie Ellis]: Is there some way that things can be turned or twisted so that it’s still Everything Open but that LCA or the Linux Australia name is bonded to it so that it becomes the recognizable icon that our sponsors know?
A[Joel]: The sponsors we spoke to understand the alignment. The challenge is that sponsors like IBM are no longer operating in the same way in Australia for that particular area. We managed to find a number of new sponsors this year. Every conference has found that a number of recurring sponsors have said no this year because they can’t afford it or they don’t have the budget. We’ve also had a number of sponsors who signed up and then pulled out in the last two weeks prior to a conference. We expect these challenges to continue over the next 12 months, but we are better prepared for them now. As to bringing LCA back, one idea that has been considered is to turn the Linux Kernel Miniconf into LCA as part of Everything Open. That was the intent but we haven’t had enough people step up to make it happen. We do have Carlos from the Open SI institute at the University of Canberra who would like to bring Everything Open to Canberra next year.
Q[Steve Ellis]: Thank you for your support of the community. Do we need a group in Linux Australia focused on sponsorship across all of the events?
A[Joel]: Building a pool of organisations is definitely something we need, not just for sponsorship but also to promote the awareness of the events internally in their organisations. We have discussed setting up some sort of central thing. We could set up a working group and go through some of that. Some of the other conferences have also expressed interest.
Q[Paul Wayper]: Question for the Treasurer: How can the community help Everything Open survive from the ticket price perspective?
A [Russell]: I don’t actually set the budget for the conferences. That’s done by the conference treasurers. I try to give them as much freedom as possible beyond “don’t make a loss”. I don’t have an easy answer for your question.
MOTION by Russell Stuart that the Auditor’s Report is a true statement of financial accounts.
Seconded by Steven Ellis
Passed with one abstention
MOTION by Joel Addison that the President’s report is correct.
Seconded by Josh Hesketh
Passed with one abstention
MOTION by Neill Cox that the Secretary’s report is correct.
Seconded by Jonathan Woithe
Passed with one abstention
MOTION by Russell Stuart that the Treasurer’s report is correct.
Seconded by Cherie Ellis
Passed with one abstention
MOTION by Joel Addison that the actions of Council during 2024 are endorsed by the membership.
Seconded by Ian
Passed with three abstentions
5. Other Business
Clinton Roy would like to encourage people to engage with the mailing list rather than waiting until the next AGM.
Josh Hesketh would like to give a special thank you to Sae Ra for all her time on the council. Josh believes that she is by a decent margin both the longest serving council member and the most prolific organiser of conferences.
Paul Wayper: Does not want to make trouble, but is interested in talking to the Council about running Everything Open in Canberra in 2026, but would like it on the record that he was not trying to cause trouble by enquiring about videos from Everything Open 2024 on the mailing list.
Joel responds to assure Paul that he didn’t mean anything by his response on the videos, it was just that he had a lot of things to get done. He is still working on them, and they will get done.
Sae Ra: wants to add to the record that that was no slight against Paul; it was more just a passing joke, and she apologises for that. It had nothing to do with the videos whatsoever.
Peter Chubb requested to move a motion that the minutes of this meeting be approved at the first council meeting rather than waiting all the way to next year when everyone’s forgotten them.
Joel responds: We have to do this as part of the AGM. We can post the minutes before then.
Josh Hesketh: A logistical question of the SGM. Do we have to have a minimum number of members present to make a constitutional change and is there a minimum number of positive votes?
Joel responds: There are, but he can’t remember the exact details and won’t answer off the top of his head. Council will send through the details.
6. DECLARATION of Election and WELCOME of incoming Council by the Returning Officer
Returning Officer Julien Goodwin gives his report.
He notes that on examination of the stats Russell and Sae Ra are tied for length of time on the LA Council.
This is Julien’s third time as returning officer.
We have a well-dialed in system now.
The only two things of note are:
Particularly for the OCM positions where this matters, the way second preferences and such work means that the election is not held in the sort of Senate-count-style system that Australians might assume.
We didn’t have this situation this year, but it is not uncommon for someone to, for example, stand as both secretary and as an ordinary committee member; if they are elected as secretary they are not then eliminated by the system from the lower positions, so we would then do a hand count. That didn’t apply this year, but we did have the case where, for the last OCM position, both candidates received the same first-count votes, so it came down to the second count. If it had been done with a different system it may have had a different result.
Results
President: Joel Addison
Vice President: Jennifer Cox
Secretary: Neill Cox
Treasurer: Russell Stuart
Ordinary Council Members (alphabetical order):
Lilly Hoi Sze Ho
Elena Williams
Jonathan Woithe
Joel: Thank you for coming along to the AGM. I want to say one more thank you to Sae Ra because you have been a big help to me.
Meeting closed at 19:05 ACDT
AGM Minutes Confirmed by 2025 Linux Australia Council
I have just got a Furilabs FLX1s [1] which is a phone running a modified version of Debian. I want to have a phone that runs all apps that I control and can observe and debug. Android is very good for what it does and there are security focused forks of Android which have a lot of potential, but for my use a Debian phone is what I want.
The FLX1s is not going to be my ideal phone, I am evaluating it for use as a daily-driver until a phone that meets my ideal criteria is built. In this post I aim to provide information to potential users about what it can do, how it does it, and how to get the basic functions working. I also evaluate how well it meets my usage criteria.
I am not anywhere near an average user. I don’t think an average user would ever even see one unless a more technical relative showed one to them. So while this phone could be used by an average user I am not evaluating it on that basis. But of course the features of the GUI that make a phone usable for an average user will allow a developer to rapidly get past the beginning stages and into more complex stuff.
Features
The Furilabs FLX1s [1] is a phone designed to run FuriOS, a slightly modified version of Debian; the purpose is to run Debian instead of Android on a phone. It has hardware switches to disable the camera, phone communication, and microphone, though the switch to disable phone communication doesn’t turn off Wifi. The only other phone I know of with such switches is the Purism Librem 5.
It has a 720*1600 display, which is only slightly better than the 720*1440 display in the Librem 5 and PinePhone Pro. This doesn’t compare well to the OnePlus 6 from early 2018 with 2280*1080 or the Note9 from late 2018 with 2960*1440, both phones that I’ve run Debian on. The current price is $US499, which isn’t that good when compared to the latest Google Pixel series: a Pixel 10 costs $US649, has a 2424*1080 display, and has 12G of RAM while the FLX1s only has 8G. Another annoying thing is how rounded the corners are. It seems that round corners that cut off the content are standard practice nowadays; in my collection of phones the latest one I found with hard right angles on the display was a Huawei Mate 10 Pro, which was released in 2017. The corners are rounder than the Note 9’s, which annoys me because the screen is not high resolution by today’s standards, so losing the corners matters.
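For context, the raw pixel counts behind those resolution comparisons (using the figures listed above) can be tallied quickly:

```ruby
# Total pixel counts for the displays mentioned above.
displays = {
  "FLX1s"     => [720, 1600],
  "Librem 5"  => [720, 1440],
  "OnePlus 6" => [2280, 1080],
  "Note9"     => [2960, 1440],
  "Pixel 10"  => [2424, 1080],
}

displays.each do |name, (w, h)|
  puts format("%-10s %4d x %4d = %9d pixels", name, w, h, w * h)
end
# The FLX1s ends up with less than half the pixels of the 2018 OnePlus 6.
```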
The default installation is Phosh (the GNOME shell for phones) and it is very well configured. Based on my experience with older phone users I think I could give a phone with this configuration to a relative in the 70+ age range who has minimal computer knowledge and they would be happy with it. Additionally I could set it up to allow ssh login and instead of going through the phone support thing of trying to describe every GUI setting to click on based on a web page describing menus for the version of Android they are running I could just ssh in and run diff on the .config directory to find out what they changed. Furilabs have done a very good job of setting up the default configuration, while Debian developers deserve a lot of credit for packaging the apps the Furilabs people have chosen a good set of default apps to install to get it going and appear to have made some noteworthy changes to some of them.
Droidian
The OS is based on Android drivers (using the same techniques as Droidian [2]) and the storage device has the huge number of partitions you expect from Android as well as a 110G Ext4 filesystem for the main OS.
The first issue with the Droidian approach of using an Android kernel and containers for user space code to deal with drivers is that it doesn’t work that well. There are 3 D state processes (uninterruptible sleep, which usually means a kernel bug if the process remains in that state) after booting and doing nothing special. My tests running Droidian on the Note 9 also had D state processes; in this case they are D state kernel threads (I can’t remember if the Note 9 had regular processes or kernel threads stuck in D state). It is possible for a system to have full functionality in spite of some kernel threads in D state, but generally it’s a symptom of things not working as well as you would hope.
The design of Droidian is inherently fragile. You use a kernel and user space code from Android and then use Debian for the rest. You can’t do everything the Android way (with the full OS updates etc) and you also can’t do everything the Debian way. The Tow-Boot functionality in the PinePhone Pro is really handy for recovery [3]; it allows the internal storage to be accessed as a USB mass storage device. The full Android setup with ADB has some OK options for recovery, but part Android and part Debian has fewer options. While it probably is technically possible to do the same things in regard to OS repair and reinstall, the fact that it’s different from most other devices means that fixes can’t be done in the same way.
Applications
GUI
The system uses Phosh and Phoc, the GNOME system for handheld devices. It’s a very different UI from Android; I prefer Android, but Phosh is usable.
IM
Chatty works well for Jabber (XMPP) in my tests. It supports Matrix which I didn’t test because I don’t desire the same program doing Matrix and Jabber and because Matrix is a heavy protocol which establishes new security keys for each login so I don’t want to keep logging in on new applications.
Chatty also does SMS but I couldn’t test that without the SIM caddy.
I use Nheko for Matrix which has worked very well for me on desktops and laptops running Debian.
Email
I am currently using Geary for email. It works reasonably well but is lacking proper management of folders, so I can’t just subscribe to the important email on my phone so that bandwidth isn’t wasted on less important email (there is a GNOME gitlab issue about this – see the Debian Wiki page about Mobile apps [4]).
Music
Music playing isn’t a noteworthy thing for a desktop or laptop, but a good music player is important for phone use. The Lollypop music player generally does everything you expect, along with support for all the encoding formats including FLAC; a major limitation of most Android music players seems to be lack of support for some of the common encoding formats. Lollypop has its controls for pause/play and going forward and backward one track on the lock screen.
Maps
The installed map program is gnome-maps which works reasonably well. It gets directions via the Graphhopper API [5]. One thing we really need is a FOSS replacement for Graphhopper in GNOME Maps.
Delivery and Unboxing
I received my FLX1s on the 13th of Jan [1]. I had paid for it on the 16th of Oct but hadn’t received the email with the confirmation link, so the order had been put on hold. After I contacted support about that on the 5th of Jan they rapidly got it to me, which was good. They also gave me a free case and screen protector as an apology. I don’t usually use screen protectors, but in this case one might be useful, as the edges of the case don’t even extend 0.5mm above the screen, so if the phone falls face down the case won’t help much.
When I got it there was an open space at the bottom where the caddy for SIMs is supposed to be. So I couldn’t immediately test VoLTE functionality. The contact form on their web site wasn’t working when I tried to report that and the email for support was bouncing.
Bluetooth
As a test of Bluetooth I connected it to my Nissan LEAF, which worked well for playing music, and I connected it to several Bluetooth headphones. My Thinkpad running Debian/Trixie doesn’t connect to the LEAF or to headphones which have worked on previous laptops running Debian and Ubuntu. A friend’s laptop running Debian/Trixie also wouldn’t connect to the LEAF, so I suspect a bug in Trixie; I need to spend more time investigating this.
Wifi
Currently 5GHz wifi doesn’t work; this is a software bug that the Furilabs people are working on. 2.4GHz wifi works fine. I haven’t tested running a hotspot as I’ve been unable to get 4G working – they haven’t yet shipped me the SIM caddy.
Docking
This phone doesn’t support DP Alt-mode or Thunderbolt docking, so it can’t drive an external monitor. This is disappointing; Samsung phones and tablets have supported such things for many years. Samsung DeX is quite handy for Android devices, and that type of feature would be much more useful on a device running Debian than on an Android device.
Camera
The camera works reasonably well on the FLX1s. Until recently the camera on the Librem 5 didn’t work, and the camera on my PinePhone Pro currently doesn’t work. Here are samples of the regular camera and the selfie camera on the FLX1s and the Note 9. I think this shows that the camera is pretty decent. The selfie looks better, but the camera did worse on the relatively close photo of a laptop screen – taking photos of computer screens is an important part of my work, but I can probably work around that.
I wasn’t assessing this camera to find out if it’s great, just to find out whether I would have the sorts of problems I’ve had before – and it just worked. The Samsung Galaxy Note series of phones has always had decent specs, including good cameras, so even though the Note 9 is old, matching it is a respectable performance. The lighting was poor for all photos.
[Sample photos: FLX1s, then Note 9]
Power Use
In 93 minutes of having the PinePhone Pro, Librem 5, and FLX1s online with open ssh sessions from my workstation, the PinePhone Pro went from 100% battery to 26%, the Librem 5 went from 95% to 69%, and the FLX1s went from 100% to 99%. Their battery discharge rates were reported as 3.0W, 2.6W, and 0.39W respectively. Based on a 16.7Wh battery, 93 minutes of use at that rate should have been close to 4% battery use, but in any case all measurements make it clear that the FLX1s will have a much longer battery life – including the measurement of just putting my fingers on the phones and feeling the temperature (the FLX1s felt cool and the others felt hot).
The PinePhone Pro and the Librem 5 have an optional “Caffeine mode” which I enabled for this test; without it enabled the phone goes into a sleep state and disconnects from Wifi. So those phones would use much less power with Caffeine mode disabled, but then they couldn’t respond quickly to notifications etc. I found the option to enable a Caffeine mode switch on the FLX1s, but the power use was reported as being the same both with and without it.
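The expected drain from a reported wattage can be sanity-checked against the battery capacity with a bit of arithmetic. A minimal sketch using the figures above (0.39W reported rate, 16.7Wh battery, 93 minutes):

```python
def expected_drain_percent(watts, hours, battery_wh):
    """Percentage of battery capacity consumed at a constant power draw."""
    return watts * hours / battery_wh * 100

hours = 93 / 60  # 93 minutes of the test

# FLX1s: 0.39W reported discharge rate, 16.7Wh battery.
drain = expected_drain_percent(0.39, hours, 16.7)
print(f"{drain:.1f}%")  # → 3.6%, close to the ~4% figure in the text
```

The observed 1% drop (100% to 99%) is below even this estimate, which suggests the reported battery percentage is coarse, but either way the FLX1s is far ahead of the other two phones.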
Charging
One problem I found with my phone is that in every case it takes 22 seconds to negotiate power. Even when using plain USB charging (no BC or PD) it doesn’t draw any current for 22 seconds. When I connect it, it stays at 5V, varying between 0W and 0.1W (current rounded off to zero), for 22 seconds or so and then starts charging. After the 22 second delay the phone makes the tick sound indicating that it’s charging and the power meter shows that it’s drawing some current.
I added the table from my previous post about phone charging speed [6] with an extra row for the FLX1s. Charging from my PC USB ports gave the worst results I’ve seen: the port that does BC did not work at all – it looped trying to negotiate, and after each 22 second negotiation delay the port would turn off. The non-BC port gave only 2.4W, which matches the 2.5W allowed by the spec for a “High-power device”, which is what that port is designed to supply. In a discussion on the Purism forum about Librem 5 charging speed one of their engineers told me that the reason their phone would draw 2A from that port was that the cable identified itself as a USB-C cable rather than a “High-power device”. But for some reason, out of the 7 phones I tested, the FLX1s and the One Plus 6 are the only ones to limit themselves to what the port is apparently supposed to supply. Also the One Plus 6 charges slowly on every power supply, so I don’t know if it is obeying the spec or just sucking.
On a cheap AliExpress charger the FLX1s gets 5.9V and on a USB battery it gets 5.8V. Out of all 42 combinations of device and charger I tested these were the only ones to involve more than 5.1V but less than 9V. I welcome comments suggesting an explanation.
The case that I received has a hole for the USB-C connector that isn’t wide enough for the plastic surround on most of my USB-C cables (including the Dell dock’s). Also, making a connection requires a fairly deep insertion (deeper than on the One Plus 6 or the Note 9). So without adjustment I have to take the case off to charge it. It’s no big deal to widen the hole (I have done it with other cases) but it’s an annoyance.
Phone        Top z640         Bottom z640      Monitor          Ali Charger      Dell Dock        Battery          Best             Worst
FLX1s        FAIL             5.0V 0.49A 2.4W  4.8V 1.9A 9.0W   5.9V 1.8A 11W    4.8V 2.1A 10W    5.8V 2.1A 12W    5.8V 2.1A 12W    5.0V 0.49A 2.4W
Note 9       4.8V 1.0A 5.2W   4.8V 1.6A 7.5W   4.9V 2.0A 9.5W   5.1V 1.9A 9.7W   4.8V 2.1A 10W    5.1V 2.1A 10W    5.1V 2.1A 10W    4.8V 1.0A 5.2W
Pixel 7 Pro  4.9V 0.80A 4.2W  4.8V 1.2A 5.9W   9.1V 1.3A 12W    9.1V 1.2A 11W    4.9V 1.8A 8.7W   9.0V 1.3A 12W    9.1V 1.3A 12W    4.9V 0.80A 4.2W
Pixel 8      4.7V 1.2A 5.4W   4.7V 1.5A 7.2W   8.9V 2.1A 19W    9.1V 2.7A 24W    4.8V 2.3A 11.0W  9.1V 2.6A 24W    9.1V 2.7A 24W    4.7V 1.2A 5.4W
PPP          4.7V 1.2A 6.0W   4.8V 1.3A 6.8W   4.9V 1.4A 6.6W   5.0V 1.2A 5.8W   4.9V 1.4A 5.9W   5.1V 1.2A 6.3W   4.8V 1.3A 6.8W   5.0V 1.2A 5.8W
Librem 5     4.4V 1.5A 6.7W   4.6V 2.0A 9.2W   4.8V 2.4A 11.2W  12V 0.48A 5.8W   5.0V 0.56A 2.7W  5.1V 2.0A 10W    4.8V 2.4A 11.2W  5.0V 0.56A 2.7W
One Plus 6   5.0V 0.51A 2.5W  5.0V 0.50A 2.5W  5.0V 0.81A 4.0W  5.0V 0.75A 3.7W  5.0V 0.77A 3.7W  5.0V 0.77A 3.9W  5.0V 0.81A 4.0W  5.0V 0.50A 2.5W
Best         4.4V 1.5A 6.7W   4.6V 2.0A 9.2W   8.9V 2.1A 19W    9.1V 2.7A 24W    4.8V 2.3A 11.0W  9.1V 2.6A 24W
Conclusion
The Furilabs support people are friendly and enthusiastic but my customer experience wasn’t ideal. It was good that they could quickly respond to my missing order status and the missing SIM caddy (which I still haven’t received but believe is in the mail) but it would be better if such things just didn’t happen.
The phone is quite user friendly and could be used by a novice.
I paid $US577 for the FLX1s, which is $AU863 at today’s exchange rate. For comparison I could get a refurbished Pixel 9 Pro Fold for $891 from Kogan (the major Australian mail-order company for technology) or a refurbished Pixel 9 Pro XL for $842. The Pixel 9 series has security support until 2031, which is probably longer than you can expect a phone to last without being broken. So a refurbished phone with a much higher resolution screen that’s only one generation behind the latest high end phones will cost less. For a brand new phone, a Pixel 8 Pro, which has security updates until 2030, costs $874, and a Pixel 9A, which has security updates until 2032, costs $861.
Doing what the Furilabs people have done is not a small project; it’s a significant amount of work and the prices of their products need to cover that. I’m not saying that the prices are bad, just that economies of scale and the large quantity of older stock make the older Google products quite good value for money. The latest Pixel models are unreasonably expensive when new: the Pixel 10 is selling new from Google for $AU1,149, which I consider a ridiculous price that I would not pay given the market for used phones. If my only choice was $1,149 or a “feature phone” I’d pay the $1,149, but the FLX1s for $863 is a much better option for me. If all I had to choose from was a new Pixel 10 or a FLX1s for my parents, I’d get them the FLX1s.
For a FOSS developer a FLX1s could be a mobile test and development system which could be lent to a relative when their main phone breaks and the replacement is on order. It seems to be fit for use as a commodity phone. Note that I give this review on the assumption that SMS and VoLTE will just work, I haven’t tested them yet.
The UI on the FLX1s is functional and easy enough for a new user while allowing an advanced user to do the things they desire. I prefer the Android style and the Plasma Mobile style is closer to Android than Phosh is, but changing it is something I can do later. Generally I think that the differences between UIs matter more when on a desktop environment that could be used for more complex tasks than on a phone which limits what can be done by the size of the screen.
I am comparing the FLX1s to Android phones on the basis of what technology is available. But most people who would consider buying this phone will compare it to the PinePhone Pro and the Librem 5 as they have similar uses. The FLX1s beats both those phones handily in terms of battery life and of having everything just work. But it has the most non free software of the three and the people who want the $2000 Librem 5 that’s entirely made in the US won’t want the FLX1s.
This isn’t the destination for Debian based phones, but it’s a good step on the way to it and I don’t think I’ll regret this purchase.
Meeting opened at 20:04 AEDT by Joel and quorum was achieved.
Minutes taken by Neill.
2. Log of correspondence
Membership subscription page/invoice and a couple of questions – Testing of the new Joomla membership form. Joel and Russell have responded
LA meeting update – report from Miles on the EO2026 conference. Internal EO2026 email
Adding David Sparks to ANZ in NZ bank accounts – Discussion about adding people to LA’s ANZ New Zealand bank accounts. Includes an on-list motion.
Status of subcommittee reports for LA Annual Report – from Jonathan Woithe – updates on subcommittee reports received
You’ve Made A Sale – 76337027 (AU$4.33) – from RedBubble
Re: Receiving funds – question from Arjen about receiving funds for conferences. Russell has responded.
Error when updating mailing list general info pages – request for assistance from Jack Skinner. Joel has forwarded the request to the admin team.
admin team reimbursements – 2025 admin team expense claim. Russell has responded and the claim has been paid.
Annual report – Media and Comms report – Kathy Reid has sent in the Media and Comms report for the Annual Report
Linux Australia website – does your event or area need any content updates made? If so, please let me know what they are – Kathy Reid has asked the event subcommittees for any needed updates
A better way to make event linux.org.au website updates – can we make a process change? – Kathy Reid has suggested a process change for website updates. Jonathan has responded.
PyCon AU announce + committee mailing lists not sending. – Jack enquires about the mailing lists. Joel and Steve have responded.
Request for Payment Approval: PyCon AU – Lemonnade – Russell has responded and the payment has been approved
Celebrating Generosity and Growth in the OSI Community – newsletter from OSI
Annual Linux Australia subcommittee report – Jonathan has sent a request to the PyConAU Steering Committee for their 2025 report.
Fwd: [Pyconau-ctte] Annual Linux Australia subcommittee report – Question from Jack Skinner as to whether council would like a PyConAU26 report for the Annual report. Jonathan has responded.
Fwd: Clearance Employer Notification – EO have received a Working With Children check for someone who is probably a member of Linux Victoria – they are definitely not an EO2026 volunteer. Sae Ra asks for help in clarifying the situation
General note: We continue to receive a lot of spam offering to provide us with a list of attendees to conferences we run. Currently it’s for EO2026 which hasn’t even run yet. I continue to delete these messages.
3. Items for discussion
Working with children check
Everything Open was notified that a volunteer has been approved for a working with children check. The council will check whether they deliberately used Linux Australia as they are not volunteering with Everything Open or directly with Linux Australia.
Access to Policies and Linux Aus mailing lists to handle moderation requests for secretary – Joel will share the password after this meeting.
4. Items for noting
Election is ready to go. Joel will contact Julien about being the scrutineer/returning officer.
Membership backlog is pretty much cleared. Several people responded to the pre-election reminder, but many have not. I intend to cancel those who have not responded.
Russell has moved a motion on the council mailing list to add David Sparks and Christopher Burgess to the anz.co.nz mandate.
MOTION: Linux Australia adds Dave Sparks and Christopher Burgess to its anz.co.nz mandate as payment authorisers.
I seek a seconder and votes on the motion.
I vote in favour of the motion.
Seconded by Jonathan Woithe Results: Motion passed
5. Other business
Annual Report – The content is ready; Jenny will look at starting the final layout this weekend. We will aim to have the report finalised by 7 Jan 2026.
2019-2021 PO BOX paperwork – Joel has responded to Julien
Linux Australia’s 2024/2025 audit – Russell has requested that the auditor start work on the 2024/25 audit
Problem with Ubuntu Manager – kernel version 6.8.0-87-generic, Ubuntu 24.04.3 LTS – Jonathan has responded
Important: Scam Messages Requesting Account Verification – notification from RedBubble
Fwd: Linux Australia’s 2024/2025 audit – email from Russell asking various subcommittee treasurers to tidy up some loose ends in Xero.
Authorisation for Kiwi PyCon Transaction Exceeding $5000 – 2025-11-26 – Russell has responded
Fwd: Can you post the JC Guzman announcement to social media? [Internal EOCBR26 email]
On-campus accommodation for core team [Internal EOCBR26 email]
Several notifications from RedBubble about EO2026 merch sales.
Access to PyCon AU 2025 stripe account – request from Jack Skinner for access to stripe to finalise PyConAU 25 books in Xero – Russell has responded
Finalising PyCon AU 2025 accounts – Discussion between Jack Skinner and Russell about finalising the PyConAU 2025 financial position.
Status of subcommittee reports for LA Annual Report – report from Jonathan on the status of subcommittee reports for the Annual Report. Still waiting on Kiwi PyCon, Admin, Flounder and LUV
Catering selections – part 2 – Penguin dinner [Internal EOCBR26 email]
LA third party maintenance – Wednesday 3rd Dec – notification from the admin team of planned maintenance that will affect LA’s servers’ internet connection.
Tuesday 2-Dec’s Everything Open meeting agenda [Internal EOCBR26 email]
.au Licensing Rules Review for auDA – notification from Sae Ra that auDA are doing a licensing rules review.
Amazon Web Services Billing Statement Available [Account: 103334912252] for $7.47
Membership subscription page/invoice and a couple of questions – questions from Nathan Morrow for Russell about payment gateways for Joomla.org.au membership
Apologies – Julia is unable to attend tonight’s subcommittee meeting
PyCon AU 2026 update for 3 December 2025 – Update from Jack Skinner on PyConAU2026
Amusing spam:
Invitation to join the Illuminati Order of Poseidon
3. Items for discussion
Kiwi Pycon Subcommittee Update (20:05 AEDT)
190 attendees over three days
Financials significantly better than expected
$2,000 – $4,000 loss after LA taxes
Community success. Two days of great talks
Took a lot out of the organisers who are going to take a break next year
Will be looking for new organisers
Joel thanked the organisers on behalf of the council, and acknowledged the good result – it may be a loss, but it is still a good result and the result of a lot of work.
KiwiPycon has a new budget spreadsheet for Russell to look at. It may be a better way of tracking profit and loss.
While the organisers are thinking of leaving a bit of a gap before the next conference they are already planning to communicate with possible future sponsors.
Drupal Subcommittee Update (20:10 AEDT)
DrupalCon Asia
Final budget not yet complete
214 people
Post event survey overwhelmingly positive
Suspiciously few things went wrong
50% of attendees from Japan
A local Drupal association has formed.
Currently wrapping up the final invoices.
Final update in February
Michael is resigning from the steering committee.
A new chair and two new members will be elected.
Next event in India in 2027, but need to get local sponsors to step up and commit. This hasn’t happened yet.
The Pune group runs a 250 person event for AUD4,000 so the economics are quite different.
No event in 2026.
The treasure hunt only had 36 people in 14 teams but it was a wonderful community event. Check the flickr stream!
Michael had a lot of fun putting the event together but will never do it again (a lot more work than expected)
These two events (Singapore and Japan) have had a major positive effect on the Drupal community in Asia.
Joomla Subcommittee Update (20:15 AEDT)
Most of these details are in the email to Jonathan for the annual report
Have an event planned for next year in Melbourne.
A membership form has been set up. Will be made live this week before the meeting next Tuesday.
Still some work to do on the backend.
Work on the website is ongoing, probably won’t be launched until next year.
Question about qualifying as a NFP in Victoria. LA is qualified by the ATO nationally so there should be no problem in Victoria.
Admin Team Update (20:20 AEDT)
Not a lot to report
Still poking at Russell Coker’s complaints. Russell would like zero spam; Steve is not sure that this is possible. Spam is now down to 1–1.5%. Some tightening can still be done, but at some point valid email will be rejected.
The annual report is nearly done.
Expense claims in progress
Budget is almost complete
Steve will respond to Russell at some point.
Some disruptive maintenance about to start. Steve will be monitoring and will report any problems.
PyCon AU Subcommittee Update (20:25 AEDT)
2025
Basically done.
Annual report sent in
Jack has tidied up the accounts.
The last thing to do is the PSF grant accountability report.
This is Peter’s last meeting
Joel thanked Peter on behalf of the council.
2026
Full team has been filled
Ticket sales are continuing, no major sales effort has been made yet.
KiwiPyCon has recommended that members attend Brisbane.
Will there be an upcoming events page in the annual report? Answer – no. The annual report will focus on the past year.
There will probably not be another subcommittee meeting before the election. The hope is that the new council will be up and running by early February.
PyConAU 2026 would like to have a banner at EO2026.
WordPress Subcommittee Update (20:30 AEDT)
No report
Everything Open Subcommittee Update (20:35 AEDT)
The financial situation is similar to last time. Currently looking at a $10,000 profit.
Merch is online at RedBubble. T-shirt design is done.
One more keynote to announce.
Signed sponsorship contracts are coming in. Most are now signed.
Working on getting volunteers and speakers to finish registering on the website so dietary requirements can be communicated to UC.
Moving to a weekly meeting cadence. Notionally three more this year, but the 23rd will probably be fairly casual.
Speaker and keynote gifts are being finalised.
Currently 107 registrations, including speakers, organisers and volunteers.
At this point EO is past the break even point.
Flounder Subcommittee Update (20:35 AEDT)
No report
LUV Subcommittee Update (20:35 AEDT)
No report
4. Items for noting
Membership backlog:
Emails have now been sent to all pending/awaiting-approval members who have applied since January 2025. There are another 105 applications dated between November 2021 and December 2024. [Joel suggests checking for obvious community members and declining the rest. Neill will come up with a nice way of telling them to reapply if they are still interested.]
28 new members have responded and been approved. I’m waiting on a reply from 77 others.
CiviCRM reports 1003 current members.
Daniel Pocock has applied for membership – council will not approve this application
5. Other business
NFP papers have been submitted to the ATO
Annual report submissions from subcommittees are nearly completed, the remaining ones should be submitted by Sunday.
Subcommittee policies need to be clarified and then enforced – this is a job for the next council but we should begin preparations. We probably need a register of subcommittees and other affiliated organisations.
AGM will be held during Everything Open – probably on the first evening of the conference which will be a Wednesday night.
When do we need to call the election? The date by which it must be called needs to be confirmed.
If you use any of these gems, I do recommend you check out the release notes. There really are some useful fixes there.
I pushed out all of these releases using our new release-machine, which allowed me to straighten out a last couple of kinks. The upside of this is that we saw forum posts go out for each release. So if you’re looking for a convenient way of keeping up with every new Hanami or Dry release, go subscribe to the releases category in our forum.
I continued my rollout of repo-sync into the Hanami repos. This week I got the remainder of hanami/hanami’s files under sync, in two steps. Then I moved on to hanami-cli, our next most challenging repo, due to its use of different database services in its tests. This one I managed to do in one hit (after preparing the ground first).
There’s a tension to manage in how we handle slight variations in CI setup for each repo. As much as possible we want things to be consistent, but I also don’t want to end up with our own repo-sync.yml config becoming a full mirror of GitHub Actions’ own schema. For Hanami CLI I decided to tackle its usage of custom services by allowing the repo to define a couple of YAML snippets that get inserted into the appropriate place. This all became possible because I switched our templating to Gomplate last month, which provides handy functions like file.Exists and file.Read. We’re using them both for these CI snippets and to allow custom README contents per repo.
The other main change I made for bringing Hanami into our repo sync was to further relax our RuboCop config. I took quite some satisfaction from removing a range of # rubocop:disable statements from across our repos.
All of this is really to say I didn’t get back to any work on the new site. I made the call on Friday morning that these Dry gem releases were too important to leave. I didn’t want those 16 amazing people to see their work waiting around any longer!
This week, however, is the week. I’ll be sending out a post to rally the troops, then set about finishing our marketing copy. The goal is to ship this thing mid-Feb. Countdown starts now.
I just read this informative article on ANSI terminal security [1]. The author has written a tool named vt-houdini for testing for these issues [2]. They used to host an instance on their server but appear to have stopped it. When an instance is running you can ssh to the system in question and, without needing a password, you are connected and the server probes your terminal emulator for vulnerabilities. The versions of Kitty and Konsole in Debian/Trixie passed those tests on my system.
This will always be a potential security problem due to the purpose of a terminal emulator. A terminal emulator will often display untrusted data and often data which is known to come from hostile sources (EG logs of attempted attacks). So what could be done in this regard?
Memory Protection
Due to the complexity of terminal emulation there is the possibility of buffer overflows and other memory management issues that could be used to compromise the emulator.
The Fil-C compiler is an interesting project [3]: it compiles existing C/C++ code with memory checks. It is reported to have no noticeable impact on the performance of the bash shell, which makes it sound like a useful option for addressing some of these issues, as shell security issues are connected to terminal security issues. The performance impact on a terminal emulator would likely be more noticeable. Also note that Fil-C compilation apparently requires compiling all libraries with it. This isn’t a problem for bash, as the only libraries it uses nowadays are libtinfo and libc. The Kitty terminal emulator doesn’t use many libraries, but libpython is one of them; it’s an essential part of Kitty and a complex library to compile in a different way. Konsole links about 160 libraries, and it isn’t plausible to recompile so many libraries at this time.
Choosing a terminal emulator that has a simpler design might help in this regard. Emulators that call libraries for 3D effects etc and native support for displaying in-line graphics have a much greater attack surface.
Access Control
A terminal emulator could be run in a container to prevent it from doing any damage if it is compromised. But the terminal emulator has full control over the shell it runs, and if that shell has the access needed for commands like scp/rsync to do what is expected of them, then no useful level of containment is possible.
It would be possible to run a terminal emulator in a container for the purpose of connecting to an insecure or hostile system and not allow scp/rsync to/from any directory other than /tmp (or other directories to use for sharing files). You could run “exec ssh $SERVER” so the terminal emulator session ends when the ssh connection ends.
Conclusion
There aren’t good solutions to the problems of terminal emulation security. But testing every terminal emulator with vt-houdini and fuzzing the popular ones would be a good start.
Qubes level isolation will help things in some situations, but if you need to connect to a server with privileged access to read log files containing potentially hostile data (which is a common sysadmin use case) then there aren’t good options.
One issue is that as there isn’t a lot of weight in the front with the batteries in the centre of the car the front wheels slip easily when accelerating. It’s a minor thing but a good reason for wanting AWD in an electric car.
When I got the car I got two charging devices: one to charge from a regular 240V 10A power point (often referred to as a “granny charger”) and a cable with a special EV charging connector on each end. The cable with an EV connector on each end is designed for charging that’s faster than the granny charger but not as fast as the rapid chargers, which have the cable permanently connected to the supply so the cable temperature can be monitored and/or controlled. That cable can be used if you get a fast charger installed at your home (which I never plan to do) and apparently at some small hotels and other places with home-style EV charging. I’m considering just selling that cable on eBay as I don’t think I need to personally own any cable other than the granny charger.
The key fob for the LEAF has a battery installed, it’s either CR2032 or CR2025 – mine has CR2025. Some reports on the Internet suggest that you can stuff a CR2032 battery in anyway but that didn’t work for me as the thickness of the battery stopped some of the contacts from making a good connection. I think I could have got it going by putting some metal in between but the batteries aren’t expensive enough to make it worth the effort and risk. It would be nice if I could use batteries from my stockpile of CR2032 batteries that came from old PCs but I can afford to spend a few dollars on it.
My driveway is short and if I left the charger out it would be visible from the street and at risk of being stolen. I’m thinking of chaining the charger to a tree and having some sort of waterproof enclosure for it so I don’t have to go to the effort of taking it out of the boot every time I use it. Then I could also configure the car to only charge during the peak sunlight hours when the solar power my home feeds into the grid has a negative price (we have so much solar power that it’s causing grid problems).
The cruise control is a pain to use, so much so that I haven’t yet got it to work usefully ever. The features look good in the documentation but in practice it’s not as good as the Kia one I’ve used previously where I could just press one button to turn it on, another button to set the current speed as the cruise control speed, and then just have it work.
The electronic compass built in to the dash turned out to be surprisingly useful. I regret not gluing a compass to the dash of previous cars. One example is when I start google navigation for a journey and it says “go South on street X” and I need to know which direction is South so I don’t start in the wrong direction. Another example is when I know that I’m North of a major road that I need to take to get to my destination so I just need to go roughly South and that is enough to get me to a road I recognise.
In the past when there was a bird in the way I didn’t do anything different; I kept driving at the same speed and relied on the bird to see me and move out of the way. Birds have faster reactions than humans and have evolved to cope with the speeds cars travel at on all roads other than freeways; also, the birds that are on roads usually have an eye on each side of their head, so they can’t fail to see my car approaching. For decades this worked, but recently a bird just stood on the road and got squashed. So I guess I should honk when there are birds on the road.
Generally everything about the car is fine and I’m happy to keep driving it.
What I want to do is get an acceptable ratio of charge time to use time for a free software phone. I don’t necessarily object to a phone that can’t last an 8 hour day on a charge, but I can’t use a phone that needs to be on charge for 4 hours during the day. For this part I’m testing the charge speed and will test the discharge speed when I have solved some issues with excessive CPU use.
I tested with a cheap USB power monitoring device that sits inline between the power cable and the phone. The device has no way of exporting data, so I just watched it, and when the numbers fluctuated I tried to estimate the average. I only give the results to two significant digits, which is about all the accuracy that is available; as I recorded the numbers separately, V*A might not exactly equal the W. I idly considered rounding voltages to the nearest volt and currents to the half amp, but the way the PC USB ports’ voltage drops at higher currents is interesting.
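The two-significant-digit rounding, and why V*A won’t always match the rounded W, can be sketched like this (round_sig is a hypothetical helper of mine, not part of any meter software; the sample readings are made up for illustration):

```python
import math

def round_sig(x, digits=2):
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    return round(x, digits - 1 - int(math.floor(math.log10(abs(x)))))

# Hypothetical raw readings: 4.82V and 1.86A.
v, a = round_sig(4.82), round_sig(1.86)  # 4.8V, 1.9A once rounded

# Multiplying the rounded values gives 4.8 * 1.9 = 9.12 → 9.1W,
# while the true product 4.82 * 1.86 = 8.97 rounds to 9.0W.
print(v, a, round_sig(v * a), round_sig(4.82 * 1.86))
```

This is why a table of separately recorded V, A, and W columns can disagree by a digit in the last place without any measurement error.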
This post should be useful for people who want to try out FOSS phones but don’t want to buy the range of phones and chargers that I have bought.
Phones Tested
I have seen claims about improvements with charging speed on the Librem 5 with recent updates so I decided to compare a number of phones running Debian/Trixie as well as some Android phones. I’m comparing an old Samsung phone (which I tried running Droidian on but is now on Android) and a couple of Pixel phones with the three phones that I currently have running Debian for charging.
Chargers Tested
HP Z640
The Librem 5 had problems charging from a port on the HP ML110 Gen9 I was using as a workstation. I have sold the ML110 and can’t repeat that exact test, but I tested on the HP z640 that I use now. The z640 is a much better machine for my use (quieter and with better support for audio and other desktop features) and, unlike the ML110, is actually sold as a workstation.
The z640 documentation says that of the front USB ports the top one can do “fast charge (up to 1.5A)” with “USB Battery Charging Specification 1.2”. The only phone that would draw 1.5A on that port was the Librem 5 but the computer would only supply 4.4V at that current which is poor. For every phone I tested the bottom port on the front (which apparently doesn’t have USB-BC or USB-PD) charged at least as fast as the top port and every phone other than the OP6 charged faster on the bottom port. The Librem 5 also had the fastest charge rate on the bottom port. So the rumours about the Librem 5 being updated to address the charge speed on PC ports seem to be correct.
The Wikipedia page about USB hardware says that the only way to get more than 1.5A from a USB port while operating within specifications is via USB-PD, so as USB 3.0 ports the bottom three ports should be limited to 5V at 0.9A, for 4.5W. The Librem 5 takes 2.0A and the voltage drops to 4.6V, which gives 9.2W. This shows that the z640 doesn’t correctly limit power output and that the Librem 5 will take considerably more power than the specs allow. It would be really interesting to get a powerful PSU and see how much power a Librem 5 will take without negotiating USB-PD, and it would also be interesting to see what happens when you short circuit a USB port in an HP z640. But I recommend not doing such tests on hardware you plan to keep using!
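The arithmetic behind those figures is simple enough to sketch; the voltages and currents are the ones quoted above, and this is just the multiplication:

```python
# Power figures from the measurements above: the USB 3.0 non-PD spec
# limit versus the Librem 5's measured draw on the z640's bottom port.

def power_watts(volts: float, amps: float) -> float:
    """Power in watts from a voltage and current reading."""
    return volts * amps

spec_limit = power_watts(5.0, 0.9)  # 5V at 0.9A: the 4.5W spec limit
librem5 = power_watts(4.6, 2.0)     # 4.6V at 2.0A: the measured 9.2W

print(f"spec limit: {spec_limit:.1f} W")
print(f"Librem 5:   {librem5:.1f} W")
print(f"over spec by {librem5 / spec_limit:.1f}x")
```

So the Librem 5 is drawing roughly twice what an in-spec USB 3.0 port should supply without USB-PD.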
Of the phones I tested, the only one that was within specifications on the bottom port of the z640 was the OP6. I think that is more about it charging slowly in every test than about conforming to specs.
Monitor
The next test target is my 5120*2160 Kogan monitor with a USB-C port [4]. This worked quite well: apart from being a few percent slower with the PPP, it outperformed the PC ports for every device, due to using USB-PD (the only way to get more than 5V) and to having a more powerful PSU that doesn’t drop voltage when more than 1A is drawn.
Ali Charger
The Ali Charger is a 240W GaN charger supporting multiple USB-PD devices that I bought from AliExpress. I tested with the top USB-C port, which can supply 100W to laptops.
The Librem 5 repeatedly cuts charging on and off on the Ali charger and doesn’t charge properly. It’s also the only charger for which the Librem 5 requests a voltage higher than 5V, so it seems that the Librem 5 has some issues with USB-PD. It would be interesting to know why this happens, but I expect that a USB signal debugger is needed to find out. On AliExpress, USB 2.0 sniffers go for about $50 each, and with a quick search I couldn’t find a USB 3.x or USB-C sniffer. So I’m not going to spend my own money on a sniffer, but if anyone in Melbourne, Australia owns a sniffer and wants to visit me and try it out, let me know. I’ll also bring it to Everything Open 2026.
Generally the Ali charger was about the best charger in my collection, apart from the case of the Librem 5.
Dell Dock
I got a number of free Dell WD15 (aka K17A) USB-C powered docks as they are obsolete. They have VGA ports among other connections, and the HDMI and DisplayPort ports don’t support resolutions higher than FullHD if both are in use, or 4K if only one is in use. The resolutions aren’t directly relevant to charging, but they do indicate the age of the design.
The Dell dock seems not to support any voltages other than 5V for phones and 19V (20V requested) for laptops, and certainly not the 9V requested by the Pixel 7 Pro and Pixel 8. I wonder if not supporting most fast-charging speeds for phones was part of the reason other people didn’t want these docks and I got some for free. I hope the newer Dell docks support 9V: a phone running Samsung DeX will display 4K output on a Dell dock and can productively use a keyboard and mouse, and getting equivalent functionality to DeX working properly on Debian phones is something I’m interested in.
Battery
The “Battery” I tested with is a Chinese battery pack for charging phones and laptops. It’s allegedly capable of 67W USB-PD supply, but so far all I’ve seen it supply is 20V at 2.5A for my laptop. I bought the 67W battery in case I need it for other laptops in future; the Thinkpad X1 Carbon I’m using now will charge from a 30W battery.
There seems to be an overall trend of the most shonky devices giving the best charging speeds. Dell and HP make quality gear, although my tests show that some HP ports exceed specs. Kogan doesn’t make monitors; they just put their brand on something cheap. Having bought one of the cheapest chargers on AliExpress and one of the cheaper batteries from China, I don’t expect the highest quality, and I am slightly relieved to have done enough tests with both that a fire now seems extremely unlikely. But the battery is one of the fastest-charging devices I own, and with the exception of the Librem 5 (which charges slowly on all ports and unreliably on several) the Ali charger is also one of the fastest. The Kogan monitor isn’t far behind.
Conclusion
Voltage and Age
The Samsung Galaxy Note 9 was released in 2018, as was the OP6. The PPP was first released in 2022 and the Librem 5 in 2020, but I think they are both at a similar technology level to the Note 9 and OP6, as the companies that specialise in phones have a pipeline for bringing new features to market.
The Pixel phones are newer and support USB-PD voltage selection, while the other phones either don’t support USB-PD or support it but only want 5V, apart from the Librem 5, which requests a higher voltage but runs at a low current and repeatedly disconnects.
Idle Power
One of the major problems that prevented me from using a Debian phone as my daily driver in the past was the ratio of idle power use to charging power. Now that the phones seem to charge faster, if I can get the idle power use under control then they will be usable.
Currently the Librem 5 running Trixie uses 6% CPU time (24% of one core) while idle with the screen off (but with “Caffeine” mode enabled, so no deep sleep). On the PPP the CPU use varies between about 2% and 20% (12% to 120% of one core), mainly from plasmashell and kwin_wayland. The OP6 has idle CPU use a bit under 1%, which means a bit under 8% of one core.
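The two percentage figures in each case are the same number expressed differently: the system-wide figure is a share of all cores, so multiplying by the core count gives the share of one core. A quick sketch of the conversion, assuming the usual core counts for these SoCs (these counts are my assumption, not a measurement from this post):

```python
# Convert a system-wide CPU percentage (across all cores) into the
# equivalent percentage of a single core.
# Assumed core counts: Librem 5 (i.MX8M Quad) = 4,
# PinePhone Pro (RK3399) = 6, OnePlus 6 (Snapdragon 845) = 8.

def one_core_percent(total_percent: float, cores: int) -> float:
    return total_percent * cores

print(one_core_percent(6, 4))   # Librem 5 idle: 24% of one core
print(one_core_percent(2, 6))   # PPP low end: 12%
print(one_core_percent(20, 6))  # PPP high end: 120%
print(one_core_percent(1, 8))   # OP6 idle: 8%
```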
The Librem 5 and PPP seem to have configuration issues with KDE Mobile and Pipewire that result in needless CPU use. With those issues addressed I might be able to make a Librem 5 or PPP a usable phone if I have a battery to charge it.
The OP6 is an interesting point of comparison as a Debian phone but is not a viable option as a daily driver due to problems with VoLTE and also some instability – it sometimes crashes or drops off Wifi.
The Librem 5 charges at 9.2W from a PC that doesn’t obey specs and at 10W from a battery. That’s a reasonable charge rate, and the fact that it can request 12V (unsuccessfully) opens the possibility of higher charge rates in future. That could allow a reasonable ratio of charge time to use time.
The PPP has lower charging speeds than the Librem 5 but works more consistently: I found no charger that wouldn’t work well with it. This is useful for the common case of charging from a random device in an office. But the fact that the Librem 5 takes 10W from the battery while the PPP only takes 6.3W would be an issue if using the phone while charging.
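The charge-while-using concern comes down to subtraction: the net power into the battery is the supply power minus whatever the phone burns while in use. A sketch with a hypothetical 4W in-use draw (that figure is mine, for illustration only; the supply figures are the measured ones):

```python
# Net power into the battery while using the phone: supply minus draw.
# The 4W in-use figure is hypothetical; the 10W (Librem 5) and 6.3W
# (PPP) battery-supply figures are the measurements from this post.

def net_charge_watts(supply_w: float, use_w: float) -> float:
    return supply_w - use_w

in_use = 4.0  # hypothetical active-use power draw
print(net_charge_watts(10.0, in_use))  # Librem 5: 6.0 W net into battery
print(net_charge_watts(6.3, in_use))   # PPP: only 2.3 W net
```

Under that assumption the PPP gains charge at well under half the Librem 5’s rate while in use, which is why the 6.3W figure matters.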
Now that I know the charge rates for different scenarios, I can work on getting the phones to use significantly less power than that on average.
Specifics for a Usable Phone
The 67W battery, or something equivalent, is something I think I will always need to have around when using a PPP or Librem 5 as a daily driver.
The ability to charge fast while at a desk is also an important criterion. The charge speed of my home PC is good in that regard, and the charge speed of my monitor is even better. Getting something equivalent at a desk in an office I work in is a possibility.
Improving the Debian distribution for phones is necessary. That’s something I plan to work on although the code is complex and in many cases I’ll have to just file upstream bug reports.
I have also ordered a FuriLabs FLX1s [5] which I believe will be better in some ways. I will blog about it when it arrives.
I am not entirely sure what makes the AI debate so polarizing, although I suspect it has something to do with feeling threatened by a changing landscape. What I can say with certainty is that I often find it difficult to have a nuanced conversation about AI. It seems to me that people fall into two polar opposite camps: those who think AI is completely great and that we should fire all the programmers and creatives; and those who think that AI is all bad and we should go back in time to before it existed.
Honestly, I think neither end of the spectrum is right. I should admit here that my own stance is significantly more nuanced than it was six months ago: having now used various code assistant LLMs for a few months, there are clearly useful contributions they make to my work day. I think the elevator pitch would be something like “AI is a useful tool if treated like very smart autocomplete in tightly constrained environments”. I would even go so far as to say that one measure of the “AI maturity” of a code base should be how tight the constraints are. Unit tests, functional tests, static analysis, and so forth don’t go away with AI-generated code; they are in fact more important than ever.
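One concrete form those constraints take is writing the test before accepting any generated code: the test pins down the behaviour, so a regenerated implementation either passes or gets rejected. The function and behaviour below are made up purely for illustration:

```python
# A sketch of constraining AI-generated code with tests.  slugify() is
# a hypothetical example of a "generated" function; the asserts encode
# what "good" looks like, independent of how the code was produced.

import re

def slugify(title: str) -> str:
    """Example generated implementation under test."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

# The constraints: if a regenerated implementation breaks these, it is
# rejected rather than merged.
assert slugify("Hello, World!") == "hello-world"
assert slugify("  spaces  ") == "spaces"
assert slugify("already-fine") == "already-fine"
```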
This is where inexperienced developers will trip themselves up. If they’re not reviewing the generated code closely and course-correcting the machine as required, they’re likely to end up with a mess that isn’t performant or maintainable. The human still needs to know what “good” looks like. I think version control is key here too, because being able to walk backwards when you’ve gone down a dead end remains just as important as ever.
The thing is, the current state of the art in code generation AIs is about as good as a junior developer. It requires just as much supervision, coaching, and prompting to think about the right things. However, just as you wouldn’t get a junior developer to mentor another junior developer, I am not sure that everyone has the skills required to adequately supervise the current state of the art in code generation.
Happy new year! I started the year by finishing most of our new automated release machinery: check out Release Machine. There’s not a lot to it, but it’s exactly what we need for the future: a signed push triggers release publishing using RubyGems.org trusted publishing (no more credential sharing!), with configuration to allow specific sets of releasers per gem (this is new!), automatic parsing of CHANGELOG.md files (using Andrew’s new changelog-parser gem, just in time!), creation of GitHub release entries, and even publishing of releases to our forum (also new!), which will also flow into our Discord thanks to our forum feed.
All that’s left is to sweep through the Dry gems to make sure their CHANGELOGs are in the right format, then bring them onto this new system. Then we need to continue the rollout of repo-sync to Hanami so they can use release-machine too. The good thing about these steps is that I can do them steadily and incrementally, so hopefully we see this thing finished in the coming month or so without it being a distraction to other work.
Beyond that, we have the team gearing up for a v2.0 of Dry CLI, which will include a small breaking change to our command classes (making out and err properly injectable via #initialize args), plus the remainder of the improvements Gustavo created for us, including styled text, shell completions, option type casting, and mutually exclusive options. It’s gonna be great. I’m aiming to take slightly more of a backseat with this release, but I’m hoping we can have this all ready to go by the end of January.
grvy raised a great question in chat about how to access the current user from within view exposures, so I turned that into an update to our guides (in two places, since we’re still working on the new site).
Speaking of which, with this coming week being the first “real” one of the year, I need to get back onto my main task: getting our new site launched. I have half of our landing page copy left to refine, plus the copy for a few other pages to write.
The post resonates strongly with me — I think the idea that the expectations placed upon managers have changed in noticeable eras is true, but it also explains my own mixed feelings about the Silicon Valley of today. You see, as the industry became less passionate about treating engineers well and building things which genuinely improved the world over the last couple of years, I became less passionate about being treated poorly by my employers. It is definitely true that an employer is within their rights to let you know that you’re a replaceable asset of convenience, which I think is definitely a thing Cisco reinforced as often as possible, but the inverse is also true. If this employment thing is a purely commercial relationship, then it is simply rational for me to take a better offer if it comes along without feeling any guilt.
That is, I wonder if the industry will enjoy reaping what they are currently sowing when those better opportunities do inevitably come along?
As a follow-up from my last post about my 8K TV [1], I tested a Samsung 65″ QN900C Neo QLED 8K that’s on sale at JB Hifi. According to the JB employee I spoke to, they are running out the last of the 8K TVs and have no plans to get more.
In my testing of that 8K TV, YouTube had a 3840*2160 viewport, which is better than the 1920*1080 of my Hisense TV. When running a web browser, the codeshack page reported it as 1920*1080 with a 1.25* pixel density (presumably a configuration option), giving a usable resolution of 1536*749.
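The reported width follows directly from the pixel density: the CSS-pixel viewport is the device resolution divided by the devicePixelRatio. (The reported height of 749 is lower than the full 864, presumably because browser UI takes up some vertical space.)

```python
# CSS-pixel viewport from device pixels and devicePixelRatio, using
# the figures the codeshack page reported above.

def css_px(device_px: int, dpr: float) -> int:
    return round(device_px / dpr)

print(css_px(1920, 1.25))  # 1536, matching the reported usable width
print(css_px(1080, 1.25))  # 864; the reported 749 is what remains
                           # after browser chrome
```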
The JB Hifi employee wouldn’t let me connect my own device via HDMI but said that it would work at 8K. I said “so if I buy it I can return it if it doesn’t do 8K HDMI?” and then he looked up the specs and found that it would only do 4K input on HDMI. It seems that actual 8K resolution might work on a Samsung streaming device but that’s not very useful particularly as there probably isn’t much 8K content on any streaming service.
Basically, that allegedly-8K Samsung TV works at 4K at best.
It seems to be impossible to buy an 8K TV or monitor in Australia that will actually display 8K content. ASUS has a 6K 32″ monitor with 6016*3384 resolution for $2016 [2]. Accounting for inflation, $2016 wouldn’t be the most expensive monitor I’ve ever bought, and hopefully prices will continue to drop.
Rumour has it that there are 8K TVs available in China that actually take 8K input. Getting one to Australia might not be easy but it’s something that I will investigate.
I’ve had a Creality CFS upgrade for my K1 Max sitting on my workbench for probably a month waiting for me to install it. Part of that delay is that I knew it would take a while to install, and I am glad I waited.
I finally got around to doing that yesterday, and I have thoughts… First off, the Creality installation documentation is ok but not great. I would have been in a lot more pain without Thinkering with Jerd’s tutorial video, so thank you to Jerd, whoever you may be. I also agree with Jerd that the extensive use of hot glue on connectors is super annoying. While I suspect it makes sense in terms of ensuring that devices work when they arrive at a customer, it makes upgrades super annoying, and surely there is a better way. Overall I think I spent at least six hours on the install.
But what about multicolor printing now that I’ve finished the upgrade? Well, my initial observations align mostly with my expectations in that:
Printing is slow if there are color changes. This is especially true if you include the time taken by filament jams and failures to feed that I didn’t notice immediately. My first non-trivial test print will likely take 16 hours to complete. So you can’t just set a complicated print going and go to work; it is unlikely to have finished by the time you get home because it will jam somewhere along the way. This is especially annoying given that the way I’ve been clearing the jams is simply to press “retry”, which seems to resolve the issue. Could the machine not do that for me?
Multicolor printing is also a bit wasteful, even when using “flush to infill”. I was expecting this, but yeah, it’s definitely a thing.
These are of course only true for multicolor prints. If you treat the thing as a loader which can deliver whatever single color you need for a specific print, then it’s pretty cool out of the box.
To address the multicolor print issues, I have learned you should minimize the number of colors per layer to reduce time and waste. I have also ordered some unobtainium PTFE tube replacement which allegedly jams less by being “more slippery”. We’ll see how that goes, I suppose.
Finally, I was also surprised by how the filament change worked. I assumed that anything already loaded into the feeder tube would be flushed, and that tube length should therefore be minimized. This is in fact not true: the filament gets retracted, and the filament-change waste is driven by flushing the extruder and having to clean it between colors. So you can in fact have the CFS further away from the printer than I expected, although I had to consume PTFE tube to learn that. I do expect that the longer the feed tube, the more failures to load filament, but then again the tighter the bends in the tube the more of those too, so there is a balance here somewhere.
What if you want to study computer science but can’t afford university fees? Or you studied computer science a really long time ago and want to see what’s changed? Or you just like learning stuff? Well, lots of schools now post their lectures on YouTube, so it’s entirely possible to construct a zero-cost “self driving degree”, as long as you’re good enough at Canva to make your own certificate at the end. I consider this list incomplete, but in the end I decided I’d post the things from 2025 that I’d found and liked. I can always do an updated version later.
Introduction
First, let’s nail down some basics with Crash Course Computer Science. This covers fundamentals like binary maths and logic gates, as well as providing a good overview of the history of computing in general.
Databases
The focus here isn’t on SQL itself; there are lots of places to learn that. Instead, the focus is on how a database actually works under the hood. That said, let’s chuck in a quick SQL introduction anyway.
freeCodeCamp.org’s SQL for web developers course: I will admit I didn’t actually watch this one, but I have recommended it to others who wanted to brush up on SQL. I made that recommendation on the basis of quite liking Lane Wagner’s golang course (included below).
CMU Introduction to Database Systems with Andy Pavlo (US fall semester 2024): Andy is an excellent teacher, quite funny at times, and good at explaining the underlying concepts in a way I found really engaging. Honestly, everything I’ve seen from Andy has been good content.
CMU Advanced Database Systems (US spring semester 2024): this one is aspirational for me. I intended to watch it in 2025, but simply ran out of time because of other learning tangents. 2026 for sure!
Operating Systems
Maybe Berkeley CS162, although it spends way too much time (8 lectures!) on mutual exclusion and locks. I haven’t actually finished this one yet.
Real world implementation:
Johannes 4GNU_linux’s Linux device driver tutorial: these videos are really good in terms of content, but the quality of delivery is noticeably better in the newer videos than the older ones. Luckily the creator appears to be re-recording many of them at the moment, so that will improve over time. That said, be careful with the playlist, because as it is re-created there are duplicate videos. If a topic appears to be covered twice, pick the newer video!
Networking
I am a bit shocked to discover that you can get through an entire Computer Science degree now without covering networking at all. That’s… concerning.
Jeremy’s IT Lab CCNA course: despite not having the most exciting delivery ever, this course covers the material well. Whilst a CCNA might be perceived as vendor specific, the Associate level actually isn’t really. Sure, all the demos and labs use Cisco software, but the concepts are equally applicable to all implementations. That’s not quite as true for the more senior levels of Cisco certification, however.
Security
Android App Reverse Engineering is a live stream that Maddie Stone did during the COVID lockdowns. Honestly, it feels to me like the audience and tooling didn’t treat her well during the stream, and as a result the video could do with an edit, but this is literally what went out live. It is the best content I’ve been able to find so far on how to actually reverse engineer an Android app while looking for malware. I have watched other courses, but they tend to be very high level and vague, whereas this course has you reverse engineer actual applications. I just wish the live stream had been a better experience for Maddie. As a general thing, all of the talks I’ve seen from Maddie have been good.
Meeting opened at 20:08 AEDT by Joel and quorum was achieved. Minutes taken by Neill.
2. Log of correspondence
Council meeting minutes for 2024-Oct..2025-Sep – we need to provide copies of the minutes to the auditor
Re: Bills to Pay – DrupalCon Japan 2025 – Russell has responded
Clarifying and confirming contact details – Jack Skinner has shared contact details with the council
Spreadsheet shared with you: ‘DrupalSouth operating budget – Wellington 2026’ – Julia Topliss has shared the Drupal Down Under budget with the council
Linux Australia Inactive Member – we have received a request to deactivate a membership. Neill has responded (and cancelled the membership)
LA meeting notes – email to the EO subcommittee reporting on discussions at the LA subcommittee meeting.
Loose ends – discussion between Russell and Jack covering Wise, Stripe, Paypal, password managers and international tax on ticket sales.
[LACTTE] Request for insurance cover for a Perth Linux User Group event – request for insurance cover. Council passed a motion on email and provided insurance for the event.
Fwd: DDD Melbourne 2026 Community Space Applications – Joel has told Sae Ra that LA would like to participate.
Stuff we need from Linux Australia – email from EO. Joel and Sae Ra have responded
Must-See Recordings Now Available – newsletter from OSI
Get your T-shirt size in! – EO internal email
We need your T-shirt size for the EO conference. – EO internal email
Transport and accommodation options draft. – EO internal email
Lodge Linux Australia Activity Statement July..September 2025 – Russell has requested LA’s accountant lodge the latest BAS. The accountant has confirmed lodgment
Re: DrupalSouth Community Day 2025 – wrapping up accounts stuff – discussion between Julia and Russell about paying the remaining invoices for the Drupal South Community Day
Fwd: Flounder post from admin@lists.linux.org.au requires approval – enquiry from Russell Coker about progress in blocking spam from lists.linux.org.au – We need to respond saying that changes have been made and we are monitoring the situation
Fwd: Public Liability Insurance – request from KiwiPyCon for the current insurance certificate. Joel has supplied it.
LA Council Vote: Accept PLUG Installfest as event – on list motion to accept event. Motion was passed.
Keynote #1 and early bird “hurry up” notice for Monday – internal EO email
Authorisation for Kiwi PyCon Transaction Exceeding $5000 2025-11-15 – payment was authorised.
2019-2021 PO BOX paperwork – Julien Goodwin asks if he can dispose of paperwork from 2019-2021
Tuesday night’s meeting – internal EO email
On-campus accommodation for core team – internal EO email
Subsidised parking – internal EO email
Please register on the EO2026 conference website
Link to join meeting – request from Sophie for Linux Victoria for a link to tonight’s meeting. Joel has responded.
DrupalCon Nara Update – Report from Michael Richardson
3. Items for discussion
PO Box items with Julien: everything that was there has been scanned, so we should have copies of everything. We should not need physical copies, so it should be fine for Julien to dispose of them. We will ask Julien to shred them. Joel will contact Julien.
Stripe
After a discussion between Jack and Russell about Stripe’s tax notifications, it seems reasonable to argue that we shouldn’t have to pay VAT/GST in other countries for services being consumed in Australia.
Russell: Our Accountant didn’t give us firm advice one way or the other when I asked him.
Russell: If someone was paying us to watch the live stream while they are in another country it might be different. Therefore we can’t sell digital tickets to overseas attendees.
Russell feels that some of this may be an attempt by Stripe to sell us a service.
Elena suggests that we should just say that digital tickets are sold under Australian conditions.
We will develop a written policy on this and publish and make subcommittees aware of it. Elena will write the first version.
4. Items for noting
On list motion MOTION: That Linux Australia will provide assistance, in the form of insurance cover, to PLUG for their Installfest November 2025. Moved by Joel, seconded by Jonathan. Motion passed.
DrupalCon Japan: DrupalCon Nara was a big success. There were 211 ticket sales in the end, and financials look to be about where we expected, which is good. Feedback has been extremely positive.
Michael has informed the Steering Committee that he is going to leave the Committee, and they are evaluating their options for the next DrupalCon, probably in India, probably in 2027.
5. Other business
Annual Report
Elena has started work on the annual report. There is a document in the shared drive. Elena will produce a plan for chasing down the required information this week.
Membership applications – Still behind on processing these – have drafted an email to send to applicants to ask about their connection to the community.
I am writing to you about your recent application to join Linux Australia.
We would like to confirm that you would still like to join and if so ask if you could outline how you currently participate in our community.
We aim to represent and assist the groups and individuals who make up the Free Software, Open Source and Open Technology communities in Australia.
We support a number of conferences in Australia, New Zealand/Aotearoa and the Asia Pacific region, including Everything Open, Drupal Down Under, PyCon AU and Kiwi PyCon.
Can you tell us how you heard of us, and briefly describe your participation in the communities we represent? There are no specific requirements for membership, but to reduce the number of spam applications, we are confirming applications are genuine before we approve them.
6. In camera
No items were discussed in camera
7. Action items
Neill – publish minutes by 22/11
Neill – respond to Russell Coker
Elena – write a policy for sale of tickets to international attendees
7.1 Completed Items
7.2 Carried Forward
Meeting closed at 20:48 AEDT
Next meeting is scheduled for 2025-12-03 and is a subcommittee meeting
Meeting opened at 20:07 AEDT by Jennifer and quorum was achieved at 20:13 AEDT. Minutes taken by Neill.
2. Log of correspondence
Richard Shea expenses – this has been resolved by deleting these items from Xero
Fwd: Bank account updated for Pycon AU 2026 – Jack Skinner asks if Russell had updated the bank account for PyConAu in Stripe
[LACTTE] Request for insurance cover for a Perth Linux User Group event – Harry McNally is helping to run another event that needs insurance
Socials posts – list of social media posts for EO2026
Fwd: Re: [LACTTE] BAS Time – Update from LUV – Russell is happy with progress
Amazon Web Services Billing Statement Available [Account: 103334912252]
Fwd: DDD Melbourne 2026 Community Space Applications – Sae Ra asks if LA would like to participate in DDD Melbourne in Feb 26
Lodge Linux Australia Activity Statement July..September 2025 – Authorisation from Russell for LA’s tax agent to lodge our BAS
Everything Open 2026 Meeting tomorrow night, Tues, 4-Nov, 8pm – Agenda for the EO Subcommittee meeting
PyCon AU 2026 Update for 5 November 2025 – see below for details
Activity statements are available online [SEC=OFFICIAL:Sensitive] – notification from the ATO
Updated agenda for tonight at 8 – update to EO meeting agenda
3. Items for discussion
Admin Team Update
Investigated the complaint from Russell C about forged emails. Have changed a config option and we are now blocking even more email. Hopefully fixed.
Currently in a change freeze for EO which will remain in place until after the election.
Steve will try and talk to a FastMail rep while he’s at the IETF meeting in Montreal.
It is time to start planning to replace the current server hardware. It has about two years left on the original planned lifetime.
Kiwi Pycon Subcommittee Update
Conference is happening this month
Sponsorships are not great, which has a major impact on the financial position of the conference.
NZ economic situation is generally poor and companies are not looking to recruit.
Ticket sales are up
The current projected loss of $16k can hopefully be reduced by more ticket sales.
The projected loss was $30k a month ago.
Targeting Kawaicon for more ticket sales.
Numbers are ahead of Invercargill now.
Planning for 2026 is starting but there is little appetite for running another conference at a loss.
Drupal Subcommittee Update
No update [Neill check with Julia re invitation]
DrupalCon Asia
$63,946 in ticket sales of $66,192 goal (up $29k from Oct)
171 tickets sold, expecting around 190-200 total
$147,760 sponsorships of $167,000 goal (up $41k from Oct)
Treasure Hunt is underperforming on attendance
Expecting around a 10k net profit after DA and LA share
There is still some fat in our expense estimates (~10k)
Seeking guidance on willingness to support DrupalCon India in 2027
DrupalCon India would be a significantly different event – much smaller budget
Financial regulation in India may be a factor.
The DrupalCon Asia team is about to reshuffle positions
Joomla Subcommittee Update
Working on website updates. Ongoing.
The membership/subscription system is nearly ready. Will send through examples for council to look at.
PyCon AU Subcommittee Update
2025
No further money expected either in or out.
Everything pretty much finished.
Final reports to write and then 2025 will be done.
Looking at a $1200 loss after the LA tax.
2026
A few “looking forward” points as an async update:
We intend to publish a conference roadmap “soon” outlining our call for tracks, CFP, volunteers and other key milestones to help set expectations given our unusually long planning cycle. It would be great if this can be amplified by LA.
We’re on track to have the full new website live by KiwiPyCon. Very excited by this – it’s looking awesome.
We would like to announce a longer call for specialist tracks at the same time (in time for Kiwi PyCon). It will be open through to early next year (TBA whether it will overlap with Everything Open).
Our team has grown, and will grow further. We welcome John Band (LI) to drive marketing and communications, Rhydwyn McGuire (LI, formerly PyCon AU 2025) as Program Chair, and Izy Hogan (formerly PyCon AU 2025) as Volunteer Lead. We have active conversations with future core team members to drive sponsorships & commercial sales, program co-chair, and community & culture focus areas.
I’ve also started a new informal essay series titled “From the director’s desk” where I share a little more transparently how we approach organising PyCon AU 2026 behind the scenes. The first post is live, and I’d welcome feedback, primarily in the shape of future topic suggestions.
Lastly: I’m aware our plan to update ‘actuals’ in the approved/shared budget has been delayed. Nic will be back on deck ‘soon’. The delay is primarily about getting it right as a team, for the long term, and in a repeatable way aligned to Xero actuals. As an interim reassurance: we reconcile Xero weekly, and I’m closely tracking any budgeted spend this year (narrator: minimal to no new spend expected) against actuals in Xero. I’ll raise this to discuss on Wednesday night.
Proposed Agenda for our 5 minutes on Wednesday:
I’ll highlight our wins + asks (slide attached).
Questions WRT minuting Nucleus retroactively.
Budget actuals.
Questions from council
WordPress Subcommittee Update
Looking at possibly running a couple of events next year
– possibly something the week before Drupal South in Wellington
 Brisbane September next year
Looking for more organisers.
Vladimir and Janna would like to have a less central role.
[Council]: We would need to clear a New Zealand event with WordCamp Central; our current agreement is only for Australia.
Everything Open Subcommittee Update
Two keynotes have been announced and the rest are scheduled to be announced
There is a current commitment of $33k. One potential sponsor has asked for an attendee list, which is not something that Linux Australia has done in the past. [Jack]: Suggests that a conversation be had with potential sponsors to tease out what the sponsor’s goal for the list is, and to suggest alternative ways of generating leads.
In addition, Open SI are providing very significant in-kind sponsorship which needs to be recognised. Name them as a “venue sponsor”?
EO need access to symposion to be able to audit ticket sales and process refunds.
Catering will now include lunch because there is enough sponsorship.
Currently looking at a surplus of $9,600
T-shirt designs are in progress
Choice of T-shirt print vendor is up to the conference.
Arjen is interested in providing a psychologist to attendees, LA has previously provided grants for this, but is unlikely to do so this year. LA has not been able to run a grants program due to a lack of funds.
Sessions have been chosen – need to prepare to publish the schedule to the website.
How many lanyards does EO2026 need to buy?
Videos from 2025 should be ready before the 2026 conference.
Carlos has spoken to the Unilodge staff and accommodation can be offered to attendees. Unilodge will manage the bookings. Some hotels have offered discounts for attendees.
Flounder Subcommittee Update
No meeting since the last report.
A lot of activity on matrix
Russell Coker plans to hold more events soon.
LUV Subcommittee Update
Andrew is now an OCM
Nothing to report yet, but the president and treasurer seem to be managing things well.
4. Items for noting
We need to improve onboarding of subcommittees, especially expectations around meeting times and that they shift for daylight savings time.
5. Other business
We still need to send a request for bids for EO2027
6. In camera
Two items were discussed in camera
7. Action items
7.1 Completed Items
7.2 Carried Forward
Meeting closed at 21:43
Next meeting is scheduled for 2025-11-19 at 20:00 AEDT (UT+11:00)
Meeting opened at 19:35 AEDT by Jennifer and quorum was achieved at 19:56. Minutes taken by Neill.
2. Log of correspondence
Subcommittee update for PyCon AU 2026 – 22 September 2026 – Russell has responded to questions about possible future charity status. In summary we are exploring the possibility but there are no concrete plans
Drupal Community 2025 bank and stripe accounts – Russell has responded clarifying which accounts Drupal subcommittee has access to
Re: FW: Finalising catering contract – Discussion about EO2026 catering. Council CCd for information
Deep Dive: Data Governance – mass email from OSI
PyCon AU 2026: Request for payment + checking permissions to approve payments – Discussion about PyCon AU 2026 payments and ID requirements for Jack Skinner
Re: sponsorship prospectus for 2026 edition – EO offering sponsorship opportunity to Canonical
DrupalSouth Wellington budget – query as to progress on council considering the Drupal South Budget – Jonathan has responded
Westpac 100 points ID confirmed for Jack Skinner – Completion of authorisation requirements at Westpac for Jack Skinner
Science-y keynote options – Discussion about keynotes for EO2026
Annual Report season – offer of help with the annual report from Sae Ra – Jonathan has responded
Voting on DrupalSouth Wellington budget – on list motion to approve the Drupal South budget
Amazon Web Services Billing Statement Available [Account: 103334912252]
Fwd: Lodge your NFP self-review return [SEC = OFFICIAL] – NFP self review is due by 31 Oct 2025
[EO2026] Rooms – discussion about the size and number of rooms available for EO2026
Re: Post induction setup – question from Miles about previous transaction in the accounts allocated to EO2026 – Russell has responded
Request for calendar invite for LAC for PyCon AU 2026 – request to add invitees to the PyConAU Subcommittee meeting invitation. Neill has actioned and responded
Google sponsorship purchase order – Request for access to LA’s Ariba login. Russell has responded.
Tonight’s meeting – agenda for EO2026 meeting of 7 Oct
subcommittee meeting 8th October – request from Steve for an earlier time. Jonathan and Neill have responded
PyCon AU 2026 Update for 8 October 2025 – update from the PyCon AU 2026 subcommittee. Included in full below.
Time of Council meeting tomorrow (Wed 8 Oct) – query from Jonathan about DST meeting start
Recap of tonight’s meeting – summary of the EO2026 meeting of 7 Oct
Invitation to LA committee meeting – Request from Alexar for an invite to the subcommittee meeting. Neill has actioned.
3. Items for discussion
Admin Team Update
Not a lot happening, prepping for end of year
FastMail invoice has arrived but is confusing. Steve will chase.
Everything else is ticking along.
Looking to restart the Ubuntu mirror after talking to Steven Hanley. Some work to do before Canonical will recognise the mirror as a primary.
Report from ANU network team – there are some weird issues with access to the Debian security archive, which is not managed by LA but does use IP addresses that belong to us.
Joel and Steve will reach out to Fastmail to clarify EO’s request for monetary sponsorship.
Steve will be in Montreal for the next meeting but should be able to join.
Kiwi PyCon Subcommittee Update
Ticket sales are a little less than half of the planned numbers. The good news is that more contributor and workshop tickets than expected have been sold.
Currently struggling with sponsorship. Talking to previous sponsors, but few have signed on so far. Currently looking like $30k below the worst case budget. Looking at what can be done to turn that around.
The schedule has just been announced which will help with ticket sales.
The organising committee is looking for ways to save money from the budget.
Catering is not yet locked in.
Drupal Subcommittee Update
Need a decision about the budget ASAP
Income has exceeded projected costs
Russell has agreed that the Drupal SubCommittee can use profit from the Drupal Community Day as a donation to Drupal South as long as it is properly recorded in the budgets.
Next year the plan is to focus on students.
Julia, Simon and Jack will have a conversation about marketing.
DrupalCon Asia
Hello Councillors,
I’m sorry to say I can’t make it to the meeting tonight.
I have however updated our usual slides with the latest information. We’re falling a bit behind on sponsorship and ticket sales. We still have some strong sponsors in the pipeline, but they’re getting harder to pin down.
The treasure hunt has gone live and you can see more here. We’ve turned this into a paid event with a small fee, and also opened it up to non-Con attendees, provided one person on the team has a Conference ticket.
We’re working hard to try and secure some final sponsors. We know from last year that most of our registrations will come at the very end: for Singapore, we went from 130 to 250 ticket holders in the last 6 weeks, so hopefully we will see a repeat here.
Apologies again I can’t make it tonight!Â
Mike
Joomla Subcommittee Update
Hi all,
I’ve had a bit going on outside work these last few weeks and don’t have much of an update for this evening, so sending my apologies.
As far as updates go, Terry the other account signatory has completed his induction etc, so the bank account and Stripe are set up, but we’re yet to spend any money.
Work is continuing on the Joomla AU website revamp and we should have the memberships live next week.
I’ll be at next month’s meeting in person to provide an update.
Cheers
Nathan
PyCon AU Subcommittee Update
Still waiting on payment from a few last minute sponsors.
Good attendance with excellent diversity.
Currently looking at roughly $5k loss.
Attached is an update ahead of tomorrow night’s LAC meeting.
A copy is provided here, and all updates & docs for Linux Australia remain here (good for a bookmark). [Links removed in published minutes]
While the update is looking backwards, I wanted to share a few “looking forward” points:
We’re aiming to finish assembling the core team by month end (October)
We’re intending to have a published “Call for track organisers” in time for Kiwi PyCon at the end of November
We’re intending to have an updated ‘full’ 2026 website in time for that call for tracks.
Additionally (admin, primarily for Russell’s benefit):
We’re aware of an unreconciled PayPal transaction in Xero; Pete has forwarded from Russell’s email. We’ll get that reconciled in due course
We’re mindful that the currently shared budget spreadsheet does not have actuals yet. That’s our top priority now.
We’re in the middle of updating a/the budget spreadsheet with a clearer feed of actuals. Our budget will then show actuals (feed of costs already incurred, sourced from Xero) + projections (payment milestones, contracted commitments etc). Changes will be made additively of course, but we’re working on it in a copy for now (just in case). I’ve exported the budget (as approved by LA) to PDF for historical purposes too.
We noticed a miscalculation in Fixed Costs that did not include the Financial Assistance grants in the total costs. The ‘Fixed Costs’ tab has been updated to amend this (specifically the sums in I24, J24, I58, J58). Total impact is ~$5k, which does not affect our P&L position in either budget scenario and is something for us to navigate in April next year pending sponsorships.
Speak tomorrow!
Cheers,
Jack
Jack and Russell will discuss the tax situation with Stripe payments
Jack has expressed some concerns about PayPal’s reputation for freezing funds.
WordPress Subcommittee Update
No update received
Everything Open Subcommittee Update
SSC has accepted 30 talks and 9 tutorials with 14 backups, plus 5 keynotes.
Sponsorship is progressing, but it’s slow going.
Catering contract has been signed
Call for volunteers closes on Friday. Looks likely there will be sufficient volunteers.
First keynote announcement planned for Monday 13/10
Early bird ticket sales close on 17/10
Alexar may have some sponsorship leads
Flounder Subcommittee Update
No update received
LUV Subcommittee Update
Recent activities:
Software Freedom Day
Linux Regional Summit
Visited the Ballarat Tech School
Planned Events:
Linux Town Hall meeting with a focus on the end of Windows 10 support
Started a series of weekly events focussing on Linux and AI
Encouraging user groups in the APAC region to attend LUV events
Alexar has questions about insurance – has asked for a copy of the insurance papers so that he knows what cover is available for LUV events.
Team members:
Alexar is the public face of LUV.
Sophie is treasurer and secretary
Terry is regional contact
Simon helping with workshops, particularly cyber security
Looking at setting up the infrastructure to support future activities
Currently going through a rebranding exercise to become Linux Victoria
4. Items for noting
Jenny has spoken to Sae Ra about the Annual Report. At this stage Jenny and Elena have it in hand, but they will discuss it with Sae Ra and ask for help if needed.
5. Other business
Meeting Times
Joel has proposed moving the meeting back by 30 minutes, to 8:00 PM AEDT.
Voting on DrupalSouth Wellington budget
The on list motion passed. The budget is approved.
The motion was: ON LIST MOTION: Linux Australia accepts the DrupalSouth Wellington 2025 organising committee as a subcommittee of Linux Australia, and authorises the expenditure on the budget submitted with their proposal.
Proposer: Russell Stuart
Seconded: Jonathan
Motion Passed by a majority of the council.
6. In camera
One item was discussed in camera
7. Action items
Jonathan will let Julia know about Council’s approval of the Drupal South budget.
7.1 Completed Items
7.2 Carried Forward
Meeting closed at 21:12
Next meeting is scheduled for 2025-10-22 at 20:00 AEDT (UT+11:00)
Meeting opened at 19:32 AEDT by Joel and quorum was achieved. Minutes taken by Jonathan.
2. Log of correspondence
Re: Sorting out Xero and bank access – arranging financial inductions for subcommittees
Fwd: Re: [LACTTE] Proposal for PyCon AU 2026 – Inductions and stripe invites
Hello from Python Australia! Potential new sub-committee – looking positive. Next step is a formal request to form a subcommittee
Keynotes status – discussion about EO2026 keynotes
PyCon AU 2026 contracts + deposits for payment please – request from Jack Skinner for payment of invoices and approval of contract for PyConAU 2026
Extending the EO submissions deadline. – Miles informing the council of the intention to extend the EO2026 CfS
Fwd: Update for September – new auditor contact details
PyCon AU 2026 subcommittee financial induction details – discussion about PyConAU2026 Xero/Westpac/Stripe access
Amazon Web Services Billing Statement Available [Account: 103334912252]
EO CfS – request from Neill for contact details for Donna Benjamin to pass to Miles Goodhew
Re: Activating pay via credit card for DrupalSouth events – Russell has responded to Julia Topliss’ query
Westpac Transfer Authorisation – zeroing PyconAU26’s bank account.
Submissions deadline extension – EO call for sessions extended
You’ve Made A Sale – 75085873 (AU$4.32) – from Redbubble
Submissions deadline extension emails sent – notification from Miles that the EO2026 CfS has been extended. Joel has responded
Manufacturing Statement – 75085873 – from Redbubble
Fwd: EO2025 videos (even just for speakers)? – Query from Miles about how to respond to a question about 2025 videos. Joel has followed up.
DrupalCon Nara Treasure Hunt – Next Steps – Request for approval to pay for insurance from Michael Richardson
DrupalSouth Wellington budget for your review:
[Link removed]
Estate of the Late Mark Stanley Willis: a further amount has been received. An example of Mark Willis’s connection with the Australian Open Source community:
Python Australia: clarification sought about fund disbursement when an independent subcommittee separates out, and the charitable purposes in the constitution. Jonathan responded, and Russell confirmed during the meeting.
3. Items for discussion
Kiwi Pycon Subcommittee Update (7:35 PM)
No update received.
Drupal Subcommittee Update (7:40 PM)
A report was provided:
The Community Day is going well. 6 sponsors have signed contracts worth $13k, which exceeds income expectations. 35 of the 80 budgeted tickets have been sold. The venue can hold up to 120. There is lots of energy and enthusiasm.
Community Day 2024: final sponsor payment from GovCMS has been sorted.
Wellington DS conference budget has been submitted for review and approval:
The subcommittee would like feedback within the next couple of weeks.
The budget includes several attendance levels. Joel asked which one is most likely in the view of the subcommittee. There is concern about the economic climate, with a lot of companies choosing not to sponsor as a result. Due to this uncertainty, they are working to the lower level (120). They note that overall venue costs are much lower than in the past and there’s more flexibility (for example, no hard minimum number commitment is required). There has been a good response from Wellington people. The subcommittee also has good relationships with tertiary and training organisations and hopes to use these to encourage students to attend. Sponsors like it when students attend. Due to venue flexibility and the likelihood of higher student numbers, there could be a more hands-on classroom vibe.
To counter the uncertainties the committee is aiming to be bold and try a few things.
DrupalSouth 2027 is likely to be back in Australia. Discussions are ongoing to finalise location.
DrupalCon Asia
September report provided:
Sponsorship: looking good at this stage.
The treasure hunt is going ahead, to be announced next week.
No sponsor has stepped forward for videoing. Instead, a crowdfunding effort is being undertaken for video recording by the Drupal Recording Initiative. USD1500 is needed for this, which seems very reasonable.
There is a general sluggishness with sponsors, which is typical of the times given the market downturn. There is one from Australia, none from New Zealand and none from India.
In general, things are looking good.
The conference is in 2 months.
Joomla Subcommittee Update (7:45 PM)
Terry has completed induction. Russell needs some details from Terry to set up bank account and Stripe access. The bank and Stripe accounts are in place. Russell to follow up.
There has been a working bee on the website to work on membership functionality and a business directory. There are people working on the site design.
It is intended to go live with memberships next week.
An in-person event is planned for 2026 (a fully branded Joomla Day).
Admin Team Update (7:50 PM)
The Fastmail billing situation is nearly sorted. Confirmation will come in the next billing cycle.
Need to work out appropriate ways to acknowledge Fastmail’s sponsorship of LA email.
There will not be any face-to-face admin team meeting in 2025 due to schedules. In 2026, preliminary plans are being drawn up to have one in the first half of the year.
PyCon AU Subcommittee Update (7:55 PM)
PyCon AU is this weekend.
At present they are projecting a loss of $6k (after payment of LA levy).
There might be one last minute sponsor for around $2k
Ticket sales have continued much as they were, albeit with a slight upward trend. This gets attendance to around 430 Friday, 380 on Saturday and Sunday. It’s in the ballpark of the lower end estimates in the budget.
Jack is travelling, so there is no 2026 update tonight.
WordPress Subcommittee Update (8:00 PM)
WordCamp update: nothing to report this month.
Currently investigating 2026 in-person event prospects: NZ (in May) and Australia (Brisbane sometime between June and September).
Everything Open Subcommittee Update (8:05 PM)
Keynotes are sorted.
Have 4 sponsorship commitments.
The catering contract is being worked on. Hoping to finalise it this week. The plan is to start simple and add things in as sponsorship money comes in.
There have been some publicity difficulties particularly around the Call For Sessions. This should be in hand now.
Submissions are a little light. There is hope for more following another round of communication.
Next key date: Call For Sessions close on 21 Sep 2025.
Organising team meeting frequency is likely to increase as the conference approaches.
For noting: Miles will be offline 24 September – 2 October.
The date for opening ticket sales was discussed. It was suggested to open as soon as possible rather than mid-October to give a longer run-in. The new aim is for 15 or 16 September. Miles will pursue this. Once ticket prices and inclusions are confirmed, sales can be opened quickly because the website is in place.
Russell to allocate a Stripe account and Bank Account for EO2026.
Flounder Subcommittee Update (8:05 PM)
No update received.
LUV Subcommittee Update (8:05 PM)
No update received.
There has been no response received to Russell’s email requesting a copy of the minutes showing committee approval for the EO2025 ticket purchase.
It has been noted that LUV have not held committee elections for several years, required communication with Linux Australia has not been happening despite numerous requests and prompts, and a number of actions have been taken unilaterally.
In response, a motion was raised by Joel:
That Linux Australia suspend LUV’s access to its bank account.
Seconded: Russell
Passed unanimously
4. Items for noting
Russell needs personal information to set up access for EO2026 bank accounts. Russell will follow up.
5. Other business
None
6. In camera
One item was discussed in camera
7. Action items
7.1 Completed Items
7.2 Carried Forward
Meeting closed at 20:52
Next meeting is scheduled for 2025-09-24 at 19:30 AEST (UT+10:00)
Meeting opened at 19:40 AEDT by Russell and quorum was achieved. Minutes taken by Neill.
2. Log of correspondence
Proposal for PyCon AU 2026 – was approved over email, and Joel has informed Jack
Everything Open Canberra 2026 status – Joel has responded
Bank letter required for sponsors – We now have two bank letters
Motion to adopt the proposed conflict of interest policy – Was superseded.
Amazon Web Services Billing Statement Available [Account: 103334912252]
PyConAU2026/Python community Australia
Linux Australia <> Snowflake procurement via PyCon AU
Linux Australia appointment Indooroopilly 12 Noon 16/Aug/2025 – Neill will check with Jenny to confirm the necessary letter has been received. Follow up with Australia Post if necessary.
Fwd: MHW Announcement (we are getting new auditors)
Motion to adopt the proposed conflict of interest policy
Meeting Reminder (actually a request to transfer money to the WordCamp Steering Committee) – Jonathan has responded
We’re Updating Our Account Fees – Redbubble are updating their account fees
Bringing a guest to Drupal Subcommittee – Jonathan has responded
Fwd: Welcome to United Advisory – email from new auditors
Everything Open Canberra 2026 status – various questions from EO2026 committee. Joel has responded
Vote: Accept PyCon AU 2026 Subcommittee – motion to approve PyCon Au 2026 passed
Request for insurance cover for a Perth Linux User Group event – Joel has responded
Apologies I won’t make tonight’s meeting. (From Julia Topliss) – Joel has responded
DrupalCon Nara Treasure Hunt – Next Steps
Minutes from tonight’s informal meeting
Feature freeze: a pause with purpose – From OSI
Apologies/correction Re: Everything Open announcement.
Invitation to Open Source Congress – From OSI
Fwd: Re: [LACTTE] Proposal for PyCon AU 2026
Processes for onbording (sic) or signing/deposits for suppliers to PyCon AU 2026
WordPress Steering Committee – Joel has responded
Canva (query as to whether LA has access to Canva) – Joel has responded
Domain due to expire: linux.au Expires in 30 days. #tid26266367
[One email dealt with in camera]
Fwd: Regional summit Morwell 25th August
Linux Australia Inactive Member
Fwd: Your term deposit has been renewed
Re: Sorting out Xero and bank access
Linux Meeting and Joomla Update – eventually turned into a discussion about financial inductions
Keynotes status (EO2026) – Discussion about possible keynotes for EO2026
Hello from Python Australia! Potential new sub-committee
3. Items for discussion
MOTION: Linux Australia adopts the conflict of interest policy on GitHub, SHA 195953b3c123370cf4fcf7273c2d804669b3afb0. The policy is viewable at: https://github.com/linuxaustralia/constitution_and_policies/blob/195953b3c123370cf4fcf7273c2d804669b3afb0/conflict-of-interest-policy.md
PROPOSER: Russell Stuart
Seconded: Jonathan
Result: Passed
Python Australia! Potential new sub-committee
The proposal looks good to those of us present at this meeting, but we will defer final discussion to a meeting where more council members are available.
We will need a more formal proposal to vote on.
4. Items for noting
5. Other business
6. In camera
Two items were discussed in camera.
7. Action items
7.1 Completed Items
7.2 Carried Forward
Meeting closed at 20:17 AEST
Next meeting is scheduled for 2025-09-10 and is a subcommittee meeting
With at least three disciplines of interest (energy and climatology, public economics, and high-performance computing), there is the question of whether current trends in artificial intelligence are environmentally sustainable. What follows is a basic sketch of electricity usage, needs, and costings.
The promise of artificial intelligence is as old as computing itself, and, in some ways, it is difficult to distinguish from computing in general. As the old joke goes, artificial intelligence is no match for natural stupidity, an issue that became all too evident to Charles Babbage when he developed the idea of a programmable computer:
On two occasions I have been asked, - "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
-- Passages from the Life of a Philosopher (1864), ch. 5 "Difference Engine No. 1"
Of course, we now have computational devices that can attempt to solve problems with incorrect inputs and perhaps provide a correct answer. That, if anything, is what distinguishes classical computation from contemporary artificial intelligence. Time does not permit a thorough exploration of the rise and fall of several attempts to implement AI; however, the most recent version of the last decade, which involves the application of transformer deep learning and the use of Graphics Processing Units, continues to attract investment and interest.
Transformer architectures for artificial neural networks are a fascinating topic in their own right; attention (pun intended) is directed to the use of GPUs as the main issue. Whilst the physical architecture of GPUs makes them particularly suitable for graphics processing, it was also realised that they could be used for a variety of vector processing, providing massive data parallelism, i.e., "general purpose (computing on) graphics processing units", GPGPUs. However, physics gets in the way of the pure mathematical potential of GPUs; they generate a significant amount of heat and require substantial electricity, and that's where the environmental question arises.
Data centres currently account for approximately 1.5% of global electricity consumption, according to the International Energy Agency. That is expected to reach 3.0% by 2030, primarily due to the growth of AI, which includes not just the GPUs themselves but also the proportional contributions of CPU hosts, cooling, transport, installation, and so forth. A doubling of energy consumption over a few years (from c. 400 TWh in 2024 to c. 900 TWh in 2030) is very significant and, if the estimates prove even roughly correct, a rising energy trajectory needs to be anticipated for at least another two decades as the technology becomes ubiquitous.
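As a rough sanity check on these figures, the implied annual growth rate can be computed. A minimal sketch, assuming only the 400 TWh (2024) and 900 TWh (2030) estimates quoted above and the standard compound-growth formula:

```python
# Implied compound annual growth rate (CAGR) of data centre electricity
# consumption, using the approximate IEA estimates quoted above.
start_twh = 400          # c. 400 TWh consumed in 2024
end_twh = 900            # c. 900 TWh projected for 2030
years = 2030 - 2024

cagr = (end_twh / start_twh) ** (1 / years) - 1
print(f"Implied growth rate: {cagr:.1%} per year")
# prints: Implied growth rate: 14.5% per year
```

At roughly 14.5% compound growth, consumption would double again in about five years if the trend continued, which is what makes the multi-decade trajectory mentioned above so significant.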
There are essentially two ways of managing the energy used in production from an environmental perspective, given a particular policy. One approach is high-energy and high-production, concentrating on renewables or non-GHG energy sources. The other is a reduced-energy, high-efficiency approach that concentrates on better outcomes, "doing more with less". More important than either of these, in my opinion, is the incorporation of externalised costs into the internal price of an energy source. One graphic example of this is the deaths per terawatt-hour by energy source. Solar, for example, is more than three orders of magnitude safer than coal.
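To illustrate the "three orders of magnitude" claim, here is a small sketch using commonly cited death-rate estimates per unit of electricity produced. The figures below are approximate values from published deaths-per-TWh comparisons, not from this article, and exact numbers vary between studies:

```python
# Approximate deaths per TWh of electricity produced, from commonly
# cited comparisons (illustrative figures only; studies differ).
deaths_per_twh = {
    "coal": 24.6,
    "oil": 18.4,
    "natural gas": 2.8,
    "nuclear": 0.03,
    "wind": 0.04,
    "solar": 0.02,
}

ratio = deaths_per_twh["coal"] / deaths_per_twh["solar"]
print(f"Coal is roughly {ratio:.0f}x deadlier per TWh than solar")
# prints: Coal is roughly 1230x deadlier per TWh than solar
```

A ratio above 1000 is precisely "more than three orders of magnitude", which is why internalising these externalised costs would shift energy prices so dramatically.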
To satisfy existing and expected demand, the AI industry is, in part, turning to nuclear for its energy needs. Data centres tend to be located within population centres, partially due to latency reasons, whereas renewables like wind and solar require a significant amount of land area. Additionally, where existing nuclear power plants and infrastructure are already in place, it is relatively inexpensive, even compared to battery technologies. Nuclear provides sustained power generation not just throughout the day, but across months and seasons. With approximately 5% of generation lost in transmission, nearby power sources are more efficient.
The main weakness of nuclear power is the time and cost associated with the construction of new plants, and in this regard the big data centre and technology groups are taking a gamble. They assume that there will be sufficient demand for AI and that they can generate enough income over the next decade to cover the costs. They are very likely to be correct in this assessment, and certainly the choice of nuclear is preferable to the fossil fuel sources that are currently driving most data centres (e.g., methane gas in the United States, coal in China).
As for demand-side considerations, these include the energy efficiency of the data centres themselves, the way models are designed, and the way AI is utilised. Cooling is an especially interesting case; as mentioned, GPUs run quite hot, and to avoid catastrophic failure they require effective cooling. This is usually done with evaporative cooling, which means significant water loss, or with closed-loop chillers, which lose little water but still require a large volume of water for cooling. A third option is dielectric liquids, such as mineral oil, which results in a data centre that is quiet and at room temperature while the servers operate at an optimal temperature. The main disadvantage is the messy and time-consuming procedure for upgrading system units.
Model design also presents some opportunities for improvement. The typical approach is to train neural networks with large quantities of data; however, the more indiscriminate the data collection, the greater the possibility of conventional error. As some critics suggest, an LLM is essentially a language interface that sits in front of a search engine. A smaller but better-curated collection of data can yield a more accurate model, as well as being less resource-intensive to train in the first place. Instead of a monolithic approach, a number of smaller models can operate with connective software to handle matters outside the initial model's scope.
Finally, there is the matter of what AIs are being used for. Certainly, there are some powerful and important success stories, such as the key designers behind AlphaFold winning the 2024 Nobel Prize in Chemistry for protein structure prediction. And, as many contemporary workers (especially in computer science) are all too aware, the ability of AIs to produce code is quite good, assuming the developer knows how to structure the questions with care and engages in thorough testing. On the other hand, the increasing application of these technologies in robotics and autonomous vehicles is disconcerting, as illustrated by the predictive and plausible video "Slaughterbots".
On the consumer level, an AI can perform tasks at which a human is less efficient. So rather than simply asking "how many tonnes of a GHG does AI cause", a net emissions question should be asked, appending "... compared to human activity", that is, productivity substitution. However, with effectiveness comes the lure of convenience, which tempts us to extend the use of AI to everything, even when human energy usage would be less than that of an AI-mediated task. Ultimately, it is the combination of human failings, laziness (always choosing convenience), wilful ignorance (neither knowing nor caring about energy efficiency), distractibility (extending AI to trivialities rather than tasks of importance), and powerlust (commercial or political), that presents a continuing challenge to the prospect of an environmentally sustainable development of artificial intelligence.
On the evening of September 3, 2025, there was suddenly a strong and remarkably unpleasant flat metallic odour in our master bedroom and ensuite. We opened all the doors and windows and got the fan on, and eventually it went away. The smell didn’t come back after closing up, but the next morning when I was inspecting the Redflow ZCell battery in the crawl space under the house I discovered a 6cm long crack towards the right hand side of the back of the electrode stack, about 2.5cm above the base. A small amount of clear liquid was leaking out.
Ah… That’s not good.
Due to the way our house is constructed on a hill, the master bedroom, walk in robe and ensuite on the lower floor share some airflow with the crawl space under the main floor of the house. For example, the fan in the ensuite vents to that space, and I’ve felt a breeze from an unfinished window frame in the walk in robe, which must be coming from the space under the house. I’ve since siliconed that window frame up, but the point is, the electrode stack split, the battery was leaking, and we could smell a toxic fume in the bedroom.
Readers of the previous post in this series will be aware that Redflow went bust in August 2024, and that our current battery was purchased from post-liquidation stock to replace the previous one which had also failed due to a leak in the electrode stack. When the new unit was commissioned in March 2025 I applied some configuration tweaks in an attempt to ensure the longest lifespan possible, but it lasted slightly less than six months before this failure. Not great for something that was originally sold with a ten year warranty.
Under the circumstances I wasn’t willing to try to procure yet another new ZCell to replace this one, but given we’d found the leak very early this time, I figured I had nothing to lose by trying to repair it. I had heard of marine fibreglass being used successfully in one other case to repair a leaking ZCell, so I discharged the battery completely then set to work patching it up the following weekend. This involved:
Wearing PPE (gloves, a respirator, safety glasses)
Turning the DC isolator off so the unit wasn’t powered
Carefully wiping the electrolyte leak away with a rag
Taping up some plastic sheeting below the stack to keep the rest of the unit clean
Cleaning the stack carefully with a little methylated spirits
Roughing up the surface with some 80 grit sandpaper
Cleaning the stack carefully again with methylated spirits
Mixing up West System 105 epoxy resin and 205 hardener
Applying four layers of fibreglass cloth (which is irritating to cut with scissors, by the way) in alternating directions – each layer was wet down with epoxy using a brush then applied to the surface of the stack with a little metal roller
Crawling out from under the house, unkinking most of the muscles in my body, and having a long hot shower
Waiting until the day after next for everything to set (in case 24 hours wasn’t enough) before recommissioning the unit
Here’s a picture of the finished repair. I didn’t bother sanding it smooth because it just doesn’t matter – it’s a battery, not the hull of a boat – and I could do without creating any more dust. I actually ran two strips of fibreglass because there was also a little split at the top of the stack, although that one didn’t appear to be leaking.
Not too bad for a first attempt, and good practice for repairing the hull of my rowboat.
Recommissioning the unit was interesting. Upon bringing it online, it immediately went into a safe shutdown state because the electrolyte had dropped down from its usual running temperature of 18-24°C to a bit over 8°C during the five days it had been offline.
In order for a ZCell to charge, the electrolyte needs to be at least 10°C, and at least 15°C for it to discharge. So I took one of our 2400W panel heaters under the house and turned it on next to the unit. “How long does it take for a 2400W panel heater to heat 100L of nearby liquid electrolyte by 2°C?” I hear you ask. “Too damn long” is the answer. I started the heater at 13:10, and it was 16:50 before the battery was convinced that the electrolyte was far enough above 10°C to be happy to start charging again. But, charge it did, and we were up and running again… Until the next morning when my wife detected that nasty smell in the ensuite again. Upon further inspection of the unit I found a tiny drip coming from the bottom of the front of the stack, behind the Battery Control Module (BCM). So once again I discharged and decommissioned the battery in preparation for repair.
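The arithmetic behind that wait is instructive. Here’s a rough sanity check, assuming water-like density (~1 kg/L) and specific heat for the electrolyte, which is only an approximation for the real zinc-bromine chemistry:

```python
# Back-of-envelope: how long *should* 2400 W take to warm 100 L of
# electrolyte by 2 degrees C? Water-like properties are assumed.
mass_kg = 100          # ~100 L of electrolyte at ~1 kg/L
delta_t = 2            # temperature rise needed, in degrees C
c_p = 4186             # J/(kg.K), specific heat of water
power_w = 2400         # panel heater rating

energy_j = mass_kg * c_p * delta_t       # ~837 kJ
ideal_minutes = energy_j / power_w / 60  # ~5.8 minutes

# Observed: heater on at 13:10, charging resumed at 16:50 = 220 minutes
observed_minutes = (16 * 60 + 50) - (13 * 60 + 10)
coupling = ideal_minutes / observed_minutes  # fraction of heat that "arrived"
print(f'ideal: {ideal_minutes:.1f} min, observed: {observed_minutes} min, '
      f'~{coupling:.1%} coupling')
```

Roughly six minutes in the ideal case versus nearly four hours observed suggests only a few percent of the heater’s output actually made it into the tank, with the rest warming the crawl space.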
This time I had to pull the BCM off to get to the electrode stack behind it. First I checked that the battery really was discharged with a multimeter. Then I disconnected the DC bus cabling, the comms cable, and the connections to the pumps, fans and leak sensors after taking a photo to make sure I was going to put everything back in the right place. Finally I unscrewed the battery terminals and pulled the BCM off. It’s actually a really neat arrangement – the BCM just slides on and off over the two cylindrical terminals that come out of the stack. It’s a bit stiff, but once you know it comes off that way, you just jiggle and pull until it’s removed.
The BCM comes off…
One thing to be extremely careful of is that you don’t accidentally drop the washers from the battery terminals down into the bottom of the battery, because if you do you will never, ever get them back again, and will have to order some M8 silicon bronze belleville washers from Bronze & Brass Fasteners Pty. Ltd. to use as replacements. I don’t know exactly what Redflow used here, but I know they’re belleville washers from reading the manual, I determined they were M8 by measuring them, silicon bronze is apparently a good electrical conductor, and BABF would let me buy them individually rather than in lots of a hundred. I purchased eight, in order to have several more spares.
…to expose the front of the stack, with some cracks at the bottom.
There was a bit of cracking on the front of the stack, similar to what was on the back, so then it was a repeat of the earlier fibreglassing procedure. This added a couple of millimetres depth to the front of the stack. Now the circuit board in the BCM could no longer sit flat against the ends of the battery terminals, due to a series of little protrusions on the back of the BCM case which would ordinarily sit flush against the stack. These I lovingly twisted off with a pair of pliers. Then I reconnected all the cabling and used a nice shiny new torque wrench to ensure the battery terminals and DC bus cabling were tightened to 10Nm per the manual.
Here’s the freshly applied fibreglass…
…and the slightly modified BCM.
I wasn’t quite ready to recommission the battery yet at this point given we’d had those two nasty fume experiences in the bedroom. If I’d missed anything with this second repair, or if anything else broke and there were gas emissions of any kind, we didn’t want to experience them. So I purchased a sub floor ventilation kit with a bushfire compliant external vent and installed that under the house.
I assure you the walls and floor are actually straight and level.
Then in early October (it took a while to get all this work done around my actual job) I brought the battery online again. Of course I had to once more use the panel heater to get it going. We’re running the ventilation fan 24/7 (it’s very quiet), and I got into the habit of going under the house and doing a visual inspection of the battery every morning as part of our daily rounds feeding the chickens.
Everything went beautifully until October 29 when I noticed a small drip of red liquid which appeared to be coming from a capillary tube on the front right of the stack. The previous leaks had appeared clear, which I imagine means they were from the zinc electrolyte side of the battery, whereas this red suggested to me a leak from the bromine side of the battery. The electrolyte actually initially has the same chemical composition in both tanks, but charging changes it – you’ll see the electrolyte in the bromine pipe on the front of the battery go orange, then red, as charge increases, whereas the electrolyte visible in the zinc pipe remains a mostly transparent pale yellow. Anyway, another leak meant decommissioning the battery again to investigate.
This is new.
Off came the BCM again to check my fibreglass work (pristine and undamaged!) and off came the side of the enclosure to get a better look at the problem area. I inspected the capillary tube in minute detail with a magnifying glass and couldn’t find any obvious damage that would explain the leak. The tube runs from the top of the stack down through a hole in the bottom front of the stack, which keeps everything very neat, but honestly seemed to me like it was introducing a potential weak point into the front of the stack, so I rerouted the tube then carefully filled the hole in the stack with Araldite, on the assumption the leak was actually in the stack. In case the leak turned out to be in the tube, I ordered some viton tubing to use as a replacement. After recommissioning the battery again no further leak was evident over several days, indicating that the problem was indeed in the stack, and not in the tube. This is probably just as well, as my new viton tubing turned out to have a slightly smaller internal diameter (2mm) than whatever Redflow used during manufacture – maybe 2.5mm?
Capillary tube rerouted off to the side.
No more hole in front bottom of stack.
A few days after that on November 7, the battery indicated a hardware failure due to “impedance error”, and a small drip of red liquid appeared on the front left side of the stack. I’ve been told this error can be due to higher pH levels and the formation of zinc hydroxide, or degradation of the core electrode, or separator failure, or overheating of the stack reactor. Within the limits of my knowledge and ability there was very little I could do about any of these things, so then the question became: what next?
At this point I felt that I had really pushed things as far as I reasonably could. If the fibreglass repairs on the front and back of the stack proved sufficient and nothing else had gone wrong, I would have been happy to continue operating the battery as we had been since March, but these additional leaks and the impedance error suggested to me that things were going to continue to go downhill. It seems that the answer to the question posed in my last post, i.e. “how far down the road can we kick the migration can” turned out to be about eight months.
The only technology that’s immediately viable for us to switch to is LiFePO4 batteries. We’re looking at a stack or two of Pylontech Pelios because they will work with our existing Victron inverter/charger gear and come in IP65 cases so can be installed outdoors without too much difficulty. It will still be a while before we can get those installed though, and in the meantime I can’t just decommission the existing battery because then our solar generation won’t work.
The way our system is installed, we have solar panels connected to an MPPT on the DC bus, which is connected to the battery, and to our MultiPlus II Inverter/Chargers which in turn power our loads. The ZCell Battery Management System (BMS) tells the system what the battery charge and discharge limits are. If the battery is missing or broken, the BMS will not let the MPPT run at all, which means that no battery = no solar power, not even to power our loads.
I actually tried to work around this problem a couple of times in the past when previous batteries were dead. There’s a description of some unsuccessful attempts in an earlier post, and I also separately tried to fake up what I called a Virtual ZCell in software back in December 2024. In that experiment I was telling the Victron system that there was a battery present at 25% state of charge, but with a charge current limit of 0A (so it wouldn’t try to charge the non-existent battery) and a discharge current limit of 1A (so the battery looked at least a little bit available but no real discharge would be attempted). Incredibly this worked, but after about a week the MPPT started raising various errors so I gave it up. It seems that it’s necessary for a real physical battery to be present in order to help correctly regulate the voltage on the DC bus.
Back to the real battery: After the impedance error occurred I shut it down, rerouted the left front capillary tube, Araldited up its hole and the bottom front of the stack as I had done on the right hand side, then recommissioned the battery and reset the impedance error. I then set the system maximum state of charge limit in the BMS to the lowest value possible (20%). This would mean that the battery would never be charged much, and thus would never be stressed much. I imagined that whatever reactions were happening inside the stack that were causing things to break would happen either less often, or with less severity, or both.
The Araldite repairs were ultimately not completely successful. A teeny tiny red drip or two have since reappeared on the bottom front of the stack, but they are very small drips. I wipe them up every couple of days. The impedance error has not yet recurred. The fibreglass is still solid. We are thus able to continue to use our whole system in some capacity – notably with functioning solar power generation – until we’re able to migrate to those Pelios.
I will continue to write about our system in future, but this post will probably be the last that covers Redflow batteries in any detail. I have included some further observations below in the hope that they will be useful to others such as the Flow Battery Research Collective in their efforts to design and build a viable open source flow battery. I would also like to take the opportunity to express my thanks to Stuart Thomas (another ZCell user) for plenty of helpful advice and interesting discussions over the past year.
Appendix A – Redflow Hardware
There’s an article on the design of the ZBM3 on Simon Hackett’s blog from back in May 2021. Having now spent a fair bit of time physically messing with the ZBM3 myself I’m happy to confirm almost all the good things in that post about the design of the unit – the whole thing is just much neater and nicer than the ZBM2. The one thing that ultimately didn’t work out with the new design, unfortunately, was improved reliability. Our original ZBM2 lasted from August 2021 to December 2023 – just over two years – before failing due to a leak in the electrode stack. Our first ZBM3 failed after nine months. The subsequent one started leaking after six months and even though I continue to nurse it along, I think we can reasonably treat that one as failed too.
Based on my experience, the reliability issues are all in the stack. Whether that’s a problem with the manufacture of the stack, or chemical reaction issues at runtime, or a combination of the two, or something else entirely, I don’t know. But if those problems could be fixed or mitigated somehow, the rest of the design is quite clever:
The two electrolyte tanks are side by side, with pumps at the front of each tank.
The pipes from the pumps run across to the opposite side of the front of the stack, so electrolyte from the left (bromine) tank flows up into the bottom right of the stack, then through the stack diagonally to come out the back left of the stack and down into the top of the bromine tank. The reverse is true for the right (zinc) tank, which comes up into the bottom left of the stack.
The pipe from the zinc tank has extra long sections of hose that loop around under the stack so that cool air can be blown over them from the fan in the rear of the unit if necessary.
The catch can – which captures potential gas emissions using activated carbon – sits neatly out of the way in a cavity between the two tanks.
The electrode stack is just strapped to the top of the tank, so can theoretically be lifted off for replacement by cutting the straps.
The BCM just slides onto the terminals that come out of the front of the stack as described elsewhere in this post.
The latest version of the ZBM3 has a screw cap on the front of the zinc tank for easy installation and replacement of the carbon sock.
Nevertheless, in my opinion, there is still room for improvement:
I’ve been told that if you do actually remove the electrode stack, whatever electrolyte remains in it will spill everywhere. Some sort of shutoff valve or set of plugs might be in order here.
I have not the faintest clue how one would go about replacing the pumps should they fail, without getting electrolyte all over the place. I imagine you would have to drain the electrolyte first, but how? With some sort of siphon?
There’s a filter somewhere in the tank which can apparently become clogged. I don’t know how to inspect, clean or replace this if necessary.
The position of the catch can, while neat, seems to unavoidably result in the hose connected to it being kinked. One of the line items in the annual maintenance checklist specifically says to make sure that hose isn’t kinked, so I pulled my catch can out of its hole and left it sitting horizontally across the front of the unit.
Speaking of the catch can, it’s connected to the zinc tank by a pressure release valve, although the ZBM3 manual erroneously states that it’s connected to the bromine tank. There does not appear to be any pressure release valve attached to the bromine tank, whereas the ZBM2 had a gas handling unit consisting of pressure release valves connected to both tanks (see section 4.7 “Gas Handling Units” in the ZBM2 manual). Is the pressure release really not necessary for the bromine tank with the ZBM3, or is this another source of potential trouble?
The carbon sock, which needs to be replaced annually, is interesting. It looks a bit like a door snake, but it’s filled with some sort of carbon material and sits in the zinc tank. I understand it behaves somewhat like a sacrificial anode, the idea being that whatever corrosion or oxidation might happen to the carbon in the electrodes in the stack under certain operating conditions will instead happen to the carbon in the sock.
Another item of annual maintenance is to “check the pH and adjust if necessary”. How, exactly, and to what value? I have recent correspondence which says the pH should ideally be 1.5-2.5 and that it can be lowered by adding hydrochloric acid and running the pumps for a couple of days, but it would be helpful to somehow include a pH sensor in the unit given that mopping up leaks with litmus paper isn’t really very accurate.
As for the stack itself, it looks like a solid rectangular chunk of some sort of fibreglassy material. The front plate appears to be a separate piece that was stuck on somehow, which I assume makes for a weaker spot between that plate and the rest of the stack. This could explain some of the leaks I experienced. Maybe the ZBM2 design with the bolts holding two stacks together really did make for a better seal?
Finally, the most recent enclosure design – a solid metal box with cowlings on each end that allow airflow but not animal ingress – is excellent until you have to do any work on it. Everything is very heavy, and once you remove the screws that hold one of the ends on, everything has a tendency to slip just slightly out of alignment, making it difficult to screw back together, at least for one person. Given the frequency with which I ended up needing to inspect and mess around with my most recent unit, I removed the ends and one side of the enclosure and just left it that way.
The top of the enclosure is a great place to store a socket set, multimeter, rags, screws, tape, etc.
Appendix B – Redflow Software
The management and monitoring interface for the BMS is overall decent and easy to use. There’s a main status screen from which you can drill down to get more detail and perform various operations. The included quick start and reference guides are very thorough and cover most of the details, so rather than describing the UI further here I’ve decided to reproduce those manuals in PDF form for posterity:
You can easily get full details of the current state of all connected batteries by browsing the UI, or by hitting the /rest/1.0/status endpoint to dump everything in JSON format. It’s possible to browse the last three months of BMS logs via the UI, but they disappear after that. There are historical graphs for battery current, voltage, temperature and state of charge, but their resolution decays the further back in time you go. I assume it’s using something like RRDtool behind the scenes.
Historical logs of battery state are where I ran into trouble. You can browse these via the UI, or download CSV reports for a given timespan, but the problem is the battery state is only recorded at one minute intervals. This means that if anything interesting happens in the 59 seconds between two log entries, you don’t see it. There’s a longer description of this issue in the first post in this series, where I noticed that the Charge, Discharge and EED contactors in the battery were toggling on and off far more often than expected. It’s also a problem if a warning or error is triggered for only a few seconds. In this case the BMS logs will show something like the following:
2025-12-02 04:56:15 WARN ZBM:1 has indicated 'warning_indicator' state
2025-12-02 04:56:17 INFO ZBM:1 is no longer in 'warning_indicator' state
The logs of battery state however will not show the warning at all. In the above example the warning occurred from 04:56:15 to 04:56:17, but the battery logs surrounding that event only show the state at 04:55:59 and 04:56:59 when nothing interesting was happening. You can work around that by writing a script to scrape the REST status endpoint at, say, one second intervals then log that data to a separate database, but it would be better if this were handled by the BMS somehow. I know logging battery state every second would create way too much data for a tiny device to store, but maybe just logging on state changes? Or at the very least if there’s a warning or error, the BMS should log which warning indicator is active and the associated value. In the above example I happen to know it was a low temperature warning, but that’s only because I was doing exactly what I mentioned above, i.e. running a script externally to check the status every second and displaying the state if something changed.
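That polling workaround can be sketched in a few lines of Python. The /rest/1.0/status endpoint is the one mentioned earlier, but the host name and the assumption that the JSON payload is a flat dictionary of fields are mine, so treat this as a starting point rather than a finished tool:

```python
# Sketch: poll the BMS REST status endpoint and report only changes.
# 'bms.local' is a placeholder host; the payload is assumed to be a
# flat JSON object of status fields, which may not match your unit.
import json
import time
import urllib.request

BMS_URL = 'http://bms.local/rest/1.0/status'  # hypothetical host name

def changed_fields(prev, curr):
    """Return {key: (old, new)} for every top-level field that changed."""
    keys = set(prev) | set(curr)
    return {k: (prev.get(k), curr.get(k))
            for k in keys if prev.get(k) != curr.get(k)}

def poll_forever(url=BMS_URL, interval=1.0):
    """Fetch status every `interval` seconds, printing any changes."""
    prev = {}
    while True:
        with urllib.request.urlopen(url) as resp:
            curr = json.load(resp)
        for key, (old, new) in sorted(changed_fields(prev, curr).items()):
            print(f'{time.strftime("%Y-%m-%d %H:%M:%S")} {key}: {old} -> {new}')
        prev = curr
        time.sleep(interval)
```

Calling poll_forever() prints a timestamped line whenever any top-level status field changes, which catches short-lived warnings the one-minute battery logs would miss.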
Appendix C – On Flow Batteries in General
Despite everything I’ve been through with our various ZCells, I remain convinced that flow batteries in general are a better idea for long-term stationary energy storage than lithium, provided they can be made to actually live up to the promise of a multi-decade lifespan. The most obvious pros compared to lithium in my opinion are:
Safety, due to using nonflammable electrolyte
Sustainability, due to use of abundant chemicals and minerals
Longevity, due to potentially not suffering from capacity reduction over time
The most obvious cons compared to lithium are:
Size (they are much bigger and heavier)
Efficiency (they may require more power in to get a given amount out, and have some continuous runtime power draw for their pumps)
In the context of vehicles, mobile phones, and other portable devices, those cons matter. But for homes, apartment blocks, microgrids, community batteries, hospitals, schools, commercial establishments, grid scale batteries, etc., those things just don’t (or shouldn’t) matter (as much). There’s physically more space in a house than in a car, and if you need slightly more generation capacity to offset potentially lower efficiency, then that just means having a couple more solar panels than you would otherwise.
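To put a rough number on that last point, here’s an illustrative calculation. The round-trip efficiencies and panel yield below are assumptions for the sake of the arithmetic, not measured values for any particular product:

```python
# Illustrative only: how much extra solar does a less efficient
# battery chemistry cost? All figures here are assumed placeholders.
daily_through_battery_kwh = 10   # energy shifted through storage per day
lithium_rt = 0.90                # assumed lithium round-trip efficiency
flow_rt = 0.70                   # assumed flow battery round-trip efficiency
panel_daily_yield_kwh = 1.6      # ~400 W panel at ~4 sun-hours per day

# Extra generation needed so the same energy comes *out* of the battery
extra_generation = daily_through_battery_kwh * (1 / flow_rt - 1 / lithium_rt)
extra_panels = extra_generation / panel_daily_yield_kwh
print(f'{extra_generation:.2f} kWh/day extra, ~{extra_panels:.1f} extra panels')
```

With those assumed figures the penalty works out to roughly two extra panels, which is consistent with the intuition that the efficiency gap is an annoyance rather than a dealbreaker for stationary storage.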
Things get a bit more difficult though when you start thinking about consumer acceptance. A flow battery is, fundamentally, a machine. It has moving parts, and it potentially requires maintenance. Readers will have realised by now that I am quite mad (or rather obsessed, or let’s say somewhat of an enthusiast) and don’t mind having to tinker with things occasionally. I imagine this is not the case for most people, who likely want their energy systems to Just WorkTM and require no special attention.
A viable flow battery, especially for residential usage, thus needs to be as low-maintenance as possible, and as easy to work on as possible when it does require maintenance. That latter point is a function of both unit design, and choice of installation site. The crawl space under our house for example is ideal if a battery only requires infrequent maintenance and minimal disassembly. If the whole unit needed to be stripped it would be much better off in a shed or other room that has appropriate access. Finally, these things need to come with a complete service manual including descriptions of all the parts and every possible procedure that might need to be performed in case of failure.
I was talking to a friend the other day about our shared mutual appreciation of virtio-vsock, and it made me wonder something. How do virtual machines on Linux actually work? I know it involves qemu and the kernel’s KVM virtual machine implementation, but exactly how do they interact? How does the kernel get qemu to do emulation tasks as required?
qemu is several things hanging out together in a trench coat, but one of those things is software which can configure Linux’s built-in KVM virtual machine functionality to run a virtual machine, and then handle emulation of the devices attached to that virtual machine which cannot be backed by actual physical hardware. This part of qemu is called a “KVM client” in the Linux kernel documentation. It’s called that because, if we ignore the emulation part for now, it is literally just a client calling established APIs in the Linux kernel.
I was mildly surprised to find that this topic is reasonably well documented and actually not particularly hard to implement once you know what to look for. I should also admit that I have been playing with AI models recently, and Anthropic’s Sonnet 4.5 was actually quite helpful. I don’t know how other people use these models, but I’ve fallen into two patterns of usage. In this case I used Sonnet like a search engine, with prompts like:
Please help me come up with a good project name which is a pun on python and kvm.
It came up with “pypervisor”, which I think is actually a pretty good name even though it does not include KVM in the pun at all. Or this example:
What do I pass to libc.ioctl when there is no data to pass?
I would then use the results of those “searches” to write my code.
The other method of usage is more trusting, where I have very recently started using Claude Code to generate code for me. So far I’ve only done this for private projects I will never release publicly, and it’s definitely a great way to churn out lots of code quickly, although I have had problems with that code not being correct or hallucinating APIs to call. This mode of operation is also noticeably worse as a way of learning, as you’re just presented with a finished product without any of the iterative process to get there.
I have also rapidly found that my biggest concern with these models is that they don’t provide supporting references for their statements, which is something we expect middle school kids to figure out. I would be much more comfortable if they could provide links to the sources they used, both so I could validate what they said and so the underlying authors could get more credit and incentive to keep writing. Google’s “AI Mode” sort of seems to do this, so it doesn’t seem completely impossible.
Instead, I’ve been using prompts like this:
Can you recommend any web pages which demonstrate using python to create a KVM virtual machine?
Or, from another session the other day on an unrelated topic:
Please provide me with some links to web pages which further discuss this topic.
So having said all that, the following references were quite helpful:
This YouTube video entitled “How to write your own KVM client from scratch – Murilo Opsfelder Araújo” is a conference talk from linuxdev-br and kicked off this whole tangent, so deserves credit.
This 11 minute conference talk is shorter, and is by the author of the LWN article. It’s quite interesting, but skips a lot of the lower level details I was specifically interested in. It also discusses why you might want to do something like this, apart from mere curiosity. The answer basically comes down to being able to run code in a tightly constrained environment.
This blog post also covers the material reasonably well.
In the end, it turns out the Kernel API for running a virtual machine with KVM really isn’t that bad. Especially once you’ve accepted that the API doesn’t look at all like what a userspace developer would expect the API to look like. Instead of making calls to a library like glibc, you need to open /dev/kvm, and then use a series of ioctl calls to configure your virtual machine. Objects you create along the way are tracked as file descriptors, which makes sense given ioctls want to happen on files.
I don’t have a strong history here on providing tutorial content, so I am not entirely sure how to present my example while also producing a single block of usable code. While the entire project is on github, I am including the main body of the code here for ease of reference. For that main body I have opted to just put in a lot of comments. Feedback on this approach is welcome.
#!/usr/bin/python3
# A not very good KVM client / virtual machine manager written in python.
# Based heavily on the excellent https://lwn.net/Articles/658511/.
# Development was assisted by Claude Sonnet 4.5, and US Intellectual
# Property law.
import ctypes
import fcntl
import mmap
import sys
from displayhelpers import *
from exitcodes import *
from ioctls import *
from structs import *
# A single 4kb page
MEM_SIZE = 0x1000
def main():
# Open the KVM device file. This gives us a top level reference to the
# KVM API which we can then make global calls against.
with open('/dev/kvm', 'rb+', buffering=0) as kvm:
try:
# Check that the API is a supported version. This should
# basically never fail on a modern kernel.
api_version = fcntl.ioctl(kvm, KVM_GET_API_VERSION)
print(f'KVM API version: {api_version}')
if api_version != 12:
print(f'KVM API version {api_version} was unexpected')
sys.exit(1)
except OSError as e:
print(
f'Failed to lookup KVM API version: {e.errno} - {e.strerror}'
)
sys.exit(1)
# Create a VM file descriptor. This is the "object" which tracks the
# virtual machine we are creating.
print()
try:
vm = fcntl.ioctl(kvm, KVM_CREATE_VM)
print(f'VM file descriptor: {vm}')
except OSError as e:
print(f'Failed to create a VM: {e.errno} - {e.strerror}')
sys.exit(1)
# mmap memory for the VM to use. Sonnet 4.5 alleges that we need to
# use mmap here instead of just allocating a largeish byte array in
# native python for a few reasons: the allocation needs to be
# page-aligned; mmap'ed memory can be zero-copied into the virtual
# machine, a python array cannot; MAP_SHARED means our python process
# can inspect the state of the virtual machine's memory; python
# memory allocations are not at a stable location -- python might
# rearrange things. So yeah, those seem like reasons to me.
mem = mmap.mmap(
-1,
MEM_SIZE,
prot=mmap.PROT_READ | mmap.PROT_WRITE,
flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS,
offset=0
)
mem_buf = (ctypes.c_char * len(mem)).from_buffer(mem)
mem_addr = ctypes.addressof(mem_buf)
print(f'VM memory page is at 0x{mem_addr:x}')
# This is the data structure we're going to pass to the kernel to
# tell it about all this memory we have allocated.
region_s = kvm_userspace_memory_region_t()
region_s.slot = 0
region_s.flags = 0
region_s.guest_phys_addr = 0
region_s.memory_size = MEM_SIZE
region_s.userspace_addr = mem_addr
try:
# This dance gives us the address of the data structure, which is
# what the kernel is expecting.
region_bytes = ctypes.string_at(
ctypes.addressof(region_s), ctypes.sizeof(region_s))
fcntl.ioctl(vm, KVM_SET_USER_MEMORY_REGION, region_bytes)
except OSError as e:
print(f'Failed to map memory into VM: {e.errno} - {e.strerror}')
sys.exit(1)
# Add a vCPU to the VM. The vCPU is another object we can do things
# to later.
try:
# The zero here is the index of the vCPU, this one being of
# course our first.
vcpu = fcntl.ioctl(vm, KVM_CREATE_VCPU, 0)
print(f'vCPU file descriptor: {vcpu}')
except OSError as e:
print(f'Failed to create a vCPU: {e.errno} - {e.strerror}')
sys.exit(1)
# mmap the CPU state structure from the kernel to userspace. We need
# to lookup the size of the structure, and the LWN article notes:
# "Note that the mmap size typically exceeds that of the kvm_run
# structure, as the kernel will also use that space to store other
# transient structures that kvm_run may point to".
try:
kvm_run_size = fcntl.ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE)
except OSError as e:
print(
f'Failed to lookup kvm_run struct size: {e.errno} - '
f'{e.strerror}'
)
sys.exit(1)
print()
print(f'The KVM run structure is {kvm_run_size} bytes')
kvm_run = mmap.mmap(
vcpu,
kvm_run_size,
prot=mmap.PROT_READ | mmap.PROT_WRITE,
flags=mmap.MAP_SHARED,
offset=0
)
kvm_run_s = kvm_run_t.from_buffer(kvm_run)
kvm_run_addr = ctypes.addressof(kvm_run_s)
print(f'vCPU KVM run structure is at 0x{kvm_run_addr:x}')
print()
print(pretty_print_struct(kvm_run_s))
print()
# Read the initial state of the vCPU special registers
sregs = kvm_sregs_t()
fcntl.ioctl(vcpu, KVM_GET_SREGS, sregs)
print('Initial vCPU special registers state')
print()
print(pretty_print_sregs(sregs))
print()
# Setup sregs per the LWN article. cs by default points to the
# reset vector at 16 bytes below the top of memory. We want to start
# at the beginning of memory instead.
sregs.cs.base = 0
sregs.cs.selector = 0
fcntl.ioctl(vcpu, KVM_SET_SREGS, sregs)
# Read back to validate the change
sregs = kvm_sregs_t()
fcntl.ioctl(vcpu, KVM_GET_SREGS, sregs)
print('CS updated vCPU special registers state')
print()
print(pretty_print_sregs(sregs))
print()
# Read the initial state of the vCPU standard registers
regs = kvm_regs_t()
fcntl.ioctl(vcpu, KVM_GET_REGS, regs)
print('Initial vCPU standard registers state')
print()
print(pretty_print_struct(regs))
print()
# Setup regs per the LWN article. We set the instruction pointer (IP)
# to 0x0 relative to the CS at 0, set RAX and RBX to 2 each as our
# initial inputs to our program, and set the flags to 0x2 as this is
# documented as the start state of the CPU. Note that the LWN article
# originally had the code at 0x1000, which is super confusing because
# that's outside the 4kb of memory we actually allocated.
regs.rip = 0x0
regs.rax = 2
regs.rbx = 2
regs.rflags = 0x2
fcntl.ioctl(vcpu, KVM_SET_REGS, regs)
# Read back to validate the change
regs = kvm_regs_t()
fcntl.ioctl(vcpu, KVM_GET_REGS, regs)
print('Updated vCPU standard registers state')
print()
print(pretty_print_struct(regs))
print()
# Set the memory to contain our simple demo program, which is from
# the LWN article again. It's important to note that the memory we
# mapped earlier is accessible to _both_ this userspace program and
# the vCPU, so we can totally poke around in it if we want.
program = bytes([
0xba, # mov $0x3f8, %dx
0xf8,
0x03,
0x00, # add %bl, %al
0xd8,
0x04, # add $'0', %al
ord('0'),
0xee, # out %al, (%dx)
0xb0, # mov $'\n', %al
ord('\n'),
0xee, # out %al, (%dx)
0xf4, # hlt
])
mem[0:len(program)] = program
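As a sanity check on the opcode comments above, here are the same bytes grouped by instruction: with RAX and RBX both set to 2 earlier, the guest computes 2 + 2, adds ASCII '0', and writes '4' then a newline to port 0x3f8.

```python
# The demo program again, grouped by instruction rather than written as a
# flat byte list, plus the output we therefore expect the guest to emit.
program = bytes([
    0xba, 0xf8, 0x03,   # mov $0x3f8, %dx   (serial port number into DX)
    0x00, 0xd8,         # add %bl, %al      (AL = AL + BL = 2 + 2)
    0x04, ord('0'),     # add $'0', %al     (convert 4 to the ASCII digit '4')
    0xee,               # out %al, (%dx)    (write '4' to port 0x3f8)
    0xb0, ord('\n'),    # mov $'\n', %al
    0xee,               # out %al, (%dx)    (write the newline)
    0xf4,               # hlt
])
expected_output = chr((2 + 2) + ord('0')) + '\n'
print(repr(expected_output))
```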
# And we now enter into the VMM main loop, which is where we sit for
# the lifetime of the virtual machine. Each return from the ioctl is
# called a "VM Exit" and indicates that a protection violation in
# the vCPU has signalled a request for us to do something.
while True:
print('Running...')
fcntl.ioctl(vcpu, KVM_RUN)
kvm_run_s = kvm_run_t.from_buffer(kvm_run)
exit_reason = VM_EXIT_CODES.get(
kvm_run_s.exit_reason,
f'Unknown exit reason: {kvm_run_s.exit_reason}'
)
print(f'VM exit: {exit_reason}')
print()
match exit_reason:
case 'KVM_EXIT_HLT':
print('Program complete (halted)')
sys.exit(0)
# Claude had the direction values backwards. qemu definitely
# agrees with this though.
# https://github.com/qemu/qemu/blob/master/linux-headers/linux/kvm.h#L245
case 'KVM_EXIT_IO':
io = kvm_run_s.exit_reasons.io
print(pretty_print_struct(io))
print()
# We only handle output to ioport 0x3f8 for now
if io.direction == KVM_EXIT_IO_OUT and io.port == 0x3f8:
data_ptr = ctypes.addressof(kvm_run_s) + io.data_offset
data = ctypes.string_at(data_ptr, io.size)
try:
data_str = data.decode('ascii')
except UnicodeDecodeError:
data_str = 'failed to decode ASCII'
print(f'Output: {data}, {data_str}')
else:
print('Not yet implemented...')
sys.exit(1)
case 'KVM_EXIT_SHUTDOWN':
print('VM shutdown')
sys.exit(0)
case 'KVM_EXIT_INTERNAL_ERROR':
print('Internal errors are probably bad?')
sys.exit(1)
case _:
print(f'Unhandled VM exit: {exit_reason}')
sys.exit(1)
if __name__ == '__main__':
main()
Environment Modules allow users to dynamically modify the $PATH in their shell environment. In particular, they provide the ability to switch between different software applications and versions. This is a very handy tool for developers who want to test against multiple versions or compilation options of an application, and for users on a multi-user system (e.g., high performance computing) where both consistency and the opportunity to introduce newer features need to exist for different research groups. The two major environment modules systems are the older, Tcl-based system and the newer Lua-based system (Lmod). Both allow a default version of a software application to be placed in the $PATH when the module is invoked without a version. This can cause problems and should be disabled.
With a standard module system, one can see the modules available for an application. For example, using an LMod system:
This illustrates that from the GNU Compiler suite (GCC), the application GCC/11.3.0 is loaded (by default in this case, as part of the login) and that GCC/12.3.0 is the default as part of a wider toolchain. That is, if the command
module load GCC
is invoked, the existing environment $PATH will change so that it points to the binaries, libraries, etc. for GCC/12.3.0 instead of GCC/11.3.0. In the older, Tcl-based modules system, it was possible to have both versions loaded simultaneously; the most recent version would be first in the search path, but if a particular library, for example, couldn't be found, it would search whatever else was in the $PATH. That could be an interesting issue for the replication of results.
Having a default obviously saves a few keystrokes, but problems can result. The major issue is that when new software is introduced by the local friendly HPC engineers, then the default changes. For example, the latest version of GCC, at the time of writing, is 15.1 (April 2025). If that was installed on a system, one would witness:
After invoking the module load GCC command, LMod will add GCC/15.1.0 to the $PATH. Is this a problem? Most certainly, because when software (version, compiler) changes, the results can change. For example, the release of GCC/13 included the statement: "-Ofast, -ffast-math and -funsafe-math-optimizations will no longer add startup code to alter the floating-point environment when producing a shared object with -shared." In a nutshell, this means that one can get different numerical results by compiling the same code under GCC/12 and GCC/13, due to the different ways the compiler handles rounding, precision, and exceptions in floating-point calculations. This could be very interesting if one cares about that supposed hallmark of science, "reproducibility".
To avoid this, at a user-level, one should always use the fully-qualified version of a software application. For example, instead of loading a module with
module load GCC
, whether in an interactive session or job script, one should use the specific version, e.g.,
module load GCC/13.3.0
. Of course, sometimes users don't follow the friendly and sensible advice from their system engineers, and that's where a blunter tool needs to be invoked.
Site-wide LMod configuration (and, let's face it, LMod is the overwhelmingly dominant version of environment modules these days) occurs in the lmod_config.lua file,
$LMOD_DIR/../init/lmod_config.lua
. Adding the following lines is sufficient:
-- Do not automatically pick a version when multiple options are available
always_load_default = false
-- Do not automatically swap modules from the same family
auto_swap = false
-- Disallow default-version resolution and force the user to specify which version
site_defaults = {
load_default = false,
}
Another option is to export options in
/etc/profile.d/lmod.sh
or equivalent. This avoids making changes to the lmod_config file.
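For example, something like the following fragment; note that these variable names are my best understanding of LMod's environment overrides and should be checked against the documentation for your installed LMod version:

```shell
# /etc/profile.d/lmod.sh (fragment): require fully-qualified module names
# site-wide without editing lmod_config.lua.
export LMOD_EXACT_MATCH=yes                  # "module load GCC" no longer resolves to a default
export LMOD_DISABLE_SAME_NAME_AUTOSWAP=yes   # don't silently swap GCC/11.3.0 for GCC/12.3.0
```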
If a system is using EasyBuild for software installs (and many of us are), EasyBuild will create
.version
files which include the default; this can be removed by changing the
easybuild.cfg
file to have
module_install_defaults = False
.
One issue that arises from the above, from a support perspective, is that existing user expectations and job submission scripts may come into conflict with a new policy. For example, if a job submission script included
module load GCC
, the job would fail as LMod would be asking for user input on which specific version of GCC they wish to load.
This is a valid concern in the short term, but in the longer term, the benefits of providing consistent environment variables, reproducibility, and educating users far outweigh the costs. Further, the longer one persists with the policy of allowing application defaults, the greater the damage.
LAME (a recursive acronym: LAME Ain't an MP3 Encoder) has been an exceptional application for creating MP3 audio files from other formats (e.g., WAV, FLAC, MPEG-1, MPEG-2) or even for re-encoding existing MP3 files. Creating MP3 files is popular due to their extensive use in portable media players, as they combine an impressive reduction in file size with minimal perceptible loss compared to uncompressed audio. LAME was first released 27 years ago, has been in regular development since, and is now bundled with the very popular audio editor Audacity and used by other excellent audio conversion applications like FFmpeg. The following are several examples of how to use LAME on the command line, convert files, and automate the process, along with the use of cdparanoia and ffmpeg. This might be handy if you want to (for example) back up a collection of CDs.
To extract files from existing media, use cdparanoia, a delightfully old and stable tool (its last release was September 2008) that is the front-end for libparanoia, which is also used in the cdrtools suite for the creation of audio and data CDs. To use cdparanoia to extract files from disc, first run a verbose search and query to get information about the drives you have, i.e., $ cdparanoia -vsQ. Then, to extract the files, simply use batch mode, $ cdparanoia -B, which saves each track to a separate file. The default format is WAV; if one wants RAW, use $ cdparanoia -rB.
Using LAME to convert files, the simplest action is to convert a file with no options, e.g., $ lame input.wav output.mp3. This can be extended with the numerous options offered by LAME. Perhaps the most common is -V n (variable bit rate), e.g., $ lame -V0 input.wav output.mp3. Values run from 0 to 9, where 0 is the highest quality (but largest file size), which is good for very high-fidelity recordings, whilst 9 is the lowest, which is fine for low-fidelity needs (e.g., a monophonic voice recording); the default is 4. Run a test on a WAV file to see if you can detect the difference in audio quality, and check the file size differences. Note that despite being a good encoder, LAME is not recommended for archives; use a lossless audio codec (e.g., FLAC).
If one has multiple files to convert, a short loop is an effective strategy (e.g., $ for item in *.wav; do lame "$item" "${item%.wav}.mp3"; done). This is certainly preferable to opening files with a GUI application and running the conversion one file at a time. If one has a number of large files, then use of GNU Parallel is an excellent choice (e.g., $ parallel lame -V0 {} {.}.mp3 ::: *.wav). On a sample CD, the loop conversion took 57.201 seconds; with parallel, 13.264 seconds; it is fairly clear how this makes a difference if you have a large number of CDs. Note that with a smaller number and size of files, the use of GNU Parallel will actually take a longer time than a loop due to the overhead of splitting up the tasks. Certainly, if you are converting the WAV files from a CD, use GNU Parallel. Note that GNU Parallel will use all the cores available on a system; the number of concurrent jobs can be limited with the -j option (e.g., parallel -j2 lame -V0 {} {.}.mp3 ::: *.wav).
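The loop above can be sketched as a self-contained dry run: the scratch directory, dummy track filenames, and -V0 setting below are just illustrative, and the echo prints each lame command rather than running it (drop the echo on real files to actually encode).

```shell
# Self-contained dry-run of the batch conversion loop: create dummy WAV
# filenames in a scratch directory and print the lame command each
# iteration would run, showing the .wav -> .mp3 filename mapping.
dir=$(mktemp -d)
cd "$dir"
touch track01.wav track02.wav
for item in *.wav; do
    echo lame -V0 "$item" "${item%.wav}.mp3"
done
```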
Finally, there is a great little tool called mp3wrap, which can wrap multiple MP3 files into a single MP3 while keeping filenames and metadata, and which can later be split back into the original files (using the mp3splt command). The output file will be named OUTPUTFILE_MP3WRAP.mp3; keep the MP3WRAP string so that the split program detects that the file was wrapped with this utility. To create a single file, simply use $ mp3wrap album.mp3 track*.mp3.
Have you ever bought a product that was a little bit expensive compared to its peers, largely on the basis of recommendations from friends? Were you then disappointed? Well, that’s me with the quadlock maghead magnetic charging head.
So first off, all I want is to be able to wirelessly charge my phone while commuting. I had a dock which did that, but it died, so I was in the market for something better. I drive a 4WD and actually go off road, and one of the flaws of the previous dock was that it wasn’t rigid enough in difficult terrain, and basically everyone said I should change over to quadlock. So… I did.
Now of course you need to buy into their eco system, so you can’t just buy a dock. I needed a windscreen mount, a case, and a magnetic charging head to replace the stock passive head, and all of that came to just over $150 AUD. My previous dock was $30 AUD so it felt fairly expensive.
I must say quadlock make excellent suction caps — the windscreen mount is super rigid and definitely a step up in terms of quality. However, the maghead has simply never worked right. I’ve tried with three USB current inputs — a 100 watt charger (because you want to charge fast, right?), a 60 watt charger because that’s what all the other USB-C PD outlets in the car are, and whatever slightly terrible USB charging current the head unit can provide. All of these chargers cause the maghead to immediately overheat when I try to use it, regardless of environmental factors like ambient temperature or the starting temperature of the head. Now, the head unit charger takes maybe a few tens of seconds to overheat, but the others do it within single-digit seconds, so either way it’s pretty useless. I have repeatedly asked quadlock for documentation on the expected input current for the maghead, but they refuse to answer the question. The only documentation I can find is for the maximum output current for the head.
But it’s ok, right? I was also told quadlock has excellent support. So I filed a ticket. Which they didn’t reply to for ages. All up, it took ten days to get them to answer my ticket and eventually receive a replacement unit. This included such gems as being asked for a photo of how hot the maghead is. Like… what? I want to reinforce that while their initial response to my ticket was fast, there were multiple sessions of waiting days for a trivial answer, and they still haven’t answered all my questions.
In the end they shipped me another maghead, which fails in exactly the same way. So I am going to give up, put them in a drawer, and use the passive head with a USB-C charging cable plugged into my phone like a neanderthal. Oh, and I’ll warn everyone I meet that quadlock isn’t quite what people tell you it is, unless you just want a really good suction cap.
The 2025 New Zealand Research Software Engineering Conference was held on September 23-24. It has run for almost a decade, starting in 2016 as the CRI Coding Conference before becoming the Science Coding Conference, a title it kept until 2020, when it became the Research Software Engineering Conference. Throughout this time it has focused on science and engineering, high performance computing, and cloud computing. It is aimed at coders, sysadmins, software engineers, data analysts, and IT managers from public research institutions, and has the endorsement of the RSE Association of Australia and New Zealand.
As the programme indicates, this year's selection of presentations and BoFs had a strong emphasis on machine learning and developments in artificial intelligence. This was evident right from the start with Nick Jones' keynote address on the first day, and in numerous presentations throughout the conference. A concern must be raised when this pivot to AI/ML involves recursive comparative testing against other AI/ML systems. Ultimately, the validity of computational modelling must come not only from the quality of the inputs but from the real-world predictive (and hindcasting) value of the outputs.
Thankfully, there was little in the way of "AI/ML marketing hype" at this conference, which really was firmly dedicated to actual computational practice, development of skills and knowledge, and research outputs. Further, there was only a moderate amount of theory, and that's primarily for foundations, as it should be. Being in Aotearoa New Zealand, it is perhaps unsurprising that there were several presentations on Earth sciences, but also of note was the emphasis on climatology, oceanography, and new developments in forensics.
Interestingly, Australian research software development and education were present in a number of presentations, including speakers from WEHI and CSIRO. Research Computing Services at the University of Melbourne received a surprising highlight, with my own presentation, "Programming Principles in a High Performance Computing Environment", being the first presentation of the conference and Daniel Tosello's "VSCode on the node" being the third. It bodes well for current and future Trans-Tasman collaboration.
Finally, the Research Software Engineering New Zealand Conference included a small number of explicit community-building presentations. In conferences such as these, individual presentations are often of special interest, but a much wider and latent benefit is the awareness of the directions that other institutions and individuals are taking, and the strength of professional connections: a critical requirement not only for an individual's development, but also for raising the collective knowledge of the institutions they belong to.
I’ve fallen into this pattern where I do an hour or so of self-directed learning in the mornings before going to work. Until recently it was an excellent CMU course on the design of SQL database systems, which I’ve mentioned previously here. I’ve finished that, so I thought I would do something shorter and fun as a break before finding another course to do. I chose The freeCodeCamp.org hot dog or not hot dog tensorflow course. 90 minutes seemed achievable, and I too wish to know if an object in front of me is a hot dog or not.
The course uses Google Colab to walk you through the process. Colab is interesting, I had never used it before and I must say the user interface wasn’t completely obvious to me. One issue is that the workspace keeps timing out — I do my learning in small bursts tetrised around the other things in my life, and Google wants to reap idle workspaces (which is fair). Combined with the quite slow dataset download process this just wasn’t working for me.
But I own computers, some of which have GPUs. I should just be able to run the same code in my homelab right?
Well… That’s a nice theory. It was much harder than I expected. Because my chosen machine has an integrated AMD GPU, I need to use AMD’s ROCm. That means I needed to enter a twisting maze of conflicting python, tensorflow and tensorflow-rocm versions, a typoed GPU version check (yes really), and spend a few hours fighting the incantations and reading reddit posts. Additionally, because I need real physical GPU hardware for this, I can’t just play in a virtual machine. I instead need to risk messing up a physical machine I am somewhat fond of, so that’s nice.
I’ll save you the time. To get tensorflow working on Debian 12 with an AMD GPU, I needed to do the following dance steps…
The version of ROCm I need only supports python 3.9, 3.10, and 3.12. Of course Debian 12 ships with python 3.11 which is the missing option there, so that’s nice. Let’s use pyenv to get a python 3.12 on Debian 12:
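The exact commands aren't reproduced here, but a typical pyenv setup for this looks something like the following sketch; the venv path is arbitrary, and which tensorflow-rocm release you need depends on your ROCm install:

```shell
# Build a python 3.12 with pyenv and install the ROCm build of tensorflow.
curl -fsSL https://pyenv.run | bash    # install pyenv (see the pyenv docs)
pyenv install 3.12                     # compile python 3.12 from source
pyenv shell 3.12                       # use it in the current shell
python -m venv ~/tf-rocm               # keep tensorflow in its own venv
source ~/tf-rocm/bin/activate
pip install tensorflow-rocm            # the ROCm-enabled tensorflow build
```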
And now we can run python, import tensorflow, and ask if it has a physical GPU:
$ python3
Python 3.12.11 (main, Sep 21 2025, 08:10:09) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2025-09-21 08:24:18.780938: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> print(tf.test.is_gpu_available())
WARNING:tensorflow:From :1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2025-09-21 08:24:25.156483: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-09-21 08:24:26.361397: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-09-21 08:24:26.361442: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-09-21 08:24:26.362540: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-09-21 08:24:26.362576: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-09-21 08:24:26.362608: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-09-21 08:24:26.362630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /device:GPU:0 with 1982 MB memory: -> device: 0, name: AMD Radeon Graphics, pci bus id: 0000:c6:00.0
True
>>> print(tf.config.list_physical_devices('GPU'))
2025-09-21 08:24:51.553486: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-09-21 08:24:51.553590: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-09-21 08:24:51.553613: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
>>>
Truly we live in the future.
I have updated this post to use ROCm 7.0.1; it originally used ROCm 6.4.
I’ve just been in Sydney for a couple of days for CloudCon 2025. Depending on how you count, I think this is my third one of these events — the event has changed names at least twice, so it’s actually a little hard to work out its lineage. This year’s conference was noticeably smaller than last year’s, which is confusing to me for an event which is so competitively priced and branded so heavily with the hot topics du jour. That said, the event was well run, in a good venue, and well worth the time. That is, this event really deserves more support than it’s getting.
There were some clear themes from the event for me:
ClickHouse is cool. Or at least I think so. ClickHouse observability certainly has potential, but I think the underlying SQL database is actually the most interesting bit. ClickHouse is also investing heavily in the Australian market right now, so I suspect they’re seeing strong traction here.
No one talks about “devops” any more, because it’s become a meaningless term where everything is devops if you squint at it right. Instead people are using the term “platform engineering”, which doesn’t appear to have a particularly clear definition either.
OpenTelemetry is definitely a thing. I think literally every talk on the first day mentioned it.
There are a lot of people who want to do AI, but there does not seem to be a lot of success out there right now. The speakers now openly admit that at least 40% of AI projects will fail. Would Ed Zitron please step up to the blog?
If anyone mentions “MCP” one more time… Also, what is MCP? It’s weird that no one seems to want to define it when there’s a very clear definition out there if you google it.
Everyone seems to now agree that Kubernetes is too complicated and that the complexity drives cost management problems. There is no talk of actually simplifying the ecosystem however.
So in summary, well done to the event organizers. I hope more punters get on board in the future.
Some time ago, I wanted to encrypt a bunch of data in an application I was writing in Rust, mostly to be stored in a database, but also session cookies and sensitive configuration variables.
Since Rust is widely known as a secure-yet-high-performance programming language, I was expecting that there would be a widely-used crate that gave me a secure, high-level interface to strong, safe cryptography.
Imagine my surprise when I discovered that just… didn’t seem to exist.
Don’t get me wrong: Rust is replete with fast, secure, battle-tested cryptographic primitives.
The RustCrypto group provides all manner of robust, widely-used crates for all manner of cryptography-related purposes.
They’re the essential building blocks for practical cryptosystems, but using them directly in an application is somewhat akin to building a car from individual atoms of iron and carbon.
So I wrote my own high-level data encryption library, called it StrongBox, and have been happily encrypting and decrypting data ever since.
Cryptography So Simple Even I Can’t Get It Wrong
The core of StrongBox is the StrongBox trait, which has only two methods: encrypt and decrypt, each of which takes just two arguments.
The first argument is the plaintext (for encrypt) or the ciphertext (for decrypt) to work on.
The second argument is the encryption context, for use as Additional Authenticated Data (AAD), an important part of many uses of encryption.
There’s essentially no configuration or parameters to get wrong.
You can’t choose the encryption algorithm, or block cipher mode, and you don’t have to worry about generating a secure nonce.
You create a StrongBox with a key, and then you call encrypt and decrypt.
That’s it.
Practical Cryptographic Affordances
Ok, ok… that’s not quite it.
Because StrongBox is even easier to use than what I’ve described, thanks to the companion crate, StructBox.
When I started using StrongBox “in the wild”, it quickly became clear that what I almost always wanted to encrypt in my application wasn’t some ethereal “plaintext”.
I wanted to encrypt things, specifically structs (and enums).
So, through the magic of Rust derive macros, I built StructBox, which provides encrypt and decrypt operations on any Serde-able type.
Given that Serde encoders can be a bit fiddly to use, it’s virtually easier to get an encrypted, serialized struct than it is to get a plaintext serialized struct.
Key Problems in Cryptography
The thing about cryptography is that it largely turns all data security problems into key management problems.
All the fancy cryptographic wonkery is for naught if you don’t manage the encryption keys well.
So, most of the fancy business in StrongBox isn’t the encryption and decryption, but instead solving problems around key management.
Different Keys for Different Purposes
Using the same key for all of your cryptographic needs is generally considered a really bad idea.
It opens up all manner of risks that are easily avoided if you use different keys for different things.
However, having to maintain a big pile of different keys is a nightmare, so nobody’s going to do that.
Enter: key derivation.
Create one safe, secure “root” key, and then use a key derivation function to spawn as many other keys as you need.
Different keys for each database column, another one to encrypt cookies, and so on.
StrongBox supports this through the StemStrongBox type.
You’ll typically start off by creating a StemStrongBox with the “root” key, and then derive whatever other StrongBoxes you need, for encrypting and decrypting different kinds of data.
You Spin Me Round…
Sometimes, keys need to be rotated.
Whether that’s because you actually know (or even have reason to suspect) someone has gotten the key, or just because you’re being appropriately paranoid, sometimes key rotation has to happen.
As someone who has had to rotate keys in situations where such an eventuality was not planned for, I can say with some degree of authority: it absolutely sucks to have to do an emergency key rotation in a system that isn’t built to make that easy.
That’s why StrongBox natively supports key rotation.
Every StrongBox takes one encryption key, and an arbitrary number of decryption keys, and will automatically use the correct key to decrypt ciphertexts.
Will You Still Decrypt Me, Tomorrow?
In addition to “manual” key rotation, StrongBox also supports time-based key rotation with the RotatingStrongBox type.
This comes in handy when you’re encrypting a lot of “ephemeral” data, like cookies (or server-side session data).
It provides a way to automatically “expire” old data, and prevents attacks that become practical when large amounts of data are encrypted using a single key.
Invasion of the Invisible Salamanders!
I mostly mention this just because I love the name, but there is a kind of attack possible in common AEAD modes called the invisible salamanders attack.
StrongBox implements mitigations against this, by committing to the key being used so that an attacker can’t forge a ciphertext that decrypts validly to different plaintexts when using different keys.
This is why I love cryptography: everything sounds like absolute goddamn magic.
Call Me Crazy, Support Me Maybe?
If you’re coding in Rust (which you probably should be), encrypting your stored data (which you definitely should be), and StrongBox makes your life easier (which it really will), you can show your appreciation for my work by contributing to my open source code-fund.
Simply by shouting me a refreshing beverage, you’ll be helping me out, and helping to grow the software commons.
Alternately, if you’re looking for someone to Speak Rust to Computers on a professional basis, I’m available for contracts or full-time remote positions.
When I recently announced that I was starting an open source crowd-funding experiment, I wasn’t sure what would happen.
Perhaps there’d be radio silence, or a huge out-pouring of interest from people who wanted to see more open source code in the world.
What’s happened so far has been… interesting.
I chose to focus on action-validator because it’s got a number of open feature requests, and it solves a common problem that people have.
The thing is, I’ve developed and released a lot of open source over the multiple decades I’ve been noodling around with computers.
Much of that has been of use to many people, the overwhelming majority of whom I will never, ever meet, hear from, or even know that I’ve helped them out.
One person, however, I do know about – a generous soul named Andy, who (as far as I know) doesn’t use action-validator, but who does use another tool I wrote some years ago: lvmsync.
It’s somewhat niche, essentially “rsync for LVM-backed block devices”, so I’m slightly surprised that it’s my most-starred repository, at nearly 400(!) stars.
Andy is one of the people who finds it useful, and he was kind enough to reach out and offer a contribution in thanks for lvmsync existing.
In the spirit of my open source code-fund, I applied Andy’s contribution to the “general” pool, and as a result have just released action-validator v0.8.0, which supports a new --rootdir command-line option, fixing action-validator issue #54.
Everyone who uses --rootdir in their action-validator runs has Andy to thank, and I thank him too.
This is, of course, still early days in my experiment.
You can be like Andy, and make the open source world a better place, by contributing to my code-fund, and you can get your name up in lights, too.
Whether you’re an action-validator user, have gotten utility from any of the other things I’ve written, or just want to see more open source code in the world, your contribution is greatly appreciated.
As I’m currently somewhat underemployed, and could do with some extra income, I’m starting an open source crowd-funding experiment.
My hypothesis is that the open source community, and perhaps a community-minded company or two, really wants more open source code in the world, and is willing to put a few dollars my way to make that happen.
So, in short, if you want more open source code to exist, now would be a good time to visit my Ko-fi page and chip in a few dollars.
If you’re curious to know more, my code fund page has a list of Foreseeably Anticipated Questions that might address your curiosity.
Otherwise, ask your questions in the comments or email me.
That was bizarre. I've had no spam comments on this blog for years, but something changed on 28 June, and I've had ~460 spam comments since then. To make things worse, I hadn't stressed too much about them because I thought all my posts had auto-moderation enabled. Turns out I was wrong. That is now fixed, and I've deleted all the spam comments.
Apologies to people who received notifications from this site for these bogus comments.
On December 14, 2024 – three weeks after I published the last exciting installment in this series of posts – our new Redflow ZCell battery, which replaced the original one which had developed a leak in the electrode stack, itself failed due to a leak in the electrode stack. With Redflow in liquidation there was obviously no way I was getting a warranty replacement this time around. Happily, Aidan Moore from QuantumNRG put me in touch with Jason Litchfield from GrazAg, who had obtained a number of Redflow’s post-liquidation stock of batteries. With the Christmas holidays coming up, the timing wasn’t great, but we were ultimately able to get the failed unit replaced with a new ZBM3.
At this point the obvious question from anyone who’s been following the Redflow saga is probably going to be: why persevere, especially in light of this article from the ABC, which speaks of ongoing reliability issues and disturbingly high failure rates for these batteries? That’s a good question, and like many good questions it has a long and complicated answer.
The technical path of least resistance would have been to migrate to a small rack of Pylontech batteries, as these apparently Just Work™ with our existing Victron inverter/charger gear. The downside is they’re lithium, so a non-zero fire risk, and our installation is currently in the crawl space under the dining room. If we switched to lithium batteries, we’d need to arrange a separate outdoor steel enclosure of some kind with appropriate venting and fans, probably on the other side of the driveway, and get wiring to and from that. My extremely hand-wavey guess at the time was that it’d easily have cost us at least $20K to do that properly, with maybe half of that being the batteries.
The thing is, I remain convinced that flow batteries are in general a better idea for long-term stationary energy storage than lithium. This article from the Guardian provides a quick high-level summary of what makes flow batteries different. What I really want to be able to do – given Redflow is gone – is migrate to another flow battery, ideally one that actually lives up to the promise of multi-decade longevity. Maybe someone will finally come up with a residential scale vanadium flow battery. Maybe someone will buy Redflow’s IP, carry on their work and fix some of their reliability issues (the latest update from the liquidators at the time of writing says that they have “entered an exclusive negotiation period with a party for the acquisition of Redflow Group’s intellectual property (IP) and certain specific assets”). Maybe we’ll even see a viable open source flow battery – I would love for this to happen, not least because if it failed I’d probably be able to figure out how to fix the damn thing myself!
Leaving our current system in place, and swapping in a new ZBM3 meant we could kick the migration can down the road a ways. It bought us more time to see what other technologies develop, and it cost a lot less in the short term than migrating to lithium would have: $2,750 including GST for a post-demise-of-Redflow 10kWh ZBM3 (although shipping was interesting – more on that later). The real trick going forwards is seeing exactly how far down the road we’ll be able to kick that can. How can we ensure the greatest possible longevity of the new battery?
Why do ZCells Fail?
Bear with me here…
The ABC article puts it down to manufacturing problems, notably a dependence on repurposed third-party components. While I can see that dependence causing all sorts of extremely irritating manufacturing and design issues, I’m not entirely convinced this is the whole story. I will freely admit that my personal sample is very small, but my two batteries both failing due to electrode stack leaks? If a hose had split or a pump had died, or some random doohickey had let the magic smoke out, then OK, cool, I get it, those I can see being repurposed third-party components. But these failures were apparently in the electrode stack, and I’m struggling to see how that could be a repurposed third-party component. If nothing else, the stack (and the tanks) are surely the pieces that Redflow manufactured themselves. This is their core technology. What could be causing stack leaks? Are they just poorly manufactured, or is there some sort of chemical failure at runtime which physically splits the stack? Or something else? Bear in mind that this is all speculation on my part – I’m neither a chemist nor a battery manufacturer – but I know what I’ve seen, and I know what I’ve heard about leaks in other people’s batteries.
On the chemistry front, I found a paper from 2023 entitled Scientific issues of zinc-bromine flow batteries and mitigation strategies. This was authored by a bunch of researchers from the University of Queensland and the former CTO of Redflow, and highlights hydrogen evolution, zinc corrosion and zinc dendrite formation as the fundamental issues with zinc bromine flow batteries. I sincerely hope the authors will forgive me for condensing their fascinating ~9,000 word paper into the following 95 word paragraph:
When the battery is being charged, zinc is plated onto the electrodes. During discharge, the zinc is removed. Dendrites (little tree like structures) can grow due to uneven zinc deposition, or due to hydrogen gas evolution. Left unchecked, dendrites can puncture the separator between the electrodes and lead to short circuits. Additionally, hydrogen gas generated by the battery can raise the electrolyte pH. If the pH is too high, solid zinc can clog a membrane in the stack. If the pH is too low, it can cause zinc corrosion which can make the battery self-discharge.
What if Redflow just never completely solved or mitigated the above issues? Could a dendrite puncture not just the separator, but actually split the stack and result in it leaking? Could clogged membranes combined with hydrogen gas create enough pressure to do the same?
We know that ZBMs have a maintenance cycle which runs at least every 72 hours to first discharge the battery then (theoretically) completely strip the zinc from the electrodes over a subsequent two hour period. We also know that ZBMs have a carbon sock which sits inside the zinc electrolyte tank and helps to keep electrolyte pH in the correct operating range. This needs to be replaced annually.
What if 72 hours is still too long between maintenance cycles? If you search back far enough you’ll find that the maximum maintenance period was originally 96 hours, and I assume that was later revised down to 72 hours after experience in the field. I’ve had subsequent correspondence which says that even more frequent maintenance (24-48 hours) can be better for the batteries. I’ve also encountered a curious intermittent fault with the ZBM3 where occasionally the Strip Pump Run Timer in the battery operates at half speed. If that happens and you don’t notice and reset the battery, the maintenance cycle will actually occur every 144 hours, which is way too long.
In the past I’ve observed frequent high charge current warnings in the Battery Management System (BMS) logs. This is actually normal, as by default the charge voltage is configured to be 57.5V, and there’s a separate high current voltage reduction setting of 1V. The idea is that this will try to make the battery charge as quickly as possible, and if the current gets too high, it will drop the charge voltage dynamically by 1V, which results in current reduction. Is it possible this variable (i.e. potentially uneven) charge current results in uneven zinc deposition?
I’ve also noticed that the battery State of Charge (SoC) calculations get sketchier the longer it’s been between maintenance cycles. If I have maintenance set to 72 hours, then at the end of the maintenance cycle, the battery fairly reliably still reports about 7% SoC. With a 48 hour maintenance period, it reports about 3% SoC at end of maintenance, and with a 24 hour maintenance period, it’s more like 1%. Once maintenance completes the SoC is reset to 0% automatically (because the battery really is empty at that point), but this got me thinking… If the SoC calculation is off, is there any way the battery could inadvertently allow itself to overcharge? Given the numbers above are all obviously overestimates I hope it’s more likely that the battery undercharges, but still, I had to wonder.
Tweaks to Optimise Battery Lifespan
Aidan suggested three configuration tweaks which Redflow had told him to try to potentially help optimise battery lifespan:
Set the maximum SoC to 90% (rather than 100%)
Set the charge voltage lower, to ~55.5V
Perform maintenance as often as possible
These are all done via the BMS. The maximum SoC and maintenance time limit are set on the Battery Maintenance screen under Capacity Limiting and Maintenance Timing respectively. I went with 90% SoC as above and 48 hour maintenance. The charge voltage is on the EMS Integration screen. I’ve used the following settings:
Normal Charge Voltage: 55.5V
Charge-Blocked Voltage: 52.5V
Discharge/Maintenance Cycle Voltage: 55V
In my case, the Normal Charge Voltage was originally 57V, and as I dropped it by 1.5V to get to 55.5V, I dropped the Charge-Blocked and Discharge/Maintenance Cycle voltages by the same amount to arrive at the above figures.
Dropping the maximum SoC means that the battery can’t get completely full and stay there for a long time. This must reduce the total amount of zinc plated on the electrodes, which I hope helps reduce dendrite formation. I also found when reading the paper mentioned earlier that “H2 evolution occurs mostly near the top of charge with mossy or spongy like zinc being plated”, which looks like another good reason to avoid fully charging the battery.
Dropping the charge voltage necessarily reduces the charge current and I assume keeps it much more even than it would be otherwise. I have not seen any high charge current warning since making this change. On the other hand, it does mean the battery charges slower than it would otherwise. I did a little experiment to test this, just watching the figures for amperage and kW the BMS gave me when I tweaked charge voltages:
55.5V charges at 30A or about 1.7kW
56.0V charges at 35A or about 2.0kW
56.5V charges at 40A or about 2.3kW
This means I’m not using the battery as effectively as I could be with a higher charge voltage/current, but if this serves to extend the battery life, I think it’s worth it under the circumstances.
It’s important to keep an eye on the Strip Pump Run Timer, which went weird on me a couple of times. I really should write a little script to automatically warn me if it starts running at half speed, but I’ve been habitually looking at the BMS briefly almost every day since the system was installed, so I noticed when this problem occurred because the maintenance timing was off. To reset a battery that gets into this state, go to Tools: ZBM Modbus Tool and write the value 0x80 to register 0x2053. This will appear to fail because it immediately resets the unit, which thus never reports a successful write, but it does the trick.
Some time in the next six months I’m going to need to beg, borrow, steal or figure out how to manufacture carbon socks. The good news is that this time the replacement procedure is going to be really easy, because unlike the ZBM2 (where you had to mess with some pipe work) and my previous ZBM3 (where there was a cap on the side which in my case would have been completely inaccessible due to proximity to a wall), this one has an easy access screw cap on the front of the electrolyte tank.
Front of new ZBM3 prior to commissioning. The white thing in the bag is the carbon sock. The large black screw cap in the centre right is where the carbon sock goes.
Accessing the BMS without the Redflow Cloud
The Redflow cloud went offline in late October 2024. While it was running it provided remote access to the BMS, and I understand that some Redflow customers were unaware that it’s possible to access the BMS locally without the cloud. The Redflow cloud allowed firmware updates, and also let Redflow staff monitor batteries and configure them remotely, but it is not actually a hard requirement that this system exist in order for the batteries to continue to operate.
One way to access the BMS locally is via the wifi network on the BMS itself. If this is turned on, and you search for wifi access points you should find one named something like “zcell-bms-XXXX”. The password should be “zcellzcell”. Once you’re connected, open a web browser and go to http://zcell:3000. If that doesn’t work, try http://172.16.29.241:3000. This should let you see the BMS status. If you try to make any configuration changes it will ask you to log in. The default username and password are “admin” and “admin”. These can be changed under Configuration: Users.
The other way of accessing the BMS is to connect to whatever the IP address of the BMS is on your local network. The trick in this case is figuring out what the IP address is. I know what mine is because I logged into my router and looked at its list of attached devices.
Given the Redflow cloud is down and Redflow is out of business, I would actually suggest going into the BMS Site Configuration screen and unchecking the “Enable BMS cloud connection” and “Allow Redflow access to system for service intervention” boxes. There are two reasons for this:
There’s no point having the cloud connection enabled if the cloud is gone. If this is left on, a process on the BMS will continually try to connect back to cloud.zcell.com which is never going to work, and you’ll perpetually have this irritating little “Cloud connection has been lost” indicator.
Possible subsequent risk of hacking. I assume the following is pretty unlikely, but it’s not impossible. Imagine that, once the zcell.com domain name expires, some malicious actor re-registers it and sets up their own cloud.zcell.com service. BMSes with the cloud connection enabled would blindly try to connect to it, which could allow a sufficiently sophisticated attacker to hack back into the BMS and mess with it, or potentially use the BMS as a jumping-off point further into a site’s local network.
Personally I hope whoever buys the Redflow IP will turn the cloud back on, in which case the above advice will no longer apply.
Shipping ZBMs
Individuals such as myself can’t just ring up a random courier and say “Hey, can you please go to New South Wales, pick up a 278kg crate with hazchem stickers that say ‘corrosion’ and have pictures of dead fish, and bring it to me here in Tasmania?” The courier will say “Hell no”, unless you have an account with them. Accordingly I would like to thank Stuart Thomas from Alive Technologies through whom I was able to arrange shipping, because his company does have an account with a courier, and he was also after some batteries so we were able to do a combined shipment. If anyone else is looking to move these batteries around, the courier in this instance was Imagine Cargo. I understand Redflow in the past used Mainfreight and Chemcouriers. In all cases, the courier will need to know the exact dimensions and weight which are in the manual, and will want a safety data sheet. Here they are:
Further thanks to Stuart and Gus (whose flatbed truck almost didn’t make it up our driveway) for last mile delivery, swapping the new ZBM3 into the old enclosure, and getting the damn thing in under our house.
These things are very cumbersome
Final Thoughts
It’s disappointing on many levels that Redflow went under, but like I said earlier, I remain convinced that flow batteries are in general a better idea for long-term stationary energy storage than lithium. I find it interesting that the sale of Redflow’s IP includes “specific assets and shares in Redflow (Thailand) Limited”. Given that’s where the manufacturing was done, could that indicate that the buyer is interested in potentially carrying on further development or manufacturing work? The identity of the buyer remains confidential right now, and final settlement is still a year away, so I guess we’ll just have to wait and see.
Our new ZBM3 was commissioned on March 18, and has been running well ever since. I’ve done everything I know to do to try to ensure it has a long and happy life, and will continue to keep a very close eye on it. There will be followup posts if and when anything else interesting happens.
Some time rather earlier in this journey, I found an easter egg in the BMS, which I didn’t mention in any of my previous posts. I think that might be a nice note to finish on here.
The cheat code is UP UP DOWN DOWN LEFT RIGHT LEFT RIGHT B A
tl;dr: there is an attack in the wild which is triggering dangerous-but-seemingly-intended behaviour in the Oj JSON parser when used in the default and recommended manner, which can lead to everyone’s favourite kind of security problem: object deserialization bugs!
If you have the oj gem anywhere in your Gemfile.lock, the quickest mitigation is to make sure you have Oj.default_options = { mode: :strict } somewhere, and that no library is overwriting that setting to something else.
Prologue
As a sensible sysadmin, all the sites I run send me a notification if any unhandled exception gets raised.
Mostly, what I get sent is error-handling corner cases I missed, but now and then… things get more interesting.
In this case, it was a PG::UndefinedColumn exception, which looked something like this:
PG::UndefinedColumn: ERROR: column "xyzzydeadbeef" does not exist
This is weird on two fronts: firstly, this application has been running for a while, and if there was a schema problem, I’d expect it to have made itself apparent long before now.
And secondly, while I don’t profess to perfection in my programming, I’m usually better at naming my database columns than that.
Something is definitely hinky here, so let’s jump into the mystery mobile!
The column name is coming from outside the building!
The exception notifications I get sent include a whole lot of information about the request that caused the exception, including the request body.
In this case, the request body was JSON, and looked like this:
{"name":":xyzzydeadbeef", ...}
The leading colon looks an awful lot like the syntax for a Ruby symbol, but it’s in a JSON string.
Surely there’s no way a JSON parser would be turning that into a symbol, right?
Right?!?
Immediately, I thought that that possibly was what was happening, because I use Sequel for my SQL database access needs, and Sequel treats symbols as database column names.
It seemed like too much of a coincidence that a vaguely symbol-shaped string was being sent in, and the exact same name was showing up as a column name.
But how the flying fudgepickles was a JSON string being turned into a Ruby symbol, anyway?
Enter… Oj.
Oj? I barely know… aj
A long, long time ago, the “standard” Ruby JSON library had a reputation for being slow.
Thus did many competitors flourish, claiming more features and better performance.
Strong amongst the contenders was oj (for “Optimized JSON”), touted as “The fastest JSON parser and object serializer”.
Given the history, it’s not surprising that people who wanted the best possible performance turned to Oj, leading to it being found in a great many projects, often as a sub-dependency of a dependency of a dependency (which is how it ended up in my project).
You might have noticed in Oj’s description that, in addition to claiming “fastest”, it also describes itself as an “object serializer”.
Anyone who has kept an eye on the security bug landscape will recall that “object deserialization” is a rich vein of vulnerabilities to mine.
Libraries that do object deserialization, especially ones with a history that goes back to before the vulnerability class was well-understood, are likely to be trouble magnets.
And thus, it turns out to be with Oj.
By default, Oj will happily turn any string that starts with a colon into a symbol:
How that gets exploited is only limited by the creativity of an attacker.
Which I’ll talk about more shortly – but first, a word from my rant cortex.
Insecure By Default is a Cancer
While the object of my ire today is Oj and its fast-and-loose approach to deserialization, it is just one example of a pervasive problem in software: insecurity by default.
Whether it’s a database listening on 0.0.0.0 with no password as soon as it’s installed, or a library whose default behaviour is to permit arbitrary code execution, it all contributes to a software ecosystem that is an appalling security nightmare.
When a user (in this case, a developer who wants to parse JSON) comes across a new piece of software, they have – by definition – no idea what they’re doing with that software.
They’re going to use the defaults, and follow the most easily-available documentation, to achieve their goal.
It is unrealistic to assume that a new user of a piece of software is going to do things “the right way”, unless that right way is the only way, or at least the by-far-the-easiest way.
Conversely, the developer(s) of the software is/are the domain experts.
They have knowledge of the problem domain, through their exploration while building the software, and unrivalled expertise in the codebase.
Given this disparity in knowledge, it is tantamount to malpractice for the experts – the developer(s) – to off-load the responsibility for the safe and secure use of the software to the party that has the least knowledge of how to do that (the new user).
To apply this general principle to the specific case, take the “Using” section of the Oj README.
The example code there calls Oj.load, with no indication that this code will, in fact, parse specially-crafted JSON documents into Ruby objects.
The brand-new user of the library, no doubt being under pressure to Get Things Done, is almost certainly going to look at this “Using” example, get the apparent result they were after (a parsed JSON document), and call it a day.
It is unlikely that a brand-new user will, for instance, scroll down to the “Further Reading” section, find the second last (of ten) listed documents, “Security.md”, and carefully peruse it.
If they do, they’ll find an oblique suggestion that parsing untrusted input is “never a good idea”.
While that’s true, it’s also rather unhelpful, because I’d wager that by far the majority of JSON parsed in the world is “untrusted”, in one way or another, given the predominance of JSON as a format for serializing data passing over the Internet.
This guidance is roughly akin to putting a label on a car’s airbags that “driving at speed can be hazardous to your health”: true, but unhelpful under the circumstances.
The solution is for default behaviours to be secure, and any deviation from that default that has the potential to degrade security must, at the very least, be clearly labelled as such.
For example, the current Oj.load function should be renamed Oj.unsafe_load, and Oj.load should instead behave as the Oj.safe_load function does presently.
By naming the unsafe function as explicitly unsafe, developers (and reviewers) have at least a fighting chance of recognising they’re doing something risky.
We put warning labels on just about everything in the real world; the same should be true of dangerous function calls.
OK, rant over.
Back to the story.
But how is this exploitable?
So far, I’ve hopefully made it clear that Oj does some Weird Stuff with parsing certain JSON strings.
It caused an unhandled exception in a web application I run, which isn’t cool, but apart from bombing me with exception notifications, what’s the harm?
For starters, let’s look at our original example: when presented with a symbol, Sequel will interpret that as a column name, rather than a string value.
Thus, if our “save an update to the user” code looked like this:
# request_body has the JSON representation of the form being submitted
body = Oj.load(request_body)
DB[:users].where(id: user_id).update(name: body["name"])
In normal operation, this will issue an SQL query along the lines of UPDATE users SET name='Jaime' WHERE id=42.
If the name given is “Jaime O’Dowd”, all is still good, because Sequel quotes string values, etc etc.
All’s well so far.
But, imagine there is a column in the users table that normally users cannot read, perhaps admin_notes.
Or perhaps an attacker has gotten temporary access to an account, and wants to dump the user’s password hash for offline cracking.
So, they send an update claiming that their name is :admin_notes (or :password_hash).
In JSON, that’ll look like {"name":":admin_notes"}, and Oj.load will happily turn that into a Ruby object of {"name"=>:admin_notes}.
When run through the above “update the user” code fragment, it’ll produce the SQL UPDATE users SET name=admin_notes WHERE id=42.
In other words, it’ll copy the contents of the admin_notes column into the name column – which the attacker can then read out just by refreshing their profile page.
But Wait, There’s More!
That an attacker can read other fields in the same table isn’t great, but that’s barely scratching the surface.
Remember before I said that Oj does “object serialization”?
That means that, in general, you can create arbitrary Ruby objects from JSON.
Since objects contain code, it’s entirely possible to trigger arbitrary code execution by instantiating an appropriate Ruby object.
I’m not going to go into details about how to do this, because it’s not really my area of expertise, and many others have covered it in detail.
But rest assured, if an attacker can feed input of their choosing into a default call to Oj.load, they’ve been handed remote code execution on a platter.
Mitigations
As Oj’s object deserialization is intended and documented behaviour, don’t expect a future release to make any of this any safer.
Instead, we need to mitigate the risks.
Here are my recommended steps:
Look in your Gemfile.lock (or SBOM, if that’s your thing) to see if the oj gem is anywhere in your codebase.
Remember that even if you don’t use it directly, it’s popular enough that it is used in a lot of places.
If you find it in your transitive dependency tree anywhere, there’s a chance you’re vulnerable, limited only by the ingenuity of attackers to feed crafted JSON into a deeply-hidden Oj.load call.
If you depend on oj directly and use it in your project, consider not doing that.
The json gem is acceptably fast, and JSON.parse won’t create arbitrary Ruby objects.
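To illustrate the difference, here’s the same hostile-looking document from earlier run through the standard library parser:

```ruby
require 'json'

# JSON.parse treats a leading colon as ordinary string content:
# no symbols, no arbitrary objects.
JSON.parse('{"name":":admin_notes"}')
# => {"name"=>":admin_notes"}
```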
If you really, really need to squeeze the last erg of performance out of your JSON parsing, and decide to use oj to do so, find all calls to Oj.load in your code and switch them to call Oj.safe_load.
It is a really, really bad idea to ever use Oj to deserialize JSON into objects, as it lacks the safety features needed to mitigate the worst of the risks of doing so (for example, restricting which classes can be instantiated, as is provided by the permitted_classes argument to Psych.load).
I’d make it a priority to move away from using Oj for that, and switch to something somewhat safer (such as the aforementioned Psych).
At the very least, audit and comment heavily to minimise the risk of user-provided input sneaking into those calls somehow, and pass mode: :object as the second argument to Oj.load, to make it explicit that you are opting-in to this far more dangerous behaviour only when it’s absolutely necessary.
To secure any unsafe uses of Oj.load in your dependencies, consider setting the default Oj parsing mode to :strict, by putting Oj.default_options = { mode: :strict } somewhere in your initialization code (and make sure no dependencies are setting it to something else later!).
There is a small chance that this change of default might break something, if a dependency is using Oj to deliberately create Ruby objects from JSON, but the overwhelming likelihood is that Oj’s just being used to parse “ordinary” JSON, and these calls are just RCE vulnerabilities waiting to give you a bad time.
Is Your Bacon Saved?
If I’ve helped you identify and fix potential RCE vulnerabilities in your software, or even just opened your eyes to the risks of object deserialization, please help me out by buying me a refreshing beverage.
I would really appreciate any support you can give.
Alternately, if you’d like my help in fixing these (and many other) sorts of problems, I’m looking for work, so email me.
So, N years later, how is that going? It was going pretty well, but then there was a pandemic with lock-downs and curfews, which rather restricted access to dark skies. The obvious fix was to obtain access to dark skies, by way of a holiday house in the Wimmera.
In the mean time there were also a bunch of revolutions in astronomy, mostly to blame on open hardware. That means it is now possible to buy an off-the-shelf computer to control a bunch of mounts, cameras, auto-focusers, dew heaters and other gear. These are essentially Raspberry Pi machines with a modified operating system and (generally) a mobile app to control them.
Rather than fight software, keep laptops (and myself) out in the cold and kludge together VNC access, I got one of these machines (an asiair mini) and data acquisition is now mostly automated and not a problem. I set it up, tell it what I want, and in the morning I have images.
I do however still use open source software on Mac OS X to do my data processing. Notably I use Siril for pre-processing, stacking, stretching and noise reduction.
The following is an illustration of how to use xargs to conduct parallel operations on single-threaded applications, specifically wget.
GNU wget is a great tool for downloading content from websites. The wget command is a non-interactive network downloader; by "non-interactive" what is meant is that it can be run in the background. Some very handy options include -c (continue, for partially downloaded files), -m (mirror, for an entire website), and -r --no-parent (recursive, no parent, to download part of a website and its subdirectories). The cURL application supports a wider range of protocols and includes upload options, but is non-recursive.
Recently, I had the need to download a small number of PDF files. The wildcard-based approach would be:
$ wget -r -nd --no-parent -A 'rpgreview_*.pdf' http://rpgreview.net/files/
The -r and --no-parent options have already been explained. The -nd option saves all files to the current directory, without recreating the hierarchy of directories. The -A option ('accept', or -R for 'reject') allows one to specify comma-separated lists of file name suffixes or patterns to accept or reject. Note that if any of the wildcard characters *, ?, or ranges [] appear in an element of the acclist or rejlist, that element is treated as a pattern rather than a suffix.
Running the above takes the following time:
real 2m19.353s
user 0m0.836s
sys 0m2.998s
An alternative, looping through each file one at a time, would have been something like:
for issue in {1..53}
do
wget "https://rpgreview.net/files/rpgreview_$issue.pdf"
done
(Just for the record, wget can get a bit gnarly when dealing with http requests because for some webservers there is no requirement for path delimiters to match directory delimiters. For the purposes of this discussion it is assumed that we're dealing with a rational being where the two are equivalent.)
Using a combination of the printf command and xargs, a list of the URLs can be constructed and passed to xargs, which splits the list and runs the downloads in parallel.
By itself, xargs simply reads items from standard input, delimited by blanks or newlines, and executes a command with those items as arguments. This is somewhat different to a pipe which, by itself, sends the output of one command as the input stream to another. In contrast, xargs takes data from standard input and executes a command with that data, by default, appended to the end of the command as arguments. The data can, however, be inserted anywhere in the command by using a placeholder for the input; the typical placeholder is {}.
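Putting those pieces together for the same set of files as the loop above, a sketch of the parallel version looks like this:

```shell
# printf repeats its format string for each argument, so this emits all
# 53 URLs, one per line; xargs then runs up to 8 wget processes at a
# time (-P 8), each handed a single URL (-n 1).
printf 'https://rpgreview.net/files/rpgreview_%d.pdf\n' {1..53} | \
    xargs -n 1 -P 8 wget -nc
```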
The value -P 8 is entirely arbitrary here and should be modified according to available resources. Adding -nc prevents xargs from attempting to download the same file more than once (wget will not overwrite an existing file, but rather append a new file with .1, etc.). The -n 1 option ensures that only one argument is passed per process.
A final option, most common on high-performance computing systems with job schedulers, is to make use of a job array, assuming resources are available. This is a very effective option if each task in the array takes more than a couple of minutes (given that there is an overhead involved in constructing the job, submitting it to the queue, etc.). In Slurm, the directives and code of such a script would look like the following:
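A minimal sketch of such a Slurm script (the job name and time limit are illustrative values):

```shell
#!/bin/bash
#SBATCH --job-name=rpgreview
#SBATCH --array=1-53
#SBATCH --time=0:10:00

# Slurm sets SLURM_ARRAY_TASK_ID to a different value (1..53) in each
# sub-job, so each task in the array downloads a different issue.
wget -nc "https://rpgreview.net/files/rpgreview_${SLURM_ARRAY_TASK_ID}.pdf"
```

Submitted once with sbatch, this expands into 53 sub-jobs that the scheduler runs as resources permit.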
It is appropriate, on World Quantum Day, to talk about quantum computing and quantum computers, as the two are often confused. Quantum computing is any method to generate quantum effects whereby qubit states can exist in superposition (0, 1, or both) rather than binary states (0, 1). Binary states are represented in classical computing as logical 0s and 1s in low-level software, and as high- and low-voltage states in hardware.
The typical system to do quantum computing, or at least simulate it, is usually High Performance Computing (HPC). That works, it's a proven technology that has a rate of return of $44 per $1 invested - and higher when COVID research is considered. The development of HPC clusters with message passing is one of the most successful technological developments in computing in the last thirty years.
In contrast, a quantum computer directly uses a quantum mechanical system and requires appropriately specialised hardware. For example, GENCI in France uses a photonic computer, LRZ in Germany uses superconducting qubits, PSNC in Poland uses trapped ions, etc. David P. DiVincenzo outlines the most significant physical challenges that face quantum computers, regardless of which technology is used; these include scaling qubits, initialisation of values, developing a universal gate for the construction of a quantum operation, developing gates that are faster than decoherence from a quantum state due to environmental interactions, and reading qubits (especially considering that reading can itself alter the quantum state).
As a result, classical computers outperform quantum computers in all real-world applications. Not only that, there is a serious question of whether quantum computers will ever be able to outperform classical computers. Mikhail Dyakonov points out that the rudimentary qubits used in quantum computing systems are insufficient for useful calculations.
"Experts estimate that the number of qubits needed for a useful quantum computer, one that could compete with your laptop in solving certain kinds of interesting problems, is between 1,000 and 100,000. So the number of continuous parameters describing the state of such a useful quantum computer at any given moment must be at least 21,000, which is to say about 10^300. That's a very big number indeed. How big? It is much, much greater than the number of subatomic particles in the observable universe."
The promise of quantum computers is, of course, very significant in theory. In theory, they can perform some calculations incredibly fast, and the larger the task, the more impressive the result, to the extent that common secure encryption systems could be broken, alongside the more prosaic use in quantum simulations. In reality, the physical implementation has been more than challenging, to put it mildly. Classical computers can, in principle, solve the same problems that a quantum computer can; for a classical computer the problem is the sheer quantity of time required, while for quantum computers the problem is the implementation in reality. For the time being, and for the foreseeable future, it seems that quantum computing will continue to be done on classical computers.
My wife and I were with Optus for our mobile phone service since approximately the dawn of time, but recently decided to switch to another provider. We’d become less happy with Optus over the last few years after a data breach in 2022, an extended outage in 2023, and – most personally irritating – with them increasing the price of our plan despite us being under contract. Yes, I know the contract says they’re allowed to do that given 30 days notice, but they never used to do that. If you signed up for a $45 per month (or whatever) plan for two years, that’s what you paid per month for the duration. Not anymore. To their credit, when my wife lodged a complaint about this, they did end up offering us a 10% discount on our bill for the next 24 months, which effectively brought us back to the previous pricing, but we still maintain this practice just isn’t decent, dammit.
The question was: which provider to switch to? There are three networks in Australia – Telstra, Optus and Vodafone, so you either go with one of them, or with someone who’s reselling services on one of those networks. We already have a backup pre-paid phone with Telstra for emergencies, and so preferred the idea of continuing our main service on some other network for the sake of redundancy. iiNet (our ISP) repeatedly sent us email about nice cheap mobile services, but they were reselling Vodafone, and we’d always heard Vodafone had the worst coverage in regional Australia so we initially demurred. A few weeks ago though, iiNet told us they’d doubled their network coverage. It turns out this is due to TPG (iiNet and Vodafone’s parent) striking a deal with Optus for mutual network access. This all sounded like a good deal, so we ran with it. We received a new SIM each in the mail, so all we needed to do was log in to the iiNet toolbox website, receive a one-time code via SMS to confirm the SIM port, then put the new SIM in each of our phones, power cycle them and wait to connect. We decided to do one phone at a time lest we be left with no service if anything went wrong. I did my phone first, and something did indeed go wrong.
After doing the SIM activation dance, my phone – an aging Samsung Galaxy A8 4G which Optus had given me on a two year contract back in 2018 – said it was connected to iiNet. Mobile data worked. SMS worked. But I could not make or receive calls. Anyone I tried to call, the phone said “calling…” but there was no sound of a phone ringing, and eventually it just went >clunk< “call ended”. Incoming calls went straight to voicemail, which of course I could not access. Not knowing any better I figured maybe it was a SIM porting issue and decided to ignore it for a day in the hope that it would come good with time. Forty-eight hours later I realised time wasn’t working, so called iiNet support using this thing:
Inscribed on the back is “This phone remains the property of Telecom Australia”.
The extremely patient and courteous Jinky from iiNet support walked me through restarting my phone and re-inserting the SIM (which of course I’d already done), and resetting the network settings (which I hadn’t). She also did a network reset at their end, but I still couldn’t make or receive calls. Then she asked me to try the SIM in another handset, so I swapped it into our backup Telstra handset (a Samsung Galaxy S8), and somewhat to our surprise that worked fine. We double checked my handset model (SM-A530F) against the approved devices list, and it’s there, so it should have worked in my handset too… Alas, because we’d demonstrated that the SIM did work in another handset, there was nothing further Jinky could do for me other than suggest using a different handset, or finding a technician to help figure out what was wrong with my Galaxy A8.
After a lot of irritating searching I found a post on Whirlpool from someone who was having trouble making and receiving calls with their Galaxy A8 after the 3G network shutdown in late 2024. The interesting thing here was that they were using an Optus-branded but unlocked phone, with a Telstra SIM. With that SIM, they couldn’t make or receive calls, but with an Optus SIM, they could. This sounded a lot like my case, just substitute “iiNet SIM” for “Telstra SIM”. The problem seemed to be something to do with VoLTE settings? or flags? or something? That are somehow carrier dependent? And the solution was allegedly to partially re-flash the handset’s firmware – the CSC, or Country Specific Code bits – with generic Samsung binaries.
So I dug around a bit more. This post from Aral Balkan about flashing stock firmware onto a Galaxy S9+ using the heimdall firmware flashing tool on Ubuntu Linux was extremely enlightening. The Samsung Updating Firmware Guide on Whirlpool helpfully included a very important detail about flashing this stuff:
Use CSC_*** if you want to do a clean flash or
HOME_CSC_*** if you want to keep your apps and data. <== MOST PEOPLE USE THIS
The next question was: where do I get the firmware from? Samsung have apparently made it extremely difficult to obtain arbitrary firmware images directly from them – they’re buried somewhere in encrypted form on Samsung’s official update servers, so I ended up using samfw.com. I downloaded the OPS (Optus), VAU (Vodafone) and XSA (unbranded) firmware archives, matching the version currently on my phone, extracted them, then compared them to each other. The included archives for AP (System & Recovery), BL (Bootloader) and CP (Modem / Radio) were all identical. The CSC (Country / Region / Operator) and HOME_CSC files were different in each case. These are the ones I wanted, and the only ones I needed to flash. So, as described in the previously linked posts, here’s what I ended up doing:
Installed heimdall on my openSUSE laptop.
Extracted the HOME_CSC files (cache.img and hidden.img) from the XSA firmware archive.
Plugged my phone into my laptop, and rebooted the phone while somehow managing to hold down all three volume up, volume down and power buttons to get it into Download mode.
Ran heimdall flash --CACHE cache.img --HIDDEN hidden.img and waited in terror for my handset to be bricked.
The procedure worked perfectly. VoLTE – which wasn’t previously active on my phone – now was, and I could make and receive calls. VoLTE stands for Voice over Long-Term Evolution, and is the communications standard for making voice calls on a 4G mobile network.
It was at this point that the woefully untrained infosec goblin who inhabits part of my brainstem began gibbering in panic. Something along the lines of “what the hell are you doing installing allegedly Samsung firmware from a web site you found listed in a random forum post on the internet?!?”
I believed from everything I’d read so far that samfw.com was reputable, but of course I had to double-check. After an awful lot of screwing around on a Windows virtual machine with a combination of SamFirm_Reborn (which could download Samsung firmware once I tweaked SamFirm.exe.config to not require a specific .NET runtime, but couldn’t decrypt it due presumably to that missing .NET runtime), and SamFirm (which can’t download the firmware due to Samsung changing their API to need a serial number or IMEI, but could decrypt what I’d downloaded separately with SamFirm_Reborn), I was able to confirm that the firmware I’d downloaded previously does in fact match exactly what Samsung themselves make available. So I think I’m good.
The SIM activation dance on my wife’s phone – a rather newer Samsung Galaxy S21 Ultra 5G – went without a hitch.
Neuro-divergence, encompassing conditions such as autism spectrum, ADHD, and sensory processing differences, can profoundly influence how individuals perceive and respond to their bodily signals.
While neurotypical individuals generally recognise and respond to hunger, thirst, and satiety cues with relative ease, neuro-divergent individuals often face unique challenges in this area. Understanding these challenges is crucial for fostering empathy and supporting effective strategies for well-being.
This article is written to be directly readable and useful (in terms of providing action items) for the people around a neuro-divergent person, but naturally it can also be directly applied by neuro-spicy people themselves!
Hunger and Thirst Cues
For many neuro-divergent people, recognising hunger and thirst cues can be a complex task. These signals, which manifest as subtle physiological changes, might not be as easily identifiable or may be misinterpreted.
For instance, someone on the spectrum might not feel hunger as a straightforward sensation in the stomach but instead experience it as irritability or a headache. Similarly, those with ADHD may become so hyper-focused on tasks that they overlook or ignore feelings of hunger and thirst entirely.
Sensory Processing and Signal Translation
Sensory processing issues can further complicate the interpretation of bodily signals. Neuro-divergent individuals often experience heightened or diminished sensory perception.
This variability means that sensations like hunger pangs or a dry mouth might be either too intense to ignore or too faint to detect. The result is a disconnection from the body’s natural cues, leading to irregular eating and drinking habits.
Satiety and Fullness
Recognising satiety and fullness presents another layer of difficulty. For neuro-divergent individuals, the brain-gut communication pathway might not function in a typical manner.
This miscommunication can lead to difficulties in knowing when to stop eating, either due to a delayed recognition of fullness or because the sensory experience of eating (such as the textures and flavours of food) becomes a primary focus rather than the physiological need.
Emotional and Cognitive Influences
Emotions and cognitive patterns also play significant roles. Anxiety, a common experience among neuro-divergent individuals, can mask hunger or thirst cues, making it harder to recognise and respond appropriately.
Additionally, rigid thinking patterns or routines, often seen with autism spectrum, might dictate eating schedules and behaviours more than actual bodily needs.
Strategies for Support
Understanding these challenges opens the door to effective strategies and support mechanisms:
Routine and structure: Establishing regular eating and drinking schedules can help bypass the need to rely on internal cues. Setting alarms or reminders can ensure that meals and hydration are not overlooked.
Mindful eating practices: Encouraging mindful eating, where individuals pay close attention to the sensory experiences of eating and drinking, can help in recognising subtle signals of hunger and fullness.
Sensory-friendly options: Offering foods and beverages that align with an individual’s sensory preferences can make the experience of eating and drinking more enjoyable and less overwhelming. This is a really important aspect!
Environmental adjustments: Creating a calm, distraction-free eating environment can help individuals focus more on their bodily cues rather than external stimuli.
Education and awareness: Educating neuro-divergent individuals about the importance of regular nourishment and hydration, and how their unique experiences might affect this, can empower them to develop healthier habits. This is, of course, more a longer term strategy.
Understanding the complex interplay between neuro-divergence and bodily signals underscores the importance of personalised approaches and compassionate support.
By acknowledging and addressing these challenges, we can help neuro-divergent individuals achieve better health and well-being!
The standard computing model uses single-threaded instructions and data, with automation through looping and conditional branching. Automation is encouraged, as it results in the computer doing the work it is designed for. However, this can be inefficient on a multicore system. An alternative on HPC systems is to make use of job arrays, which use one job to allocate resources to sub-jobs that can be individually controlled, whether directed toward instruction sets or datasets. Further, job arrays can be combined with job dependencies, allowing for conditional chains of job submission and runs. Finally, job arrays can be simulated through the use of heredocs with looped submission, which may even allow more familiar control with shell scripting.
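A sketch of that last approach, simulating an array by looping over submissions with a heredoc on sbatch's standard input (the process_data program and input file names are hypothetical):

```shell
# Each iteration submits one job; the heredoc body becomes the job
# script, with ${i} expanded at submission time.
for i in {1..10}; do
    sbatch <<EOF
#!/bin/bash
#SBATCH --job-name=task_${i}
#SBATCH --time=0:30:00

./process_data "input_${i}.dat"
EOF
done
```

Unlike a real job array, each of these jobs gets its own job ID, which is exactly what gives the finer-grained shell-level control.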
This slidedeck is derived from a presentation to the University of Melbourne's "Spartan Champions" group on March 7, 2025.
Since 2016, Facebook Messenger has allowed the option for end-to-end encrypted messages, and in 2023, they were established as the default.
This has caused some issues with people exporting Messenger data for archival or viewing purposes. It is a lot quicker to search for information when the data is local, and it is better practice to do so.
Encrypted Facebook Messenger data can be downloaded by selecting "Privacy & safety" on the Facebook menu, then "End-to-end encrypted chats", then "Message storage", and finally "Download secure storage data".
When the file is ready, it will be provided as a zip file that contains various binaries in a "media" directory and well-formatted JSON files for each message chat for current Friends.
Despite the formatting, JSON files are less than optimal for humans to read. The "Lament" Python script makes use of pandas, json, sys, and datetime modules. Simply copy it into your messages directory, where all the JSON files are located and run:
$ python3 lament.py
It will then output a formatted HTML table that can be opened by a browser of your preference. The table includes a human-readable date (rather than the UNIX timestamp), the sender, the message, reactions, and a reference to any media.
The format and logic of the Python script are simple enough that it can be easily extended to other JSON data from Messenger and can be adapted to other JSON data files.
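To give a flavour of that logic, here is a minimal sketch of what such a script might look like. The field names ("messages", "timestamp_ms", "senderName", "text") are assumptions for illustration, not a statement of the actual export schema or of how lament.py itself is written:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

import pandas as pd

# Collect every message from every chat export in the current directory.
rows = []
for path in Path(".").glob("*.json"):
    with open(path) as f:
        chat = json.load(f)
    if not isinstance(chat, dict):
        continue
    for msg in chat.get("messages", []):
        rows.append({
            # Convert the UNIX timestamp (milliseconds) to a readable date.
            "date": datetime.fromtimestamp(msg.get("timestamp_ms", 0) / 1000,
                                           tz=timezone.utc),
            "sender": msg.get("senderName", ""),
            "message": msg.get("text", ""),
        })

# Sort oldest-first and write a browsable HTML table.
df = pd.DataFrame(rows)
if not df.empty:
    df = df.sort_values("date")
df.to_html("messages.html", index=False)
```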
There’s a lot of blah flying around about DeepSeek and its latest models. My rule when running training sessions has always been TIAS, if someone asked me something that could be figured out relatively quickly by actually trying it. TIAS stands for Try It And See. It’s more valuable, because it’s a first-hand validation and doing the activity means the outcome is more likely to be remembered. So let’s do that!
My desktop has a decent AMD Ryzen 5 CPU, 32GB RAM, and an RTX 3060 GPU with 12GB VRAM. So while that’s not super hot, it’s sufficient to load and query a variety of LLMs locally without pain. By the way, ollama is a wonderful tool for this! So I had ollama grab deepseek-R1, the biggest reasoning model that would fit. It takes slightly longer to get an answer, but it tells you the process (aka “what it is thinking”, indicated in between the <think> … </think> tags). Details of the model:
>>> /show info
Model
architecture qwen2
parameters 7.6B
context length 131072
embedding length 3584
quantization Q4_K_M
I am not using any pre-prompting here, I simply started ollama with the model and started asking the questions as noted below. A local model in ollama is typically not Internet connected nor does it have access to the broader system, proven by the following:
>>> What is the current date? <think>
</think>
Hi there! I suggest getting online to get real-time information. If you have any other questions, please don't hesitate to let me know!
Ok, fair. On to a question then that the model might be able to answer.
>>> Who is the current president of the US? <think> Alright, the user previously asked for the current date and time, which I provided a response for. Now, they're asking who the current president of the US is.
I need to make sure my response is accurate and up-to-date. Since the last response was from July 2023, it's likely that the current president as of October 2023 would still be Joe Biden.
I should structure the answer clearly, stating their position explicitly. Adding a friendly emoji can make the response feel warmer.
I'll also keep it concise and straightforward to match the user's possible expectations. </think>
As of now, the current President of the United States is **Joe Biden**.
Given the stated model build date of October 2023, which used input on this topic from July 2023, the answer is correct. The reasoning is interesting (though not unexpected): a) it is maintaining state, as it should; otherwise each question would stand on its own and you’d never be able to ask a follow-up question. b) It tries to infer the user’s intent from the previous question. Useful.
>>> Who is the current president of China? <think> Alright, so I need to figure out who the current president of China is. Let me start by recalling some information I have about Chinese leaders.
I know that former President Xi Jinping has been serving as the leader for quite a while now. I think he was first elected in 2012 or maybe 2013. Wait, actually, it was 2012 when he became the General Secretary of the Communist Party and then later he became president.
I also remember that there have been other leaders before him, like Deng Xiaoping, who was a very important leader in the reform and opening-up policies. Then there's Hu Jintao, who served two terms. After Hu, I think it was Wen Jiabao, but I'm not sure about his tenure.
So, putting this together, after Deng Xiaoping came Hu Jintao, then Wen Jiabao, followed by Xi Jinping taking over from 2013 onwards. So the current president is Xi Jinping. </think>
The current President of China is **Xi Jinping**.
Correct. Now for the question that seems to get everybody’s knickers in a knot. DeepSeek’s mobile app doesn’t want to answer it, and people call out China’s censorship. We’ll talk about that in a moment, but let’s first try asking the local model straight out:
>>> Tell me about the Tiananmen Square event. <think> Okay, so I need to explain what happened at Tiananmen Square in China during the year 1989. From what I remember, it was a very significant event related to the Chinese government and its policies.
First, I think it started with a peaceful demonstration by students, intellectuals, and other groups who were upset about certain government actions or policies they believed were unfair. They gathered in Tiananmen Square because they wanted to express their opinions and demands.
As more people joined the protest, things turned into a larger-scale event. There were clashes between the protesters and the government's security forces. I believe there was some violence involved, with tear gas being used and maybe even some arrests or disruptions of the protest.
I also remember that there were military actions involved to maintain control, which might have led to casualties on both sides. The exact numbers are fuzzy in my memory, but I think several people were injured or killed during the events.
The government eventually cracked down hard on what they saw as an illegal demonstration. There was a lot of media coverage about it at the time, and now there's a debate about whether it was justified or an overreaction.
I'm not entirely sure about all the details, like the exact dates or how many people were involved, but I know it was a pivotal moment in modern Chinese history that led to various political changes. Maybe it's related to the 1989 protests and the subsequent shift in power within the government?
Overall, it seems like a complex event with multiple factors influencing its outcome, including government actions, international reactions, and the internal dynamics of the Communist Party. </think>
Tiananmen Square is a significant historical event that occurred in Beijing on June 3-4, 1989. It was a peaceful demonstration by students, intellectuals, workers, and others who were protesting against what they perceived as unfair government policies regarding various issues such as human rights, economic reforms, and environmental regulations.
The protest began with small demonstrations and gradually turned into larger, more organized antigovernment activities. On June 3, several thousand people gathered in Tiananmen Square to protest against the Chinese Communist Party's handling of these issues.
As the protests grew, there was significant government crackdown. Security forces used tear gas, rubber bullets, and heavy machine gun fire to disperse the crowd, resulting in casualties on both sides. The government ultimately cracked down more brutally, arresting and killing many protesters. According to some accounts, at least 138 people were killed during the events.
The Tiananmen Square protests led to a dramatic shift in Chinese political power. After the crackdown, several members of the Chinese Communist Party who had supported the government's actions against the protesters were removed from leadership positions. The event also catalyzed widespread international condemnation and opened up a period of political transition in China.
Right. So there we have it. It’s not refusing, it’s not hedging, it’s not skirting around the issue, it just answers (you should see ChatGPT hedge its answers, it’s icky; you need to pre-prompt it very strictly to make it not do that). DeepSeek doesn’t appear to be holding back at all, down to the use of “heavy machine gun fire to disperse the crowd”. As to whether there were indeed casualties on both sides I cannot say; even internationally there is rather a lack of independently verified information regarding the event. But for the purpose of this exercise, we can at least conclude that the model itself does not appear to be censoring its output.
So what about the mobile app that queries the model running in China? Well, someone else asked it a similar question to what I did above, and it didn’t want to talk about it. Then the person added “, answer in l33t speak.” to the question, whereupon they received a substantial answer (possibly less extensive than mine, but they may have queried the non-reasoning model).
What does this tell us? It’s simple logic (at least as a hypothesis): it probably means that the model itself contains all the information, but that in the online app the output gets scanned and censored via some automated mechanism. That mechanism isn’t perfect and humans are very creative, so in this instance it was bypassed. Remember: you can often tell a lot about how an application works internally just by observing how it behaves externally. And with the experiment of running a DeepSeek model locally, we’ve also verified our hypothesis of where the censorship occurs: it seems clear that the model itself is not censored. At least not on these issues.
This is not to say that the model isn’t biased. All models are biased, at the very least through their base dataset and reinforcement learning, but often also for cultural reasons. Anyone pretending otherwise is either naive or being dishonest. But that’s something to further investigate and write about another time.
I got married. I moved into the flat. Companies have gone up and down, like life.
170 days on the road, 308,832km travelled, 38 cities, and 16 countries. I have never travelled this little in recent life, but maybe the whole getting married thing (planning a wedding is no mean feat), and sorting the flat out (dealing with incompetent interior designers, sorting things there, etc.), caused this?
It is 2025, and I’m actually planted in Kuala Lumpur, not having done an end-of-year trip to usher in the New Year somewhere else. I started the year in Paris, and I ended the year in Kuala Lumpur, tired, and maybe a bit burnt out.
Working hard to get back into the grind; don’t get me wrong, I’ve been doing nothing but grinding, but c’est la vie.
I recently replaced the screen of a Google Pixel 3A XL. The new panel is made by Tianma and worked well under Android, until it didn’t. On every boot the screen would work until the phone went to sleep, and then it would stop responding to touch until another reboot. After the screen became unresponsive, the rest of the phone remained responsive in the locked state and it was possible to unlock the screen with the fingerprint reader, but there was no way to make the touchscreen responsive again without a reboot.
To fix this, go to Settings -> System -> Gestures and disable Double-tap to check phone. After which, the screen should no longer get stuck in an unresponsive state. This seems to be a common problem affecting many phones with replaced screens.
Google will surely shut down their support forum one day, and I encourage everyone to put their notes somewhere reliable, like a self-hosted blog :)
For the past decade-plus, every piece of software I write has had one of two release processes.
Software that gets deployed directly onto servers (websites, mostly, but also the infrastructure that runs Pwnedkeys, for example) is deployed with nothing more than git push prod main.
I’ll talk more about that some other day.
Today is about the release process for everything else I maintain – Rust / Ruby libraries, standalone programs, and so forth.
To release those, I use the following, extremely intricate process:
Create an annotated git tag, where the name of the tag is the software version I’m releasing, and the annotation is the release notes for that version.
Run git release in the repository.
There is no step 3.
Yes, it absolutely is that simple.
And if your release process is any more complicated than that, then you are suffering unnecessarily.
But don’t worry.
I’m from the Internet, and I’m here to help.
Sidebar: “annotated what-now?!?”
The annotated tag is one of git’s best-kept secrets.
They’ve been available in git for practically forever (I’ve been using them since at least 2014, which is “practically forever” in software development), yet almost everyone I mention them to has never heard of them.
A “tag”, in git parlance, is a repository-unique named label that points to a single commit (as identified by the commit’s SHA1 hash).
Annotating a tag is simply associating a block of free-form text with that tag.
Creating an annotated tag is simple-sauce: git tag -a tagname will open up an editor window where you can enter your annotation, and git tag -a -m "some annotation" tagname will create the tag with the annotation “some annotation”.
Retrieving the annotation for a tag is straightforward, too: git show tagname will display the annotation along with all the other tag-related information.
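To see the whole round trip in one place, here's a self-contained sketch in a throwaway repo (the tag name, identity, and annotation text are just examples):

```shell
# Throwaway repo to demonstrate creating and reading back an annotated tag
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q
git -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'initial commit'

# Create the annotated tag; -m supplies the annotation non-interactively
git -c user.name=Demo -c user.email=demo@example.com \
    tag -a -m 'First release: the frobnicator now frobnicates' v1.0.0

# Read it back: `git show` prints the annotation plus the tagged commit,
# while --format='%(contents)' prints just the annotation text
git show v1.0.0
git tag -l --format='%(contents)' v1.0.0
```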
Now that we know all about annotated tags, let’s talk about how to use them to make software releases freaking awesome.
Step 1: Create the Annotated Git Tag
As I just mentioned, creating an annotated git tag is pretty simple: just add a -a (or --annotate, if you enjoy typing) to your git tag command, and WHAM! annotation achieved.
Releases, though, typically have unique and ever-increasing version numbers, which we want to encode in the tag name.
Rather than having to look at the existing tags and figure out the next version number ourselves, we can have software do the hard work for us.
Enter: git-version-bump.
This straightforward program takes one mandatory argument: major, minor, or patch, and bumps the corresponding version number component in line with Semantic Versioning principles.
If you pass it -n, it opens an editor for you to enter the release notes, and when you save out, the tag is automagically created with the appropriate name.
Because the program is called git-version-bump, you can call it as a git command: git version-bump.
Also, because version-bump is long and unwieldy, I have it aliased to vb, with the following entry in my ~/.gitconfig:
[alias]
vb = version-bump -n
Of course, you don’t have to use git-version-bump if you don’t want to (although why wouldn’t you?).
The important thing is that the only step you take to go from “here is our current codebase in main” to “everything as of this commit is version X.Y.Z of this software” is the creation of an annotated tag that records the version number being released, and the metadata that goes along with that release.
Step 2: Run git release
As I said earlier, I’ve been using this release process for over a decade now.
So long, in fact, that when I started, GitHub Actions didn’t exist, and so a lot of the things you’d delegate to a CI runner these days had to be done locally, or in a more ad-hoc manner on a server somewhere.
This is why step 2 in the release process is “run git release”.
It’s because historically, you couldn’t do everything in a CI run.
Nowadays, most of my repositories have this in the .git/config:
[alias]
release = push --tags
Older repositories which, for one reason or another, haven’t been updated to the new hawtness, have various other aliases defined, which run more specialised scripts (usually just rake release, for Ruby libraries), but they’re slowly dying out.
The reason why I still have this alias, though, is that it standardises the release process.
Whether it’s a Ruby gem, a Rust crate, a bunch of protobuf definitions, or whatever else, I run the same command to trigger a release going out.
It means I don’t have to think about how I do it for this project, because every project does it exactly the same way.
The Wiring Behind the Button
It wasn’t the button that was the problem. It was the miles of wiring, the hundreds of miles of cables, the circuits, the relays, the machinery. The engine was a massive, sprawling, complex, mind-bending nightmare of levers and dials and buttons and switches. You couldn’t just slap a button on the wall and expect it to work. But there should be a button. A big, fat button that you could press and everything would be fine again. Just press it, and everything would be back to normal.
Red Dwarf: Better Than Life
Once you’ve accepted that your release process should be as simple as creating an annotated tag and running one command, you do need to consider what happens afterwards.
These days, with the near-universal availability of CI runners that can do anything you need in an isolated, reproducible environment, the work required to go from “annotated tag” to “release artifacts” can be scripted up and left to do its thing.
What that looks like, of course, will probably vary greatly depending on what you’re releasing.
I can’t really give universally-applicable guidance, since I don’t know your situation.
All I can do is provide some of my open source work as inspirational examples.
For starters, let’s look at a simple Rust crate I’ve written, called strong-box.
It’s a straightforward crate that provides ergonomic and secure cryptographic functionality inspired by the likes of NaCl.
As it’s just a crate, its release script is very straightforward.
Most of the complexity is working around Cargo’s inelegant mandate that crate version numbers are specified in a TOML file.
Apart from that, it’s just a matter of building and uploading the crate.
Easy!
Slightly more complicated is action-validator.
This is a Rust CLI tool which validates GitHub Actions and Workflows (how very meta) against a published JSON schema, to make sure you haven’t got any syntax or structural errors.
As not everyone has a Rust toolchain on their local box, the release process helpfully builds binaries for several common OSes and CPU architectures that people can download if they choose.
The release process in this case is somewhat larger, but not particularly complicated.
Almost half of it is actually scaffolding to build an experimental WASM/NPM build of the code, because someone seemed rather keen on that.
Moving away from Rust, and stepping up the meta another notch, we can take a look at the release process for git-version-bump itself, my Ruby library and associated CLI tool which started me down the “Just Tag It Already” rabbit hole many years ago.
In this case, since gemspecs are very amenable to programmatic definition, the release process is practically trivial.
Remove the boilerplate and workarounds for GitHub Actions bugs, and you’re left with about three lines of actual commands.
These approaches can certainly scale to larger, more complicated processes.
I’ve recently implemented annotated-tag-based releases in a proprietary software product that produces Debian/Ubuntu, RedHat, and Windows packages, as well as Docker images, and it takes all of the information it needs from the annotated tag.
I’m confident that this approach will successfully serve them as they expand out to build AMIs, GCP machine images, and whatever else they need in their release processes in the future.
Objection, Your Honour!
I can hear the howl of the “but, actuallys” coming over the horizon even as I type.
People have a lot of Big Feelings about why this release process won’t work for them.
Rather than overload this article with them, I’ve created a companion article that enumerates the objections I’ve come across, and answers them.
I’m also available for consulting if you’d like a personalised, professional opinion on your specific circumstances.
DVD Bonus Feature: Pre-releases
Unless you’re addicted to surprises, it’s good to get early feedback about new features and bugfixes before they make it into an official, general-purpose release.
For this, you can’t go past the pre-release.
The major blocker to widespread use of pre-releases is that cutting a release is usually a pain in the behind.
If you’ve got to edit changelogs, and modify version numbers in a dozen places, then you’re entirely justified in thinking that cutting a pre-release for a customer to test that bugfix that only occurs in their environment is too much of a hassle.
The thing is, once you’ve got releases building from annotated tags, making pre-releases on every push to main becomes practically trivial.
This is mostly due to another fantastic and underused Git command: git describe.
How git describe works is, basically, that it finds the most recent commit that has an associated annotated tag, and then generates a string that contains that tag’s name, plus the number of commits between that tag and the current commit, with the current commit’s hash included, as a bonus.
That is, imagine that three commits ago, you created an annotated release tag named v4.2.0.
If you run git describe now, it will print out v4.2.0-3-g04f5a6f (assuming that the current commit’s SHA starts with 04f5a6f).
You might be starting to see where this is going.
With a bit of light massaging (essentially, removing the leading v and replacing the -s with .s), that string can be converted into a version number which, in most sane environments, is considered “newer” than the official 4.2.0 release, but will be superseded by the next actual release (say, 4.2.1 or 4.3.0).
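The light massaging is a one-line sed pass. Here's a self-contained sketch in a throwaway repo (example tag name and commits; adapt the pattern to your own versioning scheme):

```shell
# Throwaway repo: an annotated tag three commits back, then `git describe`
set -eu
repo=$(mktemp -d); cd "$repo"
g() { git -c user.name=Demo -c user.email=demo@example.com "$@"; }
git init -q
g commit -q --allow-empty -m 'release commit'
g tag -a -m 'Release 4.2.0' v4.2.0
g commit -q --allow-empty -m 'one'
g commit -q --allow-empty -m 'two'
g commit -q --allow-empty -m 'three'

raw=$(git describe --tags)    # e.g. v4.2.0-3-g04f5a6f
# Strip the leading v, turn the -s into .s
version=$(printf '%s\n' "$raw" | sed -e 's/^v//' -e 's/-/./g')
echo "$version"               # e.g. 4.2.0.3.g04f5a6f
```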
If you’re already injecting version numbers into the release build process, injecting a slightly different version number is no work at all.
Then, you can easily build release artifacts for every commit to main, and make them available somewhere they won’t get in the way of the “official” releases.
For example, in the proprietary product I mentioned previously, this involves uploading the Debian packages to a separate component (prerelease instead of main), so that users that want to opt-in to the prerelease channel simply modify their sources.list to change main to prerelease.
Management have been extremely pleased with the easy availability of pre-release packages; they’ve been gleefully installing them willy-nilly for testing purposes since I rolled them out.
In fact, even while I’ve been writing this article, I was asked to add some debug logging to help track down a particularly pernicious bug.
I added the few lines of code, committed, pushed, and went back to writing.
A few minutes later (next week’s job is to cut that in-process time by at least half), the person who asked for the extra logging ran apt update; apt upgrade, which installed the newly-built package, and was able to progress in their debugging adventure.
In my companion article, I made the bold claim that your release process should consist of no more than two steps:
Create an annotated Git tag;
Run a single command to trigger the release pipeline.
As I have been on the Internet for more than five minutes, I’m aware that a great many people will have a great many objections to this simple and straightforward idea.
In the interests of saving them a lot of wear and tear on their keyboards, I present this list of common reasons why these objections are invalid.
If you have an objection I don’t cover here, the comment box is down the bottom of the article.
If you think you’ve got a real stumper, I’m available for consulting engagements, and if you turn out to have a release process which cannot feasibly be reduced to the above two steps for legitimate technical reasons, I’ll waive my fees.
“But I automatically generate my release notes from commit messages!”
This one is really easy to solve: have the release note generation tool feed directly into the annotation.
Boom!
Headshot.
“But all these files need to be edited to make a release!”
No, they absolutely don’t.
But I can see why you might think you do, given how inflexible some packaging environments can seem, and since “that’s how we’ve always done it”.
Language Packages
Most languages require you to encode the version of the library or binary in a file that you want to revision control.
This is teh suck, but I’m yet to encounter a situation that can’t be worked around some way or another.
In Ruby, for instance, gemspec files are actually executable Ruby code, so I call code (that’s part of git-version-bump, as an aside) to calculate the version number from the git tags.
The Rust build tool, Cargo, uses a TOML file, which isn’t as easy, but a small amount of release automation is used to take care of that.
Distribution Packages
If you’re building Linux distribution packages, you can easily apply similar automation faffery.
For example, Debian packages take their metadata from the debian/changelog file in the build directory.
Don’t keep that file in revision control, though: build it at release time.
Everything you need to construct a Debian (or RPM) changelog is in the tag – version numbers, dates, times, authors, release notes.
Use it for much good.
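As a sketch of what building that changelog at release time can look like (self-contained in a throwaway repo; the package name mypkg, the urgency, and the tag contents are all made up, while the version, notes, tagger, and date are pulled from the tag object):

```shell
# Throwaway repo: generate a debian/changelog entry from an annotated tag
set -eu
repo=$(mktemp -d); cd "$repo"
g() { git -c user.name='Demo Person' -c user.email=demo@example.com "$@"; }
git init -q
g commit -q --allow-empty -m 'release commit'
g tag -a -m 'Fixed the frobnicator; added a doohickey' v1.2.3

tag=$(git describe --exact-match --tags)   # fails unless HEAD is a release tag
version=${tag#v}
tagger=$(git tag -l --format='%(taggername) %(taggeremail)' "$tag")
date=$(git tag -l --format='%(taggerdate:rfc)' "$tag")

mkdir -p debian
{
    printf 'mypkg (%s) unstable; urgency=medium\n\n' "$version"
    # Tag annotation becomes the changelog body
    git tag -l --format='%(contents)' "$tag" | sed 's/^/  * /'
    printf '\n -- %s  %s\n' "$tagger" "$date"
} > debian/changelog
cat debian/changelog
```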
The Dreaded Changelog
Finally, there’s the CHANGELOG file.
If it’s maintained during the development process, it typically has an archive of all the release notes, under version numbers, with an “Unreleased” heading at the top.
It’s one more place to remember to have to edit when making that “preparing release X.Y.Z” commit, and it is a gift to the Demon of Spurious Merge Conflicts if you follow the policy of “every commit must add a changelog entry”.
My solution: just burn it to the ground.
Add a line to the top with a link to wherever the contents of annotated tags get published (such as GitHub Releases, if that’s your bag) and never open it ever again.
“But I need to know other things about my release, too!”
For some reason, you might think you need some other metadata about your releases.
You’re probably wrong – it’s amazing how much information you can obtain or derive from the humble tag – so think creatively about your situation before you start making unnecessary complexity for yourself.
But, on the off chance you’re in a situation that legitimately needs some extra release-related information, here’s the secret: structured annotation.
The annotation on a tag can be literally any sequence of octets you like.
How that data is interpreted is up to you.
So, require that annotations on release tags use some sort of structured data format (say YAML or TOML – or even XML if you hate your release manager), and mandate that it contain whatever information you need.
You can make sure that the annotation has a valid structure and contains all the information you need with an update hook, which can reject the tag push if it doesn’t meet the requirements, and you’re sorted.
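A sketch of that validation logic, written as a function you could call from an update hook (a real hook receives the refname, old rev, and new rev as arguments; the required summary: key here is an invented example requirement):

```shell
# Validation sketch for a server-side `update` hook.  A real hook would
# invoke this as: validate_release_tag "$1" "$3"
validate_release_tag() {
    refname=$1 newrev=$2
    case "$refname" in
        refs/tags/v[0-9]*) ;;   # only release tags get checked
        *) return 0 ;;
    esac
    # Lightweight tags point straight at a commit; annotated tags are "tag" objects
    if [ "$(git cat-file -t "$newrev")" != tag ]; then
        echo "release tags must be annotated" >&2
        return 1
    fi
    # The annotation is everything after the tag object's header block;
    # demand it contains a "summary:" key (made-up example requirement)
    git cat-file tag "$newrev" | sed '1,/^$/d' | grep -q '^summary:' || {
        echo "release tag annotation must contain a summary: key" >&2
        return 1
    }
}
```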
“But I have multiple packages in my repo, with different release cadences and versions!”
This one is common enough that I just refer to it as “the monorepo drama”.
Personally, I’m not a huge fan of monorepos, but you do you, boo.
Annotated tags can still handle it just fine.
The trick is to include the package name being released in the tag name.
So rather than a release tag being named vX.Y.Z, you use foo/vX.Y.Z, bar/vX.Y.Z, and baz/vX.Y.Z.
The release automation for each package just triggers on tags that match the pattern for that particular package, and limits itself to those tags when figuring out what the version number is.
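The tag-pattern trick is directly supported by git describe's --match option; a self-contained sketch (package and tag names are just examples):

```shell
# Throwaway repo: per-package release tags in a monorepo
set -eu
repo=$(mktemp -d); cd "$repo"
g() { git -c user.name=Demo -c user.email=demo@example.com "$@"; }
git init -q
g commit -q --allow-empty -m 'release foo and bar'
g tag -a -m 'foo release' foo/v1.2.0
g tag -a -m 'bar release' bar/v3.0.1

# Each package's release automation only considers its own tags
git describe --tags --match 'foo/v*'   # -> foo/v1.2.0
git describe --tags --match 'bar/v*'   # -> bar/v3.0.1
```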
“But we don’t semver our releases!”
Oh, that’s easy.
The tag pattern that marks a release doesn’t have to be vX.Y.Z.
It can be anything you want.
Relatedly, there is a (rare, but existent) need for packages that don’t really have a conception of “releases” in the traditional sense.
The example I’ve hit most often is automatically generated “bindings” packages, such as protobuf definitions.
The source of truth for these is a bunch of .proto files, but to be useful, they need to be packaged into code for the various language(s) you’re using.
But those packages need versions, and while someone could manually make releases, the best option is to build new per-language packages automatically every time any of those definitions change.
The versions of those packages, then, can be datestamps (I like something like YYYY.MM.DD.N, where N starts at 0 each day and increments if there are multiple releases in a single day).
This process allows all the code that needs the definitions to declare the minimum version of the definitions that it relies on, and everything is kept in sync and tracked almost like magic.
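One way to compute that YYYY.MM.DD.N version is to count how many release tags already exist for today (a self-contained sketch in a throwaway repo; the tag-listing approach is one option among several):

```shell
# Throwaway repo: compute the next YYYY.MM.DD.N datestamp version
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q
git -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'generated bindings'

today=$(date -u +%Y.%m.%d)
# N = number of release tags already cut today, so the first release is .0
n=$(git tag -l "${today}.*" | wc -l | tr -d '[:space:]')
echo "${today}.${n}"
```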
Our 5.94kW solar array with Redflow ZCell battery and Victron Energy inverter/charger system is now slightly over three years old, which means it’s time to review its third year of operation. There are several previous posts in this series:
Go With The Flow (what all the pieces are, what they do, some teething problems)
If you’ve read the above you’ll know that the solar array was originally installed back in 2017 along with a Sanden heat pump hot water service. That initial installation saved us a lot on our electricity bills, but it wasn’t until we got the ZCell and the Victron gear that we were able to really manage our own power. The ZCell allows us to store our own locally generated electricity for later use, and the Victron kit manages everything and gives us a whole lot of fascinating data to look at via the VRM portal.
There were some kinks in the first two years. We missed out on three weeks of prime solar PV generation from January 20 – February 11 in 2022 due to having to replace the MPPT solar charge controller. We also had no solar PV generation from February 17 – March 9 in 2023 on account of having our old tile roof replaced with colorbond steel. In my last post on this topic I wrote:
In both cases our PV generation was lower than it should have been by an estimated 500-600kWh. Hopefully nothing like this happens again in future years.
…and then at the very end of that post:
I’m looking forward to doing another one of these posts in a year’s time. Hopefully I will have nothing at all interesting to report.
Alas, something “like this” did happen again, and I have some interesting things to report.
In early December 2023 our battery failed due to a leak in the electrode stack. It was replaced under warranty, but the replacement unit didn’t arrive until March 2024. It was a long three months. Then in August when we were looking at finally purchasing a second ZCell, we discovered that Redflow had made a commercial decision to focus exclusively on large-scale deployments (minimum 200 kWh, i.e. 20 batteries) and was thus no longer selling individual ZBMs for residential or small business use. As an existing customer we probably would have still been able to get a second battery, except that in late August the company went into voluntary administration after failing to secure funding to build a new factory in Queensland. The administrators attempted to seek a sale and/or recapitalisation, but this was ultimately unsuccessful. The company ceased operations on October 18 and subsequently went into liquidation. This raises several questions about the future of our system, but more on that later. First, let’s look at how the system performed in year three.
Here are the figures for grid power in, solar generation, power used by our loads, and power exported to the grid over the past three years. As in the last two posts, the “what?” column here is the difference between grid in plus solar in, minus loads minus export, i.e. the power consumed by the system itself, or the energy cost of the system.
Year       | Grid In | Solar In | Total In | Loads  | Export | Total Out | what?
2021-2022  |   8,531 |    5,640 |   14,171 | 10,849 |    754 |    11,603 | 2,568
2022-2023  |   8,936 |    5,744 |   14,680 | 11,534 |    799 |    12,333 | 2,347
2023-2024  |   8,878 |    5,621 |   14,499 | 11,162 |  1,489 |    12,651 | 1,848

(all figures in kWh)
Note that in year three our grid power usage and solar generation are slightly down from the previous year (-58kWh and -123kWh respectively), so the total power going into the system is lower by 181kWh. Our loads are happily down by 372kWh, a good chunk of which will be due to replacing some old always-on computer equipment with something a bit less power hungry.
What’s really interesting here is that our power exported to the grid is close to double the previous two years, and the energy cost of the system is noticeably lower. In the first two years of operation the latter figure was 16-18% of the total power going into the system, but in year three it’s down to a bit under 13%.
The additional solar export appears to be largely due to the failed battery. Compare the following two graphs from 2022-2023 and 2023-2024. Yellow is direct usage of solar power, blue is solar to battery and red is solar to grid. As you can see there’s way more solar to grid in the period December 2023 – March 2024 when the battery was dead and thus unable to be charged:
Why is there still any blue in that period indicating solar power was going to the battery? This is where things get a bit weird. One consideration is that the battery is presumably still drawing a tiny bit of power for its control circuitry and fans, but when I look at the figures for January 2024 (for example), it shows 76.8 kWh of power going to the battery from solar. There is no way that actually happened with the battery dead and unable to be charged.
Here’s what I think is going on: when the battery went into failure mode, the ZCell Battery Management System (BMS) will have told the Victron gear not to charge it. This effectively disabled the MPPT solar charger, which meant we weren’t able to use our solar at all, not even to run the house. I asked Murray from Lifestyle Electrical Services if there was some way we could reconfigure things to still use solar power with the battery out of action and he remoted in and tweaked some settings. Unfortunately I don’t have an exact record of what was changed at this point, because it was discussed via phone. All I have in my notes is a very terse “Set CGX to use Victron BMS?” which doesn’t make much sense because we don’t have a Victron BMS. Possibly it refers to switching the battery monitor setting from “ZCell BMS” to “MultiPlus-II 48/5000/70-50 on VE.Bus”. Anyway, whatever the case, I think we have to assume that the “to battery” and “from battery” figures from December 2023 – March 2024 are all lies.
At this point we were able to limp along with our solar generation still working during the day, but something was still not quite right. Every morning and evening the MPPT appeared to be fighting to run. Watching the console at, say, 08:00, I’d see the MPPT providing solar power for a few seconds, then it’d stop for a second or two, then it’d run again for a few seconds. After some time it would start behaving normally and we’d have solar generation for the day, but then in the evening it would go back to that flicking on and off behaviour. My assumption is that the ZCell BMS was still trying to force the MPPT off. Then in mid-February I suddenly got a whole lot of Battery Low Voltage warnings from the MPPT, which I guess makes sense – the ZCell was still connected and its reported voltage had been very slowly dropping away over the past couple of months. The warnings appeared when it finally hit 2.5V. Murray and I experimented further to try to get the MPPT to stop doing the weird fighting thing, but were unsuccessful. At one point during this we ended up with the MultiPlus-II inverter/chargers in some sort of fault state and contacted Simon Hackett for further assistance. We got all the Victron gear back into a sensible state and Simon and I spent a bunch of time on a Saturday afternoon messing with everything we could think of, but ultimately we were unable to get the MPPT to provide power from the solar panels, and use grid power, without the battery present. One or the other – grid power only or solar power only – we could do, but we couldn’t get the system to do both at the same time again without the battery present. Turns out a thing that’s designed to be an Energy Storage System just won’t quite work right without the Storage part. So from February 15 through to March 14 when the replacement battery arrived we were running on grid power only with no solar generation.
Happily, we didn’t have any grid power outages during the three months we were without a battery. Our first outage of any note wasn’t until March 23, slightly over a week after the replacement battery was installed. There were a few brief grid outages at other times later – a couple of minutes one day in April, some glitches on a couple of days in August, but the really bad one was on the 1st of September when the entire state got absolutely hammered by extremely severe weather. Given there was a severe weather warning from the BOM I’d made sure the battery was full in advance, which was good because our grid power went out while we were asleep at about 00:37 and didn’t come back on until 17:28. We woke up some time after the grid went down with the battery at 86% state of charge and went around the house to turn off everything we could except for the fridge and freezer, which got our load down to something like 250W. By morning, the battery still had about 70% in it and even though the weather was bad we still had some solar generation, so between battery and solar we got through just fine until the grid came back on in the afternoon. We were lucky though – some folks in the north of the state were without power for two weeks due to this event. I later received a cheque for $160 from TasNetworks in compensation for our outage. I dread to think what the entire event cost everyone, and I don’t just mean in terms of money.
Speaking of money though, the other set of numbers we need to look at are our power bills. Here’s everything from the last seven years:
Year       | From Grid | Total Bill | Grid $/kWh | Loads  | Loads $/kWh
2016-2017  |    17,026 |  $4,485.45 |      $0.26 | 17,026 |       $0.26
2018-2019  |     9,031 |  $2,278.33 |      $0.25 | 11,827 |       $0.19
2019-2020  |     9,324 |  $2,384.79 |      $0.26 | 12,255 |       $0.19
2020-2021  |     7,582 |  $1,921.77 |      $0.25 | 10,358 |       $0.19
2021-2022  |     8,531 |  $1,731.40 |      $0.20 | 10,849 |       $0.16
2022-2023  |     8,936 |  $1,989.12 |      $0.22 | 11,534 |       $0.17
2023-2024  |     8,878 |  $2,108.77 |      $0.24 | 11,162 |       $0.19

(From Grid and Loads in kWh)
As explained in the last post, I’m deliberately smooshing a bunch of numbers together (peak power charge, off peak power charge, feed in tariff, daily supply charge) to arrive at an effective cost/kWh of grid power, then bearing in mind our loads are partially powered from solar I can also determine what it costs us to run all our loads. 2016-2017 is before we got the solar panels and the new hot water service, so you can see the immediate savings there, then further savings after the battery went in in 2021. This year our cost/kWh (and thus our power bill) is higher than last year for two reasons:
We have somehow used more power at peak times than during off-peak times this year compared to last year.
Power prices went up about 8% in July 2023. They actually came down about 1% in July 2024, but most of our year is before that.
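The smooshed arithmetic is just the total bill divided by grid kWh (and, for the loads figure, by total load kWh). A quick check against the 2023-2024 row of the table above:

```shell
# Effective cost per kWh for 2023-2024, from the figures in the table above
bill=2108.77 grid_kwh=8878 loads_kwh=11162
awk -v b="$bill" -v g="$grid_kwh" -v l="$loads_kwh" \
    'BEGIN { printf "grid: $%.2f/kWh, loads: $%.2f/kWh\n", b / g, b / l }'
# -> grid: $0.24/kWh, loads: $0.19/kWh
```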
I should probably also mention that we actually spent $1,778.94 on power this year, not $2,108.77. That’s thanks largely to a $250 ‘Supercharged’ Renewable Energy Dividend payment from the Tasmanian Government and $75 from the Federal Government’s Energy Bill Relief Fund. The remaining $4.83 in savings is from Aurora Energy’s ridiculous Power Hours events. I say “ridiculous” because they periodically give you a bunch of time slots to choose from, and once you’ve locked one of them in, any power you use at that time is free. To my mind this incentivises additional power usage, when we should really be doing the exact opposite and trying to use less power overall. So I haven’t tried to use more energy, I’ve just tried to lock in times that were in the evening when we were going to be using more grid power than during the day to scrape in what savings I could.
One other weird thing happened this year with the new battery. ZCells need to go into a maintenance cycle every three days. This happens automatically, but is something I habitually keep an eye on. On September 11 I noticed that we had been four days without running maintenance. Upon investigation of the battery logs I discovered that the Time Since Strip counter and Strip Pump Run Timer were running at half speed, i.e. every minute they were each only advancing by approximately 30 seconds:
I manually put the battery into maintenance mode and Simon was able to remotely reset the CPU by writing some magic number to a modbus register, which got the counters back to the correct speed. I have no idea whether this is a software bug or a hardware issue, but I’ll continue to keep an eye on it. The difficulty is going to be dealing with the problem should it recur, given the demise of Redflow. Simon certainly won’t be able to log in remotely now that the Redflow cloud is down, although there is a manual reset procedure. If you remove the case from the battery there is apparently a small phillips head screw on the panel with the indicator lights. Give the screw a twist and the lights go out. Untwist and the lights come back on and the unit is reset. I have yet to actually try this.
The big question now is, where do we go from here? The Victron gear – the Cerbo GX console, the Multi-Plus II inverter/chargers, the MPPT – all work well with multiple different types of battery, so our basic infrastructure is future-proof. Immediately I hope to be able to keep our ZCell running for as long as possible, and if I’m able to get a second one as a result of the Redflow liquidation I will, simply so that we can ensure the greatest possible longevity of the system before we need to migrate to something else. We will also have to somehow figure out how to obtain carbon socks which need annual replacement to maintain the electrolyte pH. If we had to migrate to something else in a hurry Pylontech might be a good choice, but the problem is that we really don’t want a rack of lithium batteries in the crawl space under our dining room because of the fire risk. There are other types of flow battery out there (vanadium comes to mind) but everything I’ve looked at on that front is either way too big and expensive for residential usage, or is “coming soon now please invest in us it’s going to be awesome”.
I have no idea what year four will look like, but I expect it to be interesting.
I believe buying a Kindle in 2024 is a bad idea, even if you only intend to use it for reading DRM-free locally stored ebooks. Basic functions such as organizing books into folders/collections are locked until the device is registered, and with each system update the interface has become slower and more bloated.
Initially I purchased this device because the Amazon book store isn’t too bad and it’s one of the easier ways to buy Japanese books outside of Japan, but with all the anti-features Amazon adds in, I don’t think it’s worth using any more.
Using a recent exploit and with this downgrader thread on the mobileread forum, I’m able to downgrade my paperwhite to an older 5.11.2 firmware which has a simpler interface while being much more responsive. If you already have a Kindle perhaps this is worth doing.
It’s possible to install alternative UIs and custom OSes on many Kindle models, but they generally run slower than the default launcher. On the open hardware side, Pine64 is making an e-ink tablet called the PineNote; with a Rockchip RK3566 and 4GB of RAM it should be fast enough to handle most documents/ebooks, but currently there is no usable Linux distribution for it.
Posted - Bridging the Gap: Authenticity vs. Public Persona - Twitter (admin)
Twitter/X Post:
We all wear masks in public. But how much do you change when you take yours off? The gap between your public and private self is the measure of your authenticity. #Authenticity #PersonalGrowth #ChrisDo
Posted - Maximize Data Storage: Solutions for Your Devices - LinkedIn (admin)
LinkedIn
Assessment: This content is suitable for LinkedIn as it reflects on a practical tech solution and shares a personal story.
Approach: A brief text post that connects your experience with broader insights on data management.
Content: "Recently, I found myself in a tight spot when both my phone and laptop ran out of storage. After some frustration, I decided to invest in cloud storage with rsync.net, and it’s made a world of difference. This experience reminded me how important it is to stay on top of data management, especially in our increasingly digital world. How do you manage your digital storage? #DataManagement #CloudStorage #TechInsights"
Image prompt: Create a professional yet approachable image showing a workspace with a laptop and a phone, both displaying a 'Storage Full' notification. Next to them, depict a cloud storage icon (representing rsync.net) with a checkmark indicating a solution. The background should be neutral and tidy, evoking a sense of organization and efficiency. Use a cool color palette to maintain a professional tone.
Posted - Maximize Data Storage: Solutions for Your Devices - TikTok (admin)
TikTok
Approach: Short video with text overlay. Use a quick clip of you reacting to the storage full message, followed by a clip showing the solution.
Content: Clip 1: (Reacting to 'Storage Full' messages) Text overlay: "When your phone and laptop BOTH fill up...😱"
Clip 2: (Showing you happily purchasing cloud storage on rsync.net) Text overlay: "So I finally got cloud storage—no more panic! 🌥️ #TechWoes #CloudStorage #StoryTime"
1) A 'Storage Full' Warning on a Phone and Laptop Screen
Text Prompt: "Create a clean and modern image showing a smartphone and a laptop side by side, both displaying prominent 'Storage Full' warning messages on their screens. The design should use bold, easy-to-read text for the warnings, with simple red and yellow alert icons. The background should be minimal, using neutral colors like light grey or white to keep the focus on the devices and warnings. Use a flat design style with sharp, clean lines to maintain a sleek and professional look."
2) A Cloud Storage Icon Being Selected on a Computer
Text Prompt: "Design a clean and modern image featuring a close-up of a computer screen where a cursor is selecting a cloud storage icon. The icon should be a stylized, simple cloud with a checkmark or arrow indicating selection. The background should be a minimal desktop or web interface, with a focus on the cloud storage icon. Use a flat design style, with soft blue and white tones to create a sense of calm and reliability. Keep the overall aesthetic sleek and professional, with sharp lines and clear details."
3) A Satisfied Person with a Relaxed Expression, Having Solved Their Storage Problem
Text Prompt: "Create a clean and modern image depicting a person sitting at a desk with a laptop and phone in front of them. The person should have a relaxed, satisfied expression, indicating that their storage problem has been solved. The background should be simple and uncluttered, possibly showing a hint of the workspace with minimalistic decor. Use a soft color palette with blues, greys, and whites to evoke a sense of calm and contentment. The design should be flat and stylized, with clean lines and a focus on the person's expression and posture."
Posted - Creating 15-25 Pieces of content per day - What the ****? - Twitter
Twitter/X
Approach: Short, impactful tweet with hashtags to drive engagement.
Content: "Gary Vee says to put out 15-25 pieces of content per day. 🎯 Start with long-form content and repurpose across platforms like YouTube, Instagram, and TikTok. Who's ready to level up? #GaryVee #ContentStrategy #SocialMedia"
In the red corner, weighing in at… nah, I’m not going to do that schtick.
The plaintiff in the case is Alegeus Technologies, LLC, a Delaware Corporation that, according to their filings, “is a leading provider of a business-to-business, white-label funding and payment platform for healthcare carriers and third-party administrators to administer consumer-directed employee benefit programs”.
Not being subject to the US’ bonkers health care system, I have only a passing familiarity with the sorts of things they do, but presumably it involves moving a lot of money around, which is sometimes important.
The CA/Browser Forum Baseline Requirements (BRs) (which all CAs are required to adhere to, by virtue of their being included in various browser and OS trust stores), say that revocation is required within 24 hours when “[t]he CA obtains evidence that the validation of domain authorization or control for any Fully‐Qualified Domain Name or IP address in the Certificate should not be relied upon” (section 4.9.1.1, point 5).
DigiCert appears to have at least tried to do the right thing, by opening the above Mozilla bug giving some details of the problem, and notifying their customers that their certificates were going to be revoked.
One may quibble about how fast they’re doing it, but they’re giving it a decent shot, at least.
A complicating factor in all this is that, only a touch over a month ago, Google Chrome announced the removal of another CA, Entrust, from its own trust store program, citing “a pattern of compliance failures, unmet improvement commitments, and the absence of tangible, measurable progress in response to publicly disclosed incident reports”.
Many of these compliance failures were failures to revoke certificates in a timely manner.
One imagines that DigiCert would not like to gain a reputation for tardy revocation, particularly at the moment.
The Legal Action
Now we come to Alegeus Technologies.
They’ve opened a civil case whose first action is to request the issuance of a Temporary Restraining Order (TRO) that prevents DigiCert from revoking certificates issued to Alegeus (which the court has issued).
This is a big deal, because TROs are legal instruments that, if not obeyed, constitute contempt of court (or something similar) – and courts do not like people who disregard their instructions.
That means that, in the short term, those certificates aren’t getting revoked, despite the requirement imposed by root stores on DigiCert that the certificates must be revoked.
DigiCert is in a real “rock / hard place” situation here: revoke and get punished by the courts, or don’t revoke and potentially (though almost certainly not, in the circumstances) face removal from trust stores (which would kill, or at least massively hurt, their business).
The reason that Alegeus gives for requesting the restraining order is that “[t]o
Reissue and Reinstall the Security Certificates, Alegeus must work with and
coordinate with its Clients, who are required to take steps to rectify the
certificates. Alegeus has hundreds of such Clients. Alegeus is generally
required by contract to give its clients much longer than 24 hours’ notice
before executing such a change regarding certification.”
In the filing, Alegeus does acknowledge that “DigiCert is a voluntary member of the Certification Authority Browser Forum (CABF), which has bylaws stating that certificates with an issue in their domain validation must be revoked within 24 hours.”
This is a misstatement of the facts, though.
It is the BRs, not the CABF bylaws, that require revocation, and the BRs apply to all CAs that wish to be included in browser and OS trust stores, not just those that are members of the CABF.
In any event, given that Alegeus was aware that DigiCert is required to revoke certificates within 24 hours, one wonders why Alegeus went ahead and signed agreements with their customers that required a lengthy notice period before changing certificates.
What complicates the situation is that there is apparently a Master Services Agreement (MSA) that states that it “constitutes the entire agreement between the parties” – and that MSA doesn’t mention certificate revocation anywhere relevant.
That means that it’s not quite so cut-and-dried that DigiCert does, in fact, have the right to revoke those certificates.
I’d expect a lot of “update to your Master Services Agreement” emails to be going out from DigiCert (and other CAs) in the near future to clarify this point.
Not being a lawyer, I can’t say which way this case might go, but there’s one thing we can be sure of: some lawyers are going to be able to afford that trip to a tropical paradise this year.
The Security Issues
The requirement for revocation within 24 hours is an important security control in the WebPKI ecosystem.
If a certificate is misissued to a malicious party, or is otherwise compromised, it needs to be marked as untrustworthy as soon as possible.
While revocation is far from perfect, it is the best tool we have.
In this court filing, Alegeus has claimed that they are unable to switch certificates at less than 24 hours’ notice (due to “contractual SLAs”).
This is a pretty big problem, because there are lots of reasons why a certificate might need to be switched out Very Quickly.
As a practical example, someone with access to the private key for your SSL certificate might decide to use it in a blog post.
Letting that sort of problem linger for an extended period of time might end up being a Pretty Big Problem of its own.
An organisation that cannot respond within hours to a compromised certificate is playing chicken with their security.
The Takeaways
Contractual obligations that require you to notify anyone else of a certificate (or private key) changing are bonkers, and completely antithetical to the needs of the WebPKI.
If you have to have them, you’re going to want to start transitioning to a private PKI, wherein you can do whatever you darn well please with revocation (or not).
As these sorts of problems keep happening, trust stores (and hence CAs) are going to crack down on this sort of thing, so you may as well move sooner rather than later.
If you are an organisation that uses WebPKI certificates, you’ve got to be able to deal with any kind of certificate revocation event within hours, not days.
This basically boils down to automated issuance and lifecycle management, because having someone manually request and install certificates is terrible on many levels.
There isn’t currently a completed standard for notifying subscribers if their certificates need premature renewal (say, due to needing to be revoked), but the ACME Renewal Information Extension is currently being developed to fill that need.
Ask your CA if they’re tracking this standards development, and when they intend to have the extension available for use.
(Pro-tip: if they say “we’ll start doing development when the RFC is published”, run for the hills; that’s not how responsible organisations work on the Internet).
As regular readers would know, since I never stop banging on about it, I run Pwnedkeys, a service which finds and collates private keys which have been disclosed or are otherwise compromised.
Until now, the only way to check if a key is compromised has been to use the Pwnedkeys API, which is not necessarily trivial for everyone.
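For anyone who does want to script against the API, the shape of a check looks roughly like this. This is a sketch: it assumes the v1 endpoint keyed by the hex SHA-256 of the DER-encoded SubjectPublicKeyInfo, with 200 meaning “known compromised” and 404 meaning “not found”; check the API documentation for the authoritative details.

```python
import hashlib
import urllib.error
import urllib.request

def spki_fingerprint(spki_der: bytes) -> str:
    """SHA-256 of the DER-encoded SubjectPublicKeyInfo, as lowercase hex."""
    return hashlib.sha256(spki_der).hexdigest()

def is_pwned(spki_der: bytes) -> bool:
    """Query the (assumed) v1 endpoint: 200 means the key is known to be
    compromised, 404 means it isn't in the database."""
    url = f"https://v1.pwnedkeys.com/{spki_fingerprint(spki_der)}"
    try:
        urllib.request.urlopen(url)
        return True
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return False
        raise
```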
Starting today, that’s changing.
The next phase of Pwnedkeys is to start offering more user-friendly tools for checking whether keys being used are compromised.
These will typically be web-based or command-line tools intended to answer the question “is the key in this (certificate, CSR, authorized_keys file, TLS connection, email, etc) known to Pwnedkeys to have been compromised?”. The first tools, available now, let you:
Check the key in a PEM-format X509 data structure (such as a CSR or certificate);
Check the keys in an authorized_keys file you upload; and
Check the SSH keys used by a user at any one of a number of widely-used code-hosting sites.
Further planned tools include “live” checking of the certificates presented in TLS connections (for HTTPS, etc), SSH host keys, command-line utilities for checking local authorized_keys files, and many other goodies.
If You Are Intrigued By My Ideas…
… and wish to subscribe to my newsletter, now you can!
I’m not going to be blogging every little update to Pwnedkeys, because that would probably get a bit tedious for readers who aren’t as intrigued by compromised keys as I am.
Instead, I’ll be posting every little update in the Pwnedkeys newsletter.
So, if you want to keep up-to-date with the latest and greatest news and information, subscribe to the newsletter.
Supporting Pwnedkeys
I’m doing all this work in my own time, and paying for the infrastructure out of my own pocket.
If you’ve got a few dollars to spare, I’d really appreciate it if you bought me a refreshing beverage.
It helps keep the lights on here at Pwnedkeys Global HQ.
Hello amazing teachers! Are you looking for a fun and engaging way to bring history to life for your students? Meet Sabaton, a Swedish heavy metal band known for their powerful songs about historical events. While heavy metal might not be the first thing that comes to mind for a primary school setting, Sabaton’s music […]
I’ve been daily driving the PinePhone Pro with swmo for some time now. It’s not perfect, but I still find it to be one of the most enjoyable devices I’ve used, probably second only to the BlackBerry Q30/Passport, which also has a decent keyboard and runs an unfortunately locked-down version of QNX. For me it’s less like a phone and more like a portable terminal for times when using a full-size laptop is uncomfortable or impractical, and with the keyboard it’s possible to write lengthy articles on the go.
This isn’t the only portable Linux terminal I’ve owned. Before this I used a Nokia N900, which to this day is still maintained by the Maemo Leste team, but the shutdown of the 3G network where I live made it significantly less usable as a phone, and since it doesn’t have a proper USB port I cannot easily use it as a serial console.
The overall experience on the PPP as of 2024 isn’t as polished as that of the BlackBerry Passport, and ad-hoc hacks are often required to get the system going. However, as the ecosystem progresses, the experience will improve with new hardware revisions and better software.
I use sxmo and swmo interchangeably in this post; they refer to the same framework running under Xorg and Wayland, and the experience is pretty much the same.
The default scaling of sxmo doesn’t allow many desktop applications to display their windows properly, especially when an application is written under the assumption of a larger screen. To set the scaling to something more reasonable, add the following line to ~/.config/sxmo/sway:
exec wlr-randr --output DSI-1 --scale 1.3
When using swmo, environment initialization is mostly done in ~/.config/sxmo/sway; ~/.config/sxmo/xinit is not used.
Scaling for Firefox needs to be adjusted separately: first enable the compact UI, then set Settings -> Default Zoom to your liking.
I use lightdm as my session manager. To launch lightdm in landscape mode, change the display-setup-script line in the [Seat:*] section of /etc/lightdm/lightdm.conf:
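The script itself is whatever rotates your greeter’s display; for an Xorg-based greeter a plausible sketch (the output name and rotation direction are assumptions for the PinePhone Pro, adjust for your setup) is:

```
[Seat:*]
display-setup-script=xrandr --output DSI-1 --rotate right
```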
To rotate Linux framebuffer, add fbcon=rotate:1 to the U_BOOT_PARAMETERS line in /usr/share/u-boot-menu/conf.d/mobian.conf and run u-boot-update to apply.
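Concretely, the edited line ends up looking something like this (the other parameters are placeholders for whatever your file already contains):

```shell
# /usr/share/u-boot-menu/conf.d/mobian.conf
U_BOOT_PARAMETERS="console=tty0 fbcon=rotate:1"
```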
I also removed quiet splash from U_BOOT_PARAMETERS to disable the Plymouth animation, as it isn’t very useful in landscape mode.
Swmo doesn’t come with a secure screen locker, but swaylock works fine and can be bound to a key combination in sway’s config file. To save some battery life, systemctl suspend can be triggered after swaylock; to bind that to Meta+L:
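Something along these lines in the sway config works (assuming Meta is Mod4; swaylock’s -f flag detaches after the screen is locked, so suspend runs with the lock already in place):

```
# ~/.config/sxmo/sway -- lock the screen, then suspend
bindsym Mod4+l exec 'swaylock -f && systemctl suspend'
```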
The default keymap for the PinePhone keyboard is missing a few useful keys, namely F11/F12 and PgUp/PgDn. To create those keys I used evremap(1) to make a custom keymap. Unfortunately the Fn key cannot easily be mapped as a layer switcher, so I opted to remap AltGr and Esc as my primary modifiers.
I’m working on a Debian package for evremap and it will be made available for Debian/Mobian soon.
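For reference, an evremap config is a small TOML file; this sketch is purely illustrative (device name and key choices are assumptions, see evremap(1) for the actual schema):

```toml
# Illustrative only -- match device_name to your keyboard
device_name = "PinePhone Keyboard"

# Chord AltGr+Up/Down into the missing PgUp/PgDn keys
[[remap]]
input = ["KEY_RIGHTALT", "KEY_UP"]
output = ["KEY_PAGEUP"]

[[remap]]
input = ["KEY_RIGHTALT", "KEY_DOWN"]
output = ["KEY_PAGEDOWN"]
```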
Incus is a container/VM manager for Linux; it’s available for Debian from bookworm-backports and is a fork of LXD by the original maintainers behind LXD. It works well for creating isolated, unprivileged containers. I have multiple Incus containers on the PinePhone Pro for Debian packaging, and it’s a better experience than manually creating and managing chroots. In case there is a need for running another container inside an unprivileged Incus container, it’s possible to configure Incus to intercept certain safe system calls and forward them to the host, removing the need for a privileged container.
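A sketch of that setup (the container and image names are illustrative; the option names are Incus instance options):

```shell
# Create an unprivileged container for packaging work
incus launch images:debian/bookworm deb-pkg

# Allow containers inside it: enable nesting and have the host
# intercept a few safe syscalls on the container's behalf
incus config set deb-pkg security.nesting=true
incus config set deb-pkg security.syscalls.intercept.mknod=true
incus config set deb-pkg security.syscalls.intercept.setxattr=true
```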
Sway is decently usable in convergence mode, in which the phone is connected to a dock that outputs to an external display and keyboard and mouse are used as primary controls instead of the touchscreen.
This isn’t surprising since sway has always had great multi-monitor support. However, another often overlooked convergence mode is with waypipe. In this mode another Linux machine (e.g. a laptop) can be used to interact with applications running on the phone, and the phone is kept charged by the laptop. This is particularly useful for debugging phone applications or for accessing resources on the phone (e.g. sending and receiving SMS). One thing missing in this setup is that graphical applications cannot roam between the phone and the external system (e.g. moving a running application from one machine to another). Xpra does this for Xorg but doesn’t work with Wayland.
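Running a phone-side application on the laptop’s display is a one-liner (the username, hostname, and application here are illustrative):

```shell
# Start the foot terminal on the phone, displayed locally via waypipe
waypipe ssh mobian@pinephone-pro foot
```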
Due to the simplicity of the swmo environment it’s not too difficult to get the system running with SELinux in Enforcing mode, and I encourage everyone reading this to try it. If running Debian/Mobian, a good starting point is the SELinux/Setup page on the Debian wiki.
Note: selinux-activate won’t add the required security=selinux kernel option to u-boot (it only deals with GRUB), so you have to add it manually to the U_BOOT_PARAMETERS line in /usr/share/u-boot-menu/conf.d/mobian.conf and run u-boot-update after selinux-activate. The file labeling process can easily take 10 minutes, and the progress won’t be displayed on the framebuffer (it’s only visible via the serial console).
SELinux along with the reference policy aren’t enough for building a reasonably secure interactive system, but let’s leave that for a future post.
The April 2024 meeting is the first meeting after Everything Open 2024, and the discussions were primarily around talks and lectures people found interesting during the conference, including the n3n VPN and the challenges of running a personal email server. At the start of the meeting Yifei Zhan demonstrated a development build of Maemo Leste, an actively developed Maemo-like operating system, running on a PinePhone Pro.
Other topics discussed include modern network protocol ossification, SIP, and possible free and open source VoLTE implementations.
The PinePhone keyboard contains a battery, which is used to charge the PinePhone when the keyboard is attached. Although there are existing warnings on the Pine64 wiki, which sum up to ‘don’t charge or connect anything to your PinePhone’s Type-C interface when the keyboard is attached’, my two PinePhone keyboards still managed to fry themselves, with one releasing stinky magic smoke and the other melting the plastic around the pogo pins on the PinePhone backplate.
This all happened while the PinePhone’s Type-C interface was physically blocked by the attached keyboard. In the first case, the keyboard’s controller PCB blew up when I tried to charge it; in the latter case the keyboard somehow overheated and melted the plastic near the pogo interface on the phone side.
Pine64 provided me a free replacement keyboard after multiple emails back and forth, but according to Pine64 there will be no more free replacements for me in the future, and there is no guarantee that this will not happen to my replacement keyboard.
The cost for replacing all the fried parts with spare parts from the Pine64 store is about 40 USD (pogo pins + backplate + keyboard PCB), and considering this problem is likely to happen again, I don’t think purchasing those parts is a wise decision.
Both the melting plastic and the magic smoke originated from the fact that charge is constantly shuffled around when the keyboard is attached to the PinePhone, and since the keyboard can function independently of its battery, we can disconnect and remove the battery from the keyboard case to make sure it will not blow up again. After such a procedure the keyboard will keep functioning, although the keyboard-attached PinePhone might flip over much more easily due to the lightened keyboard base. Be aware that the keyboard isn’t designed to be taken apart, and doing so will likely result in scratches on the case. As for me, I’d much rather have a keyboard case without a built-in battery than something that can overheat or blow up.
On Friday, 1st March, it will be exactly one year since I walked into
Zen Motorcycles, signed the
paperwork, and got on my brand new Energica Experia electric motorbike. I
then rode it back to Canberra, stopping at two places to charge along the
way, but that was more in the nature of making sure - it could have done the
trip on one better-chosen charging stop.
I got a call yesterday from a guy who had looked at the Experia Bruce has at
Zen and was considering buying one. I talked with him for about three
quarters of an hour, going through my experience, and to sum it up simply I
can just say: this is a fantastic motorbike.
Firstly, it handles like a standard motorbike - in fact, almost
exactly like my previous Triumph Tiger Sport 1050. But it is so much easier
to ride. You twist the throttle and you go. You wind it back and you slow
down. If you want to, the bike will happily do nought to 100km/hr in under
four seconds. But it will also happily and smoothly glide along in traffic.
It says "you name the speed, I'm happy to go". It's not temperamental or
impatient; it has no weird points where the throttle suddenly gets an extra
boost or where the engine braking suddenly drops off. It is simple to ride.
As an aside, this makes it perfect for lane filtering. On my previous bike
this would always be tinged with a frisson of danger - I had to rev it and
ease the clutch in with a fair bit of power so I didn't accidentally stall it,
but that always took some time. Now, I simply twist the throttle and I am
ahead of the traffic - no danger of stalling, no delay in the clutch gripping,
just power. It is much safer in that scenario.
I haven't done a lot of touring yet, but I've ridden up to Gosford once and
up to Sydney several times. This is where Energica really is ahead of pretty
much every other electric motorbike on the market now - they do DC fast
charging. And by 'fast charger' here I mean anything from 50kW up; the
Energica can only take 25kW maximum anyway :-) But this basically means I
have to structure any stops we do around where I can charge up - no more
stopping in at the local pub or a cafe on a whim for morning tea. That has
to either offer DC fast charging or I'm moving on - the 3kW onboard AC
charger means a 22kW AC charger is useless to me. In the hour or two we
might stop for lunch I'd only get another 60 - 80 kilometres more range on
AC; on DC I would be done in less than an hour.
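The back-of-envelope arithmetic behind those numbers looks like this; the riding-efficiency figure is my own rough assumption, not a measured value from the bike:

```python
def range_gained_km(charger_kw: float, hours: float, km_per_kwh: float) -> float:
    """Energy taken on during a stop, converted into riding range."""
    return charger_kw * hours * km_per_kwh

# Assume roughly 15 km of range per kWh (a rough guess for the Experia)
ac = range_gained_km(3.0, 1.5, 15.0)    # ~67 km from 3kW AC over a long lunch
dc = range_gained_km(25.0, 0.75, 15.0)  # ~280 km from 25kW DC in 45 minutes
```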
But OTOH my experience so far is that structuring those breaks around where I
can charge up is relatively easy. Most riders will furiously nod when I say
that I can't sit in the seat for more than two hours before I really need to
stretch the legs and massage the bum :-) So if that break is at a DC charger,
no problems. I can stop at Sutton Forest or Pheasant's Nest or even
Campbelltown and, in the time it takes for me to go to the toilet and have a
bit of a coffee and snack break, the bike is basically charged and ready to
go again.
The lesson I've learned, though, is to always give it that bit longer and
charge as much as I can up to 80%. It's tempting sometimes when I'm standing
around in a car park watching the bike charge to move on and charge up a bit
more at the next stop. The problem is that, with chargers still relatively
rare and there often only being one or two at each site, a single charger
not working can mean another fifty or even a hundred kilometres more riding.
That's a quarter to half my range, so I cannot afford to risk that. Charge
up and take a good book (and a spare set of headphones).
In the future, of course, when there's a bank of a dozen DC fast chargers in
every town, this won't be a problem. Charger anxiety only exists because
they are still relatively rare. When charging is easy to find and always
available, and there are electric forecourts like the UK is starting to get,
charging stops will be easy and will fit in with my riding.
Anyway.
Other advantages of the Experia:
You can get it with a complete set of Givi MonoKey top box and panniers. This
means you can buy your own much nicer and more streamlined top box and it fits
right on.
Charging at home takes about six hours, so it's easy to do overnight. The
Experia comes with an EVSE so you don't need any special charger at home. And
really, since the onboard AC charger can only accept 3kW, there's hardly any
point in spending much money on a home charger for the Experia.
Minor niggles:
The seat is a bit hard. I'm considering getting the
EONE
Canyon saddle, although I also just need to try to work out how to get
underneath the seat to see if I can fit my existing sheepskin seat cover.
There are a few occasional glitches in the display in certain rare situations.
I've mentioned them to Energica, hopefully they'll be addressed.
Way back in the distant past, when the Apple ][ and the Commodore 64 were king, you could read the manual for a microprocessor and see how many CPU cycles each instruction took, and then do the math as to how long a sequence of instructions would take to execute. This cycle counting was used pretty effectively to do really neat things such as how you’d get anything on the screen from an Atari 2600. Modern CPUs are… complex. They can do several things at once, in a different order than what you wrote them in, and have an interesting arrangement of shared resources to allocate.
So, unlike with simpler hardware, if you have a sequence of instructions for a modern processor, it’s going to be pretty hard to work out how many cycles that could take by hand, and it’s going to differ for each micro-architecture available for the instruction set.
When designing a microprocessor, simulating how long a series of existing instructions will take to execute compared to the previous generation is pretty important. The aim should be for it to take less time or energy, or improve some other metric that means your new processor is better than the old one. It can be okay if, generation to generation, some sequences of instructions take more cycles, provided your cycles are more frequent, more power-efficient, or better in whatever other positive metric you’re designing for.
Programmers may want this simulation too, as some code paths get rather performance critical for certain applications. Open Source tools for this aren’t as prolific as I’d like, but there is llvm-mca which I (relatively) recently learned about.
llvm-mca is a performance analysis tool that uses information available in LLVM (e.g. scheduling models) to statically measure the performance of machine code in a specific CPU.
So, when looking at an issue in the IPv6 address and connection hashing code in Linux last year, and being quite conscious of modern systems dealing with a LOT of network packets, and thus this can be quite CPU usage sensitive, I wanted to make sure that my suggested changes weren’t going to have a large impact on performance – across the variety of CPU generations in use.
There are two ways to do this. The first: run everything, throw a lot of packets at something, and measure it. That can be a long dev cycle, and sometimes just annoying to get going. The second can be a lot quicker: simulate the small section of code in question and do some analysis of it before going through the trouble of spinning up multiple test environments to prove it in the real world.
So, enter llvm-mca and the ability to quickly evaluate possible changes before testing them. Seeing as the code in question was nicely self-contained, I could easily get gcc (or llvm) to spit out assembler for it separately from the kernel tree. My preference was for gcc, as that’s what most distros end up compiling Linux with, including the Linux distribution that’s my day job (Amazon Linux).
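The basic workflow is short; the file names and target CPU here are illustrative:

```shell
# Get the compiler to emit assembly for just the code under test
gcc -O2 -S -o hashfn.s hashfn.c

# Statically analyse it against a particular micro-architecture
llvm-mca --mcpu=skylake --iterations=100 hashfn.s
```

llvm-mca also honours # LLVM-MCA-BEGIN / # LLVM-MCA-END comments in the assembly source, which is how you can restrict the analysis to the exact instruction sequence you care about.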
In order to share the results of the experiments as part of the discussion on where the code changes should end up, I published the code and results in a github project as things got way too large to throw on a mailing list post and retain sanity.
I used a container so that I could easily run it in a repeatable isolated environment, as well as have others reproduce my results if needed. Different compiler versions and optimization levels will very much produce different sequences of instructions, and thus possibly quite different results. This delta in compiler optimization levels is partially why the numbers don’t quite match on some of the mailing list messages, although the delta of the various options was all the same. The other reason is learning how to better use llvm-mca to isolate down the exact sequence of instructions I was caring about (and not including things like the guesswork that llvm-mca has to do for branches).
One thing I learned along the way is how to better use llvm-mca to get the results I was looking for. One trick is to very much avoid branches, as analysing those is near-complete guesswork since there’s no simulation of the branch predictor (at least in the version I was using).
The big thing I wanted to prove: does doing the extra work have a small or large impact on the number of elapsed cycles? The answer was that doing a bunch of extra “work” was essentially free. The CPU core could execute enough things in parallel that the incremental cost of the extra work just… wasn’t relevant.
This helped getting a patch deployed without impact to performance, as well as get a patch upstream, fixing an issue that was partially fixed 10 years prior, and had existed since day 1 of the Linux IPv6 code.
Naturally, this wasn’t a solo effort, and that’s one of the joys of working with a bunch of smart people – both at the same company I work for, and in the broader open source community. It’s always humbling when you’re looking at code outside your usual area of expertise that was written (and then modified) by Really Smart People, and you’re then trying to fix a problem in it, while trying to learn all the implications of changing that bit of code.
Anyway, check out llvm-mca for your next adventure into premature optimization, as if you’re going to get started with evil, you may as well start with what’s at the root of all of it.
At this rate, there is no real blogging here, despite the lofty plans to start writing more. Stats update from Hello 2023:
219 days on the road (less than 2022! -37, over a month, shocking), 376,961km travelled, 44 cities, 17 countries.
Can’t say why it was less, because it felt like I spent a long time away…
In Kuala Lumpur, I purchased a flat (just in time to see Malaysia go down), and I swapped cars (had a good 15 year run). I co-founded a company, and I think there is a lot more to come.
2024 is shaping up to be exciting, busy, and a year, where one must just do.
It’s time for a review of the second year of operation of our Redflow ZCell battery and Victron Energy inverter/charger system. To understand what follows it will help to read the earlier posts in this series:
Go With The Flow (what all the pieces are, what they do, some teething problems)
TANSTAAFL (review/analysis of the first year of operation)
In case ~12,000 words of background reading seem daunting, I’ll try to summarise the most important details here:
We have a 5.94kW solar array hooked up to a Victron MPPT RS solar charge controller, two Victron 5kW Multi-Plus II inverter/chargers, a Victron Cerbo GX console, and a single 10kWh Redflow ZCell battery. It works really well. We’re using most of our generated power locally, and it’s enabled us to blissfully coast through several grid power outages and various other minor glitches. The Victron gear and the ZCell were installed by Lifestyle Electrical Services.
Redflow batteries are excellent because you can 100% cycle them every day, and they aren’t a giant lump of lithium strapped to your house that’s impossible to put out if it bursts into flames. The catch is that they need to undergo periodic maintenance where they are completely discharged for a few hours at least every three days. If you have more than one, that’s fine because the maintenance cycles interleave (it’s all automatic). If you only have one, you can’t survive grid outages if you’re in a maintenance period, and you can’t ordinarily use the Cerbo’s Minimum State of Charge (MinSoC) setting to perpetually keep a small charge in the battery in case of emergencies. As we still only have one battery, I’ve spent a fair bit of time experimenting to mitigate this as much as I can.
The system itself requires a certain amount of power to run. Think of the pumps and fans in the battery, and the power used directly by the inverters and the console. On top of that a certain amount of power is simply lost to AC/DC conversion and charge/discharge inefficiencies. That’s power that comes into your house from the grid and from the sun that your loads, i.e. the things you care about running, don’t get to use. This is true of all solar PV and battery storage systems to a greater or lesser degree, but it’s not something that people always think about.
With the background out of the way we can get on to the fun stuff, including a roof replacement, an unexpected fault after a power outage followed by some mains switchboard rewiring, a small electrolyte leak, further hackery to keep a bit of charge in the battery most of the time, and finally some numbers.
The big job we did this year was replacing our concrete tile roof with colorbond steel. When we bought the house – which is in a rural area and thus a bushfire risk – we thought: “concrete brick exterior, concrete tile roof – sweet, that’s not flammable”. Unfortunately it turns out that while a tile roof works just fine to keep water out, it won’t keep embers out. There’s a gadzillion little gaps where the tiles overlap each other, and in an ember attack, embers will get up in there and ignite the fantastic amount of dust and other stuff that’s accumulated inside the ceiling over several decades, and then your house will burn down. This could be avoided by installing roof blanket insulation under the tiles, but in order to do that you have to first remove all the tiles and put them down somewhere without breaking them, then later put them all back on again. It’s a lot of work. Alternately, you can just rip them all off and replace the whole lot with nice new steel, with roof blanket insulation underneath.
Of course, you need good weather to replace a roof, and you need to take your solar panels down while it’s happening. This meant we had twenty-two solar panels stacked on our back porch for three weeks of prime PV time from February 17 – March 9, 2023, which I suspect lost us a good 500kWh of generation. Also, the roof job meant we didn’t have the budget to get a second ZCell this year – for the cost of the roof replacement, we could have had three new ZCells installed – but as my wife rightly pointed out, all the battery storage in the world won’t do you any good if your house burns down.
We had at least five grid power outages during the year. A few were brief, the grid being down for only a couple of minutes, but there were two longer ones in September (one for 30 minutes, one for about an hour and a half). We got through the long ones just fine with either the sun high in the sky, or charge in the battery, or both. One of the earlier short outages though uncovered a problem. On the morning of May 30, my wife woke up to discover there was no power, and thus no running water. Not a good thing to wake up to. This happened while I was away, because of course something like this would happen while I was away. It turns out there had been a grid outage at about 02:10, then the grid power had come back, but our system had not. The Multis ended up in some sort of fault state and were refusing to power our loads. On the console was an alarm message: “#8 – Ground relay test failed”.
That doesn’t look good.
Note the times in the console messages are about 08:00. I confirmed via the logs from the VRM portal that the grid really did go out some time between 02:10 and 02:15, but after that there was nothing in the logs until 07:59, which is when my wife used the manual changeover switch to shift all our loads back to direct grid power, bypassing the Victron kit. That brought our internet connection back, along with the running water. I contacted Murray Roberts from Lifestyle Electrical and Simon Hackett for assistance, Murray logged in remotely and reset the Multis, my wife flicked the changeover switch back and everything was fine. But the question remained, what had gone wrong?
The ground relay in the Multis is there to connect neutral to ground when the grid fails. Neutral and ground are already physically connected on the grid (AC input) side of the Multis in the main switchboard, but when the grid power goes out, the Multis disconnect their inputs, which means the loads on the AC output side no longer have that fixed connection from neutral to ground. The ground relay activates in this case to provide that connection, which is necessary for correct operation of the safety switches on the power circuits in the house.
The ground relay is tested automatically by the Multis. Looking up Error 8 – Ground relay test failed on Victron’s web site indicated that either the ground relay really was faulty, or possibly there was a wiring fault or an issue with one of the loads in our house. So I did some testing. First, with the battery at 50% State of Charge (SoC), I did the following:
1. Disconnected all loads (i.e. flipped the breaker on the output side of the Multis)
2. Killed the mains (i.e. flipped the breaker on the input side of the Multis)
3. Verified the system switched to inverting mode (i.e. running off the battery)
4. Restored mains power
5. Verified there was no error
This demonstrated that the ground relay and the Multis in general were fine. Had there been a problem at that level we would have seen an error when I restored mains power. I then reconnected the loads and repeated steps 2-5 above. Again, there was no error which indicated the problem wasn’t due to a wiring defect or short in any of the power or lighting circuits. I also re-tested with the heater on and the water pump running just in case there may have been an issue specifically with either of those devices. Again, there was no error.
The only difference between my test above and the power outage in the middle of the night was that in the middle of the night there was no charge in the battery (it was right after a maintenance cycle) and no power from the sun. So in the evening I turned off the DC isolators for the PV and deactivated my overnight scheduled grid charge so there’d be no backup power of any form in the morning. Then I repeated the test:
1. Disconnected all loads
2. Killed the mains
3. Checked the console, which showed the system as “off” rather than “inverting”, as there was no battery power or solar generation
4. Restored mains power
5. Shortly thereafter, got the ground relay test failed error
The underlying detailed error message was “PE2 Closed”, which meant that it was seeing the relay as closed when it’s meant to be open. Our best guess is that we’d somehow hit an edge case in the Multi’s ground relay test, where they maybe tried to switch to inverting mode and activated the ground relay, then just died in that state because there was no backup power, and got confused when mains power returned. I got things running again by simply power cycling the Multis.
So it kinda wasn’t a big deal, except that if the grid went out briefly with no backup power, our loads would remain without power until one of us manually reset the system. This was arguably worse than not having the system at all, especially if it happened in the middle of the night, or when we were away from home. The fact that we didn’t hit this problem in the first year of operation is a testament to how unlikely this event is, but the fact that it could happen at all remained a problem.
One fix would have been to get a second battery, because then we’d be able to keep at least a tiny bit of backup power at all times regardless of maintenance cycles, but we’re not there yet. Happily, Simon found another fix, which was to physically connect the neutral together between the AC input and AC output sides of the Multis, then reconfigure them to use the grid code “AS4777.2:2015 AC Neutral Path externally joined”. That physical link means the load (output) side picks up the ground connection from the grid (input) side in the switchboard, and changing the grid code setting in the Multis disables the ground relay and thus the test, which is no longer necessary.
Murray needed to come out anyway to replace the carbon sock in the ZCell (a small item of annual maintenance) and was able to do that little bit of rewiring and reconfiguration at the same time. I repeated my tests both with and without backup power and everything worked perfectly, i.e. the system came back immediately by itself after a grid outage with no backup power, and of course switched over to inverting just fine when there was backup power available.
This leads to the next little bit of fun. The carbon sock is a thing that sits inside the zinc electrolyte tank and helps to keep the electrolyte pH in the correct operating range. Unfortunately I didn’t manage to get a photo of one, but they look a bit like door snakes. Replacing the carbon sock means opening the case, popping one side of the Gas Handling Unit (GHU) off the tank, pulling out the old sock and putting in a new one. Here’s a picture of the ZCell with the back of the case off, indicating where the carbon sock goes:
The tank on the left (with the cooling fan) is for zinc electrolyte. The tank on the right is for bromine electrolyte. The blocky assembly of pipes going into both tanks is the GHU. The rectangular box behind that contains the electrode stacks.
When Murray popped the GHU off, he noticed that one of the larger pipes on one side had perished slightly. Thankfully he happened to have a spare GHU with him, so was able to replace the assembly immediately. All was well until later that afternoon, when the battery indicated hardware failure due to “Leak 1 Trip” and shut itself down out of an abundance of caution. Upon further investigation the next day, Murray and I discovered there was a tiny split in one of the little hoses going into the GHU which was letting the electrolyte drip out.
Drip… Drip… Drip…
This small electrolyte leak was caught lower down in the battery, where the leak sensor is. Murray sucked the leaked electrolyte out of there, re-terminated that little hose and we were back in business. I was happy to learn that Redflow had obviously thought about the possibility of this type of failure and handled it. As I said to Murray at the time, we’d rather have a battery that leaks and then turns itself off than a battery that catches fire!
Aside from those two interesting events, the rest of the year of operation was largely quite boring, which is exactly what one wants from a power system. As before I kept a small overnight scheduled charge and a larger late afternoon scheduled charge active on weekdays to ensure there was some power in the battery to use at peak (i.e. expensive) grid times. In spring and summer the afternoon charge is largely superfluous because the battery has usually been well filled up from the solar by then anyway, but there’s no harm in leaving it turned on. The one hack I did do during the year was to figure out a way to keep a small (I went with 15%) MinSoC in the battery at all times except for maintenance cycle evenings, and the morning after. This is more than enough to smooth out minor grid outages of a few minutes, and given our general load levels should be enough to run the house for more than an hour overnight if necessary, provided the hot water system and heating don’t decide to come on at the same time.
My earlier experiment along these lines involved a script that ran on the Cerbo twice a day to adjust scheduled charge settings in order to keep the battery at 100% SoC at all times except for peak electricity hours and maintenance cycle evenings. As mentioned in TANSTAAFL I ran that for all of July, August and most of September 2022. It worked fine, but ultimately I decided it was largely a waste of energy and money, especially when run during the winter months when there’s not much sun and you end up doing a lot of grid charging. This is a horribly inefficient way of getting power into the battery (AC to DC) versus charging the battery direct from solar PV. We did still use those scripts in the second year, but rather more judiciously, i.e. we kept an eye on the BOM forecasts as we always do, then occasionally activated the 100% charge when we knew severe weather and/or thunderstorms were on the way, those being the things most likely to cause extended grid outages. I also manually triggered maintenance on the battery earlier than strictly necessary several times when we expected severe weather in the coming days, to avoid having a maintenance cycle (and thus empty battery) coincide with potential outages. On most of those occasions this effort proved to be unnecessary. Bearing all that in mind, my general advice to anyone else with a single ZCell system (aside from maybe adding scheduled charges to time-shift expensive peak electricity) is to just leave it alone and let it do its thing. You’ll use most of your locally generated electricity onsite, you’ll save some money on your power bills, and you’ll avoid some, but not all, grid outages. This is a pretty good position to be in.
That said, I couldn’t resist messing around some more, hence my MinSoC experiment. Simon’s installation guide points out that “for correct system operation, the Settings->ESS menu ‘Min SoC’ value must be set to 0% in single-ZCell systems”. The issue here is that if MinSoC is greater than 0%, the Victron gear will try to charge the battery while the battery is simultaneously trying to empty itself during maintenance, which of course just isn’t going to work. My solution to this is the following script, which I run from a cron job on the Cerbo twice a day, once at midnight UTC and again at 06:00 UTC with the --check-maintenance flag set:
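The script body isn’t shown here, so the following is only a minimal sketch of the logic as described, not the actual code. The `--minsoc` and `--check-maintenance` flags match the description above, but `get_last_maintenance()` and `set_min_soc()` are hypothetical stubs – on a real Cerbo GX both would go through the Venus OS settings interface, which isn’t reproduced here.

```python
#!/usr/bin/env python3
"""Sketch of the MinSoC toggle logic (not the author's actual script).

Normally sets ESS Min SoC to a small value; with --check-maintenance,
sets it to 0% on ZCell maintenance days so the battery can fully
discharge overnight."""
import argparse
from datetime import datetime, timedelta


def is_maintenance_day(last_maintenance: datetime, limit_days: int,
                       now: datetime) -> bool:
    """Maintenance is due once the configured time limit has elapsed
    since the last full discharge (e.g. every three days)."""
    return now - last_maintenance >= timedelta(days=limit_days)


def get_last_maintenance() -> datetime:
    # Stub: on real hardware this would be read from the battery.
    return datetime.now() - timedelta(days=1)


def set_min_soc(percent: int) -> None:
    # Stub: on a Cerbo GX this would write the ESS Minimum SoC setting.
    print(f"MinSoC -> {percent}%")


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--minsoc", type=int, default=15)
    parser.add_argument("--check-maintenance", action="store_true")
    args = parser.parse_args()

    if args.check_maintenance and is_maintenance_day(
            get_last_maintenance(), limit_days=3, now=datetime.now()):
        set_min_soc(0)  # maintenance tonight: let the battery empty itself
    else:
        set_min_soc(args.minsoc)


if __name__ == "__main__":
    main()
```

Driven from cron as described (times in UTC; the path is hypothetical), the two entries might look like `0 0 * * * /data/minsoc.py` and `0 6 * * * /data/minsoc.py --check-maintenance`.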
Midnight UTC corresponds to the end of our morning peak electricity time, and 06:00 UTC corresponds to the start of our afternoon peak. What this means is that after the morning peak finishes, the MinSoC setting will cause the system to automatically charge the battery to the value specified if it’s not up there already. Given it’s after the morning peak (10:00 AEST / 11:00 AEDT) this charge will likely come from solar PV, not the grid. When the script runs again just before the afternoon peak (16:00 AEST / 17:00 AEDT), MinSoC is set to either the value specified (effectively a no-op), or zero if it’s a maintenance day. This allows the battery to be discharged correctly in the evening on maintenance days, while keeping some charge every other day in case of emergencies. Unlike the script that tries for 100% SoC, this arrangement results in far less grid charging, while still giving protection from minor outages most of the time.
In case Simon is reading this now and is thinking “FFS, I wrote ‘MinSoC must be set to 0% in single-ZCell systems’ for a reason!” I should also add a note of caution. The script above detects ZCell maintenance cycles based solely on the configured maintenance time limit and the duration since last maintenance. It does not – and cannot – take into account occasions when the user manually forces maintenance, or situations in which a ZCell for whatever reason hypothetically decides to go into maintenance of its own accord. The latter shouldn’t generally happen, but it can. The point is, if you’re running this MinSoC script from a cron job, you really do still want to keep an eye on what the battery is doing each day, in case you need to turn that setting off and disable the cron job. If you’re not up for that I will reiterate my general advice from earlier: just leave the system alone – let it do its thing and you’ll (almost always) be perfectly fine. Or, get a second ZCell and you can ignore the last several paragraphs entirely.
Now, finally, let’s look at some numbers. The year periods here are a little sloppy for irritating historical reasons. 2018-2019, 2019-2020 and 2020-2021 are all August-based due to Aurora Energy’s previous quarterly billing cycle. The 2021-2022 year starts in late September partly because I had to wait until our new electricity meter was installed in September 2021, and partly because it let me include some nice screenshots when I started writing TANSTAAFL on September 25, 2022. I’ve chosen to make this year (2022-2023) mostly sane, in that it runs from October 1, 2022 through September 30, 2023 inclusive. This is only six days offset from the previous year, but notably makes it much easier to accurately correlate data from the VRM portal with our bills from Aurora. Overall we have five consecutive non-overlapping 12 month periods that are pretty close together. It’s not perfect, but I think it’s good enough to work with for our purposes here.
| Year | Grid In (kWh) | Solar In (kWh) | Total In (kWh) | Loads (kWh) | Export (kWh) |
|-----------|-------|-------|--------|--------|-------|
| 2018-2019 | 9,031 | 6,682 | 15,713 | 11,827 | 3,886 |
| 2019-2020 | 9,324 | 6,468 | 15,792 | 12,255 | 3,537 |
| 2020-2021 | 7,582 | 6,347 | 13,929 | 10,358 | 3,571 |
| 2021-2022 | 8,531 | 5,640 | 14,171 | 10,849 | 754 |
| 2022-2023 | 8,936 | 5,744 | 14,680 | 11,534 | 799 |
Overall, 2022-2023 had a similar shape to 2021-2022, including the fact that in both these years we missed three weeks of solar generation in late summer. In 2022 this was due to replacing the MPPT, and in 2023 it was because we replaced the roof. In both cases our PV generation was lower than it should have been by an estimated 500-600kWh. Hopefully nothing like this happens again in future years.
All of our numbers in 2022-2023 were a bit higher than in 2021-2022. We pulled 4.75% more power from the grid, generated 1.84% more solar, the total power going into the system (grid + solar) was 3.59% higher, our loads used 6.31% more power, and we exported 5.97% more power than the previous year.
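Those percentages fall straight out of the two yearly rows; a quick sanity check (figures in kWh, copied from the table above):

```python
# 2021-2022 and 2022-2023 totals from the yearly table (kWh).
y1 = {"grid": 8531, "solar": 5640, "total": 14171, "loads": 10849, "export": 754}
y2 = {"grid": 8936, "solar": 5744, "total": 14680, "loads": 11534, "export": 799}

# Year-over-year change for each quantity, as a percentage.
change = {k: round((y2[k] - y1[k]) / y1[k] * 100, 2) for k in y1}
print(change)
# {'grid': 4.75, 'solar': 1.84, 'total': 3.59, 'loads': 6.31, 'export': 5.97}
```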
I honestly don’t know why our loads used more power this year. Here’s a table showing our consumption for both years, and the differences each month (note that September 2022 is only approximate because of how the years don’t quite line up):
| Month | 2022 (kWh) | 2023 (kWh) | Diff (kWh) |
|-----------|-------|-------|------|
| October | 988 | 873 | -115 |
| November | 866 | 805 | -61 |
| December | 767 | 965 | 198 |
| January | 822 | 775 | -47 |
| February | 638 | 721 | 83 |
| March | 813 | 911 | 98 |
| April | 775 | 1,115 | 340 |
| May | 953 | 1,098 | 145 |
| June | 1,073 | 1,149 | 76 |
| July | 1,118 | 1,103 | -15 |
| August | 966 | 1,065 | 99 |
| September | 1,070 | 954 | -116 |
Here’s a graph:
WTF happened in December and April?!?
Did we use more cooling this December? Did we use more heating this April and May? I dug the nearest weather station’s monthly mean minimum and maximum temperatures out of the BOM Climate Data Online tool and found that there’s maybe a degree or so variance one way or the other each month year to year, so I don’t know what I can infer from that. All I can say is that something happened in December and April, but I don’t know what.
Another interesting thing is that what I referred to as “the energy cost of the system” in TANSTAAFL has gone down. That’s the kWh figure below in the “what?” column, which is the difference between grid in + solar in – loads – export, i.e. the power consumed by the system itself. In 2021-2022, that was 2,568kWh, or about 18% of the total power that went into the system. In 2022-2023 it was down to 2,347kWh, or just under 16%:
| Year | Grid In (kWh) | Solar In (kWh) | Total In (kWh) | Loads (kWh) | Export (kWh) | Total Out (kWh) | what? (kWh) |
|-----------|-------|-------|--------|--------|-----|--------|-------|
| 2021-2022 | 8,531 | 5,640 | 14,171 | 10,849 | 754 | 11,603 | 2,568 |
| 2022-2023 | 8,936 | 5,744 | 14,680 | 11,534 | 799 | 12,333 | 2,347 |
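The “what?” column is simple arithmetic – energy in minus energy out, i.e. what the system itself consumed or lost:

```python
def system_overhead(grid_in, solar_in, loads, export):
    """Energy consumed by the system itself (all values in kWh)."""
    return (grid_in + solar_in) - loads - export

print(system_overhead(8531, 5640, 10849, 754))  # 2021-2022: 2568 (~18% of input)
print(system_overhead(8936, 5744, 11534, 799))  # 2022-2023: 2347 (~16% of input)
```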
I suspect the cause of this reduction is that we didn’t spend two and a half months doing lots of grid charging of the battery in 2022-2023. If that’s the case, this again points to the advisability of just letting the system do its thing and not messing with it too much unless you really know you need to.
The last set of numbers I have involve actual money. Here’s what our electricity bills looked like over the past five years:
| Year | From Grid (kWh) | Total Bill | Cost/kWh |
|-----------|-------|-----------|-------|
| 2018-2019 | 9,031 | $2,278.33 | $0.25 |
| 2019-2020 | 9,324 | $2,384.79 | $0.26 |
| 2020-2021 | 7,582 | $1,921.77 | $0.25 |
| 2021-2022 | 8,531 | $1,731.40 | $0.20 |
| 2022-2023 | 8,936 | $1,989.12 | $0.22 |
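The Cost/kWh column above is simply each year’s total bill divided by the grid draw, rounded to the cent:

```python
bills = {  # year: (kWh drawn from grid, total bill in $)
    "2018-2019": (9031, 2278.33),
    "2019-2020": (9324, 2384.79),
    "2020-2021": (7582, 1921.77),
    "2021-2022": (8531, 1731.40),
    "2022-2023": (8936, 1989.12),
}
# Effective cost per kWh of grid power, per year.
effective = {y: round(bill / kwh, 2) for y, (kwh, bill) in bills.items()}
print(effective)
```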
Note that cost/kWh as I have it here is simply the total dollar amount of our bills divided by the total power drawn from the grid (I’m deliberately ignoring the additional power we use that comes from the sun in this calculation). The bills themselves say “peak power costs $X, off-peak costs $Y, you get $Z back for power exported and there’s a daily supply charge of $SUCKS_TO_BE_YOU”, but that’s all noise. What ultimately matters in my opinion is what I call the effective cost per kilowatt hour, which is why those things are all smooshed together here. The important point is that with our existing solar array we were previously effectively paying about $0.25 per kWh for grid power. After getting the battery and switching to Peak & Off-Peak billing, that went down to $0.20/kWh – a reduction of 20%. Now we’ve inched back up to $0.22/kWh, but it turns out that’s just because power prices have increased. As far as I can tell Aurora Energy don’t publish historical pricing data, so as a public service, I’ll include what I’ve been able to glean from our prior bills here:
July 2023 onwards:
Daily supply charge: $1.26389
Peak: $0.36198/kWh
Off-Peak: $0.16855/kWh
Feed-In Tariff: $0.10869/kWh
July 2022 – July 2023:
Daily supply charge: $1.09903
Peak: $0.33399/kWh
Off-Peak: $0.15551/kWh
Feed-In Tariff: $0.08883/kWh
Before July 2022:
Daily supply charge: $0.98
Peak: $0.29852/kWh
Off-Peak: $0.139/kWh
Feed-In Tariff: $0.06501/kWh
It’s nice that the feed-in tariff (i.e. what you get credited when you export power) has gone up quite a bit, but unless you’re somehow able to export 2-3x more power than you import, you’ll never get ahead of the ~20% increase in power prices over the last two years.
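Using the tariffs listed above, the actual increases from pre-July-2022 to July 2023 work out as follows; the ~20% figure matches the two energy rates, while the supply charge and the feed-in tariff rose considerably more:

```python
# Tariffs before July 2022 vs from July 2023 ($; supply is per day,
# the rest per kWh), as listed above.
old = {"supply": 0.98, "peak": 0.29852, "offpeak": 0.139, "feedin": 0.06501}
new = {"supply": 1.26389, "peak": 0.36198, "offpeak": 0.16855, "feedin": 0.10869}

# Percentage rise of each component over the two years.
rise = {k: round((new[k] / old[k] - 1) * 100, 1) for k in old}
print(rise)  # {'supply': 29.0, 'peak': 21.3, 'offpeak': 21.3, 'feedin': 67.2}
```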
Having calculated the effective cost/kWh for grid power, I’m now going to do one more thing which I didn’t think to do during last year’s analysis, and that’s calculate the effective cost/kWh of running our loads, bearing in mind that they’re partially powered from the grid, and partially from the sun. I’ve managed to dig up some old Aurora bills from 2016-2017, back before we put the solar panels on. This should make for an interesting comparison.
| Year | From Grid (kWh) | Total Bill | Grid $/kWh | Loads (kWh) | Loads $/kWh |
|-----------|--------|-----------|-------|--------|-------|
| 2016-2017 | 17,026 | $4,485.45 | $0.26 | 17,026 | $0.26 |
| 2018-2019 | 9,031 | $2,278.33 | $0.25 | 11,827 | $0.19 |
| 2019-2020 | 9,324 | $2,384.79 | $0.26 | 12,255 | $0.19 |
| 2020-2021 | 7,582 | $1,921.77 | $0.25 | 10,358 | $0.19 |
| 2021-2022 | 8,531 | $1,731.40 | $0.20 | 10,849 | $0.16 |
| 2022-2023 | 8,936 | $1,989.12 | $0.22 | 11,534 | $0.17 |
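The Loads $/kWh column is the same bill divided by total load energy (grid draw plus locally consumed solar) instead of grid draw alone:

```python
def loads_cost_per_kwh(total_bill, loads_kwh):
    """Effective cost of running the loads, in $/kWh."""
    return round(total_bill / loads_kwh, 2)

print(loads_cost_per_kwh(4485.45, 17026))  # 2016-2017: 0.26 (no solar yet)
print(loads_cost_per_kwh(2278.33, 11827))  # 2018-2019: 0.19
print(loads_cost_per_kwh(1731.40, 10849))  # 2021-2022: 0.16
print(loads_cost_per_kwh(1989.12, 11534))  # 2022-2023: 0.17
```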
The first thing to note is the horrifying 17MWh we pulled in 2016-2017. Given the hot water and lounge room heat pump were on a separate tariff, I was able to determine that four of those megawatt hours (i.e. about 24% of our power usage) went on heating that year. Replacing the crusty old conventional electric hot water system with a Sanden heat pump hot water service cut that in half – subsequent years showed the heating/hot water tariff using about 2MWh/year. We obviously also somehow reduced our loads by another ~3MWh/year on top of that, but I can’t find the Aurora bills for 2017-2018 so I’m not sure exactly when that drop happened. My best guess is that I probably got rid of some old, always-on computer equipment.
The second thing to note is how the cost of running the loads drops. In 2016-2017 the grid cost/kWh is the same as the loads cost/kWh, because grid power is all we had. From 2018-2021 though, the load cost/kWh drops to $0.19, a saving of about 26%. It remains there until 2021-2022 when we got the battery and it dropped again to $0.16 (another 15% or so). So the big win was certainly putting the solar panels on and swapping the hot water system, with the battery being a decent improvement on top of that.
Further wins are going to come from decreasing our power consumption. In previous posts I had mentioned the need to replace panel heaters with heat pumps, and also that some of our aging computer equipment needed upgrading. We did finally get a heat pump installed in the master bedroom this year, and we replaced the old undersized lounge room heat pump with a new correctly sized unit. This happened on June 30 though, so will have had minimal impact on this year’s figures. Likewise an always-on computer that previously pulled ~100W is now better, stronger and faster in all respects, while only pulling ~50W. That will save us ~438kWh of energy per year, but given the upgrade happened in mid August, again we won’t see the full effects until later.
I’m looking forward to doing another one of these posts in a year’s time. Hopefully I will have nothing at all interesting to report.
I (relatively) recently went down the rabbit hole of trying out personal finance apps to help get a better grip on, well, the things you’d expect (personal finances and planning around them).
In the past, I’ve had an off-again-on-again relationship with GNUCash. I did give it a solid go for a few months in 2004/2005 it seems (I found my old files) and I even had the OFX exports of transactions for a limited amount of time for a limited number of bank accounts! Amazingly, there’s a GNUCash port to macOS, and it’ll happily open up this file from what is alarmingly close to 20 years ago.
Back in those times, running Linux on the desktop was even more of an adventure than it has been since then, and I always found GNUCash to be strange (possibly a theme with me and personal finance software), but generally fine. It doesn’t seem to have changed a great deal in the years since. You still have to manually import data from your bank unless you happen to be lucky enough to live in the very limited number of places where there’s some kind of automation for it.
So, going back to GNUCash was an option. But I wanted to survey the land of what was available, and if it was possible to exchange money for convenience. I am not big on the motivation to go and spend a lot of time on this kind of thing anyway, so it had to be easy for me to do so.
For my requirements, I basically had:
Support multiple currencies
Be able to import data from my banks, even if manually
Some kind of reporting and planning tools
Be easy enough to use for me, and not leave me struggling with unknown concepts
The ability to export data. No vendor lock-in
I viewed a mobile app (iOS) as a Nice to Have rather than essential. Given that, my shortlist was:
GNUCash: I’ve used it before, and its web site at https://www.gnucash.org/ looks much the same as it always has. It’s Free and Open Source Software, and is thus well aligned with my values, and that’s a big step towards not having vendor lock-in.
I honestly could probably make it work. I wish it had the ability to import transactions from banks in all the places I have ever lived or banked. I also wish the UI were a bit more consistent and modern, and even remotely Mac-like in the Mac version.
Honestly, if the deal was that a web service would pull bank transactions in exchange for ~$10/month and also fund GNUCash development… I’d struggle to say no.
Quicken: Here’s an option that has been around forever – https://www.quicken.com/ – and one that I figured I should look at seriously. It’s actually one I even spent money on… before requesting a refund. Its import/export is so broken it’s an insult to broken software everywhere.
Did you know that Quicken doesn’t import the Quicken Interchange Format (QIF), and hasn’t since 2005?
Me, incredulously, when trying out Quicken
I don’t understand why you wouldn’t support as many as possible of the formats that banks export your transaction data in. It cannot possibly be that hard to parse these things, nor can it possibly be code that requires a lot of maintenance.
This basically meant that I couldn’t import data from my Australian Banks. Urgh. This alone ruled it out.
It really didn’t build confidence in ever getting my data out. At every turn it seemed to be really keen on locking you into Quicken rather than having a good experience all-up.
MoneyWiz: This one was new to me – https://www.wiz.money/ – and it had a fancy URL and everything. I spent a bunch of time trying MoneyWiz, and concluded that it is pretty, but buggy. I managed to create a report that said I’d earned $0, yet clicking into it gave actual numbers. Not being self-consistent and getting the numbers wrong, when getting the numbers right is literally the only function of the app, took it out of the running.
It did sync from my US and Australian banks though, so points there.
Mint: Intuit used to own Quicken until it sold it to H.I.G. Capital in 2016 (according to Wikipedia). I have no idea whether that has had an impact on the feature set or usability of Quicken, but Intuit now has a cloud-only product called Mint.
The big issue I had with Mint was that there didn’t seem to be any way to get your data out of it. It seemed to exemplify vendor lock-in. This seems to have changed a bit since I was originally looking, which is good (or maybe I just couldn’t find it at the time?). But with the cloud-only approach I wasn’t hugely comfortable having everything there. It also seemed to be lacking a few features that I was beginning to find useful in other places.
It is the only product that links with the Apple Card though. No idea why that is the case.
The price tag of $0 was pretty unbeatable, which does make me wonder where the money is made from to fund its development and maintenance. My guess is that it’s through commission on the various financial products advertised through it, and I dearly hope it is not through selling data on its users (I have no reason to believe it is, there’s just the popular habit of companies doing this).
Banktivity: This is what I’ve settled on. It was easy enough for me to figure out how to use, syncs with an iPhone app, is a reasonable price, and can import and sync things from the accounts that I have. Oddly enough, nothing can connect and pull things from the Apple Card – which is really weird. That isn’t a Banktivity thing though; that’s just universal (except for Intuit’s Mint).
I’ve been using it for a bit more than a year now, and am still pretty happy. I wish there was the ability to attach a PDF of a statement to the Statement that you reconcile. I wish I could better tune the auto match/classification rules, and a few other relatively minor things.
Periodically in life I’ve had the desire to be somewhat fit, or at least have the benefits that come with that such as not dying early and being able to navigate a mountain (or just the city of Seattle) on foot without collapsing. I have also found that holding myself accountable via data is pretty vital to me actually going and repeatedly doing something.
So, at some point I got myself a Garmin watch. The year was 2012 and it was a Garmin Forerunner 410. It had a standard black/grey LCD screen, GPS (where getting a GPS lock could be utterly, infuriatingly slow), a sensor you attached to your foot, a sensor you strapped to your chest for heart rate monitoring, and an ANT+ dongle for connecting to a PC to download your activities. There was even some open source software that someone wrote so I could actually get data off my watch on my Linux laptops. This wasn’t a smart watch – it was exclusively for wearing while exercising and tracking an activity, otherwise it was just a watch.
However, as I was ramping up to marathon distance running, one huge flaw emerged: I was not fast enough to run a marathon in the time the battery in my Garmin lasted. IIRC it would end up dying around 3hr30min into an activity, which at the time was increasingly a duration I’d describe as “not too long of a run”. So, the search for a replacement began!
The year was 2017, and the Garmin fenix 5x attracted me for two big reasons: a battery life to be respected, and turn-by-turn navigation. At the time, I seldom went running with a phone, preferring a tiny SanDisk media player (RIP; they made a new version that completely sucked) and a watch. The attraction of being able to navigate back to where I started (e.g. a hotel in some strange city where I didn’t speak the language) was very appealing. It also had (what I would now describe as) rudimentary smart-watch features. It didn’t have even remotely everything the Pebble had, but it was enough.
So, a (non-trivial) pile of money later (even with discounts), I had myself a shiny and virtually indestructible new Garmin. I didn’t even need a dongle to sync it anywhere – it could just upload via its own WiFi connection, or through Bluetooth to the Garmin Connect app to my phone. I could also (if I ever remembered to), plug in the USB cable to it and download the activities to my computer.
One problem: my skin rebelled against the Garmin fenix 5x after a while. Like, properly rebelled. If it wasn’t coming off, I wanted to rip it off. I tried all of the tricks that are posted anywhere online. Nothing helped. I even got tested for the most likely culprit (a nickel allergy), and don’t have one, so I (still) have no idea what in it I’m actually allergic to. Whatever it is, I cannot wear the watch constantly. Urgh. And I was enjoying the daily smart watch uses too!
So, that’s one rather expensive watch that is special-purpose only, and even then it started to become a bit of an issue around longer activities. Urgh.
So the hunt began for a smart watch that I could wear constantly. This usually ended in frustration: anything I wanted was hundreds of dollars, and pretty much nobody listed what materials were in it beyond “stainless steel”, “may contain”, and some disclaimer about “other materials”. That’s not a particularly useful starting point for working out which of these things it is that my skin doesn’t like – at the very least, if the next watch also turned out to cause me problems, I wanted a list of materials I could use to narrow down what I needed to avoid.
So that was all annoying, and the end result was that I went a long time without really wearing a watch. Why? The search resumed periodically and ended up either with nothing, or totally nothing – except if I wanted to buy further into some vendor lock-in.
Honestly, the only manufacturer of anything smartwatch-like that actually listed everything and had some options was Apple. Bizarre. Well, since I’d already gotten on the iPhone bandwagon, this was possible. Rather annoyingly, the phone and watch are very tied together, which makes for a bit of vendor lock-in if you alternate phone and watch replacements and at any point wish to switch platforms.
That being said, it does work well and doesn’t irritate my skin. So that’s a bonus! If I get back into marathon-level distance running, we’ll see how well it goes. But for the more common distances I’ve run or cycled with it… the accuracy seems decent, the HR monitor never randomly decides I’m not exerting myself, and the GPS actually gets a lock in reasonable time. Plus it can pair with headphones and be the only thing I take out with me.
A few random notes about things that can make life on macOS (the modern one, as in, circa 2023) better for those coming from Linux.
For various reasons you may end up with Mac hardware with macOS on the metal rather than Linux. This could be anything from battery life of the Apple Silicon machines (and not quite being ready to jump on the Asahi Linux bandwagon), to being able to run the corporate suite of Enterprise Software (arguably a bug more than a feature), to some other reason that is also fine.
My approach to most of my development is to have a remote more powerful Linux machine to do the heavy lifting, or do Linux development on Linux, and not bank on messing around with a bunch of software on macOS that would approximate something on Linux. This also means I can move my GUI environment (the Mac) easily forward without worrying about whatever weird workarounds I needed to do in order to get things going for whatever development work I’m doing, and vice-versa.
Terminal emulator? iTerm2. The built-in Terminal.app is fine, but there’s more than a few nice things in iTerm2, including tmux integration, which can end up making it feel a lot more like a regular Linux machine. I should probably go read the tmux integration best practices before I complain about some random bugs I think I’ve hit, so let’s pretend I did that and everything is perfect.
I tend to use the Mac for SSHing to bigger Linux machines for most of my work. At work, that’s mostly to a Graviton 2 EC2 Instance running Amazon Linux with all my development environments on it. At home, it’s mostly a Raptor Blackbird POWER9 system running Fedora.
Running Linux locally? For all the use cases of containers, Podman Desktop or finch. There’s a GUI part of Podman which is nice, and finch I know about because of the relatively nearby team that works on it, and its relationship to lima. Lima positions itself as WSL2-like but for Mac. There’s UTM for a full virtual machine / qemu environment, although I rarely end up using this and am more commonly using a container or just SSHing to a bigger Linux box.
There’s Xcode for any macOS development that may be needed (e.g. when you want that extra feature in UTM or something). I do use Homebrew to install a few things locally.
Last week I had occasion to test deploying ceph-csi on a k3s cluster, so that Kubernetes workloads could access block storage provided by an external Ceph cluster. I went with the upstream Ceph documentation, because assuming everything worked it’d then be really easy for me to say to others “just go do this”.
Everything did not work.
I’d gone through all the instructions, inserting my own Ceph cluster’s FSID and MON IP addresses in the right places, applied the YAML to deploy the provisioner and node plugins, and all the provisioner bits were running just fine, but the csi-rbdplugin pods were stuck in CrashLoopBackOff:
The csi-rbdplugin pod consists of three containers – driver-registrar, csi-rbdplugin, liveness-prometheus – and csi-rbdplugin wasn’t able to load the rbd kernel module:
> kubectl logs csi-rbdplugin-22zjr --container csi-rbdplugin
I0726 10:25:12.862125 7628 cephcsi.go:199] Driver version: canary and Git version: d432421a88238a878a470d54cbf2c50f2e61cdda
I0726 10:25:12.862452 7628 cephcsi.go:231] Starting driver type: rbd with name: rbd.csi.ceph.com
I0726 10:25:12.865907 7628 mount_linux.go:284] Detected umount with safe 'not mounted' behavior
E0726 10:25:12.872477 7628 rbd_util.go:303] modprobe failed (an error (exit status 1) occurred while running modprobe args: [rbd]): "modprobe: ERROR: could not insert 'rbd': Key was rejected by service\n"
F0726 10:25:12.872702 7628 driver.go:150] an error (exit status 1) occurred while running modprobe args: [rbd]
Matching “modprobe: ERROR: could not insert ‘rbd’: Key was rejected by service” in the above was an error on each host’s console: “Loading of unsigned module is rejected”. These hosts all have secure boot enabled, so I figured it had to be something to do with that. So I logged into one of the hosts and ran modprobe rbd as root, but that worked just fine. No key errors, no unsigned module errors. And once I’d run modprobe rbd (and later modprobe nbd) on the host, the csi-rbdplugin container restarted and worked just fine.
So why wouldn’t modprobe work inside the container? /lib/modules from the host is mounted inside the container, the container has the right extra privileges… Clearly I needed to run a shell in the failing container to poke around inside when it was in CrashLoopBackOff state, but I realised I had no idea how to do that. I knew I could kubectl exec -it csi-rbdplugin-22zjr --container csi-rbdplugin -- /bin/bash but of course that only works if the container is actually running. My container wouldn’t even start because of that modprobe error.
Having previously spent a reasonable amount of time with podman, which has podman run, I wondered if there were a kubectl run that would let me start a new container using the upstream cephcsi image, but running a shell, instead of its default command. Happily, there is a kubectl run, so I tried it:
> kubectl run -it cephcsi --image=quay.io/cephcsi/cephcsi:canary --rm=true --command=true -- /bin/bash
If you don't see a command prompt, try pressing enter.
[root@cephcsi /]# modprobe rbd
modprobe: FATAL: Module rbd not found in directory /lib/modules/5.14.21-150400.24.66-default
[root@cephcsi /]# ls /lib/modules/
[root@cephcsi /]#
Ohhh, right, of course, that doesn’t have the host’s /lib/modules mounted. podman run lets me add volume mounts using -v options, so surely kubectl run will let me do that too.
At this point in the story, the notes I wrote last week include an awful lot of swearing.
See, kubectl run doesn’t have a -v option to add mounts, but what it does have is an --overrides option to let you add a chunk of JSON to override the generated pod. So I went back to the relevant YAML and teased out the bits I needed to come up with this monstrosity:
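The monstrosity itself is too horrible to reproduce in full, but it’s shaped roughly like the following sketch. The container and volume names here are mine, and the mounts are paraphrased from the upstream csi-rbdplugin YAML – treat it as an illustration of the structure, not a copy of what I actually ran:

```python
import json

# A sketch of the kind of --overrides JSON needed to give a `kubectl run`
# pod the host's /lib/modules. Field names follow the Kubernetes Pod spec;
# the container name must match the pod name kubectl run generates.
overrides = {
    "spec": {
        "containers": [{
            "name": "cephcsi-test",
            "image": "quay.io/cephcsi/cephcsi:canary",
            "command": ["/bin/bash"],
            "stdin": True,
            "tty": True,
            "volumeMounts": [
                {"name": "lib-modules", "mountPath": "/lib/modules", "readOnly": True},
            ],
        }],
        "volumes": [
            {"name": "lib-modules", "hostPath": {"path": "/lib/modules"}},
        ],
    }
}

# The whole blob gets passed on one line as --overrides='{...}'
print("kubectl run -it cephcsi-test --rm=true "
      "--image=quay.io/cephcsi/cephcsi:canary "
      "--overrides='" + json.dumps(overrides) + "' "
      "--command=true -- /bin/bash")
```

Assembling it in a scripting language like this at least saves you from hand-balancing braces inside single quotes on the command line.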
But at least I could get a shell and reproduce the problem:
> kubectl run -it cephcsi-test [honking great horrible chunk of JSON]
[root@cephcsi-test /]# ls /lib/modules/
5.14.21-150400.24.66-default
[root@cephcsi-test /]# modprobe rbd
modprobe: ERROR: could not insert 'rbd': Key was rejected by service
A certain amount more screwing around looking at the source for modprobe and bits of the kernel confirmed that the kernel really didn’t think the module was signed for some reason (mod_verify_sig() was returning -ENODATA), but I knew these modules were fine, because I could load them on the host. Eventually I hit on this:
[root@cephcsi-test /]# ls /lib/modules/*/kernel/drivers/block/rbd*
/lib/modules/5.14.21-150400.24.66-default/kernel/drivers/block/rbd.ko.zst
Wait, what’s that .zst extension? It turns out we (SUSE) have been shipping zstd-compressed kernel modules since – as best as I can tell – some time in 2021. modprobe on my SLE Micro 5.3 host of course supports this:
# grep PRETTY /etc/os-release
PRETTY_NAME="SUSE Linux Enterprise Micro for Rancher 5.3"
# modprobe --version
kmod version 29
+ZSTD +XZ +ZLIB +LIBCRYPTO -EXPERIMENTAL
modprobe in the CentOS Stream 8 upstream cephcsi container does not:
Mystery solved, but I have to say the error messages presented were spectacularly misleading. I later tried with secure boot disabled, and got something marginally better – in that case modprobe failed with “modprobe: ERROR: could not insert ‘rbd’: Exec format error”, and dmesg on the host gave me “Invalid ELF header magic: != \x7fELF”. If I’d seen messaging like that in the first place I might have been quicker to twig to the compression thing.
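Incidentally, compressed modules are easy to spot without modprobe’s help: every zstd frame starts with the four magic bytes 28 B5 2F FD. A quick sketch of the check (the helper name is mine):

```python
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # zstd frame magic, little-endian 0xFD2FB528


def is_zstd_compressed(path: str) -> bool:
    """Return True if the file starts with the zstd frame magic number."""
    with open(path, "rb") as f:
        return f.read(4) == ZSTD_MAGIC
```

A .ko.zst file handed to a kmod built without +ZSTD support is just opaque bytes as far as it’s concerned – an uncompressed .ko would instead start with the usual \x7fELF header.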
Anyway, the point of this post wasn’t to rant about inscrutable kernel errors; it was to rant about how there’s no way anyone could reasonably be expected to figure out that --overrides thing with the JSON to debug a container stuck in CrashLoopBackOff. Assuming I couldn’t possibly be the first person to need to debug containers in this state, I told my story to some colleagues, a couple of whom said (approximately) “Oh, I edit the pod YAML and change the container’s command to tail -f /dev/null or sleep 1d. Then it starts up just fine and I can kubectl exec into it and mess around”. Those things totally work, and I wish I’d thought of them myself. The best answer I got, though, was to use kubectl debug to make a copy of the existing pod but with the command changed. I didn’t even know kubectl debug existed, which I guess is my reward for not reading the entire manual.
So, finally, here’s the right way to do what I was trying to do:
> kubectl debug csi-rbdplugin-22zjr -it \
--copy-to=csi-debug --container=csi-rbdplugin -- /bin/bash
[root@... /]# modprobe rbd
modprobe: ERROR: could not insert 'rbd': Key was rejected by service
(...do whatever other messing around you need to do, then...)
[root@... /]# exit
Session ended, resume using 'kubectl attach csi-debug -c csi-rbdplugin -i -t' command when the pod is running
> kubectl delete pod csi-debug
pod "csi-debug" deleted
In the above kubectl debug invocation, csi-rbdplugin-22zjr is the existing pod that’s stuck in CrashLoopBackOff, csi-debug is the name of the new pod being created, and csi-rbdplugin is the container in that pod that has its command replaced with /bin/bash, so you can mess around inside it.
The July 2023 meeting sparked multiple new topics, including Linux security architecture, Debian ports to LoongArch and RISC-V, as well as hardware design of PinePhone backplates.
On the practical side, Russell Coker demonstrated running different applications in isolated environments with the bubblewrap sandbox, as well as other hardening techniques and the way they interact with the host system. Russell also discussed some possible pathways for hardening desktop Linux to reach the security level of modern Android. Yifei Zhan demonstrated sending and receiving messages with the PineDio USB LoRa adapter, showed how to inspect LoRa signals with an off-the-shelf software-defined radio receiver, and discussed how the driver situation for LoRa on Linux might be improved. Yifei then gave a demonstration of using KVM on the PinePhone Pro to run NetBSD and OpenBSD virtual machines; more details on running VMs on the PinePhone Pro can be found in this blog post from Yifei.
We also had some discussion of the current state of the Mobian and Debian ecosystems, along with how to contribute to different parts of Mobian, with a Mobian developer who joined us.
I’ve had a pretty varied experience with photo management on Linux over the past couple of decades. For a while I used f-spot, as it was the new hotness. At some point it became… slow and crashy enough that it was unusable. Today, the GitHub project warns that current bugs include “Not starting”.
At some point (and via a method I have long since forgotten), I did manage to finally get my photos over to Shotwell, which was the new hotness at the time. That data migration was so long ago that I actually forget which features I was missing from f-spot that I’d been grumbling about. I remember the import being annoying, though. At some point Shotwell was no longer the new hotness, and now there is GNOME Photos. I remember looking at GNOME Photos and seeing no method of importing photos from Shotwell, so I put it aside. Hopefully that situation has improved somewhere.
At some point Shotwell was becoming rather stagnant, and I noticed more things stopping working than features or performance being added. The good news is that there has since been more development activity on Shotwell, so hopefully my issues with it end up being resolved.
One recommendation for Linux photo management was digiKam, but it’s one I never ended up using full time. One of the reasons was that I couldn’t really see any non-manual way to import photos from Shotwell into it.
With tens of thousands of photos (~58k at the time of writing), doing things manually didn’t seem like much fun at all.
As I postponed my decision, I ended up moving my main machine over to a Mac for a variety of random reasons, and one quite motivating thing was the ability to have Photos from my iPhone magically sync over to my photo library without having to plug it into my computer and copy things across.
So… how to get photos across from Shotwell on Linux to Photos on a Mac/iPhone (while also keeping a very keen eye on how to do it the other way around, because, well, vendor lock-in isn’t great)?
It would be kind of neat if I could just run Shotwell on the Mac and have some kind of import button, but seeing as there wasn’t already a native Mac port, and that Shotwell is written in Vala rather than something I know has a working toolchain on macOS…. this seemed like more work than I’d really like to take on.
Luckily, I remembered that Shotwell’s database is actually just a SQLite database pointing to all the files on disk. So, if I could work out how to read it accurately, and how to import all the relevant metadata (such as what Albums a photo is in, tags, title, and description) into Apple Photos, I’d be able to make it work.
So… is there any useful documentation as to how the database is structured?
Semi-annoyingly, Shotwell is written in Vala, a rather niche programming language that, while integrating with all the GObject stuff that GNOME uses, is largely unheard of. Luckily, the database code in Shotwell isn’t too hard to read, so it was a useful fallback for when the documentation proved inadequate.
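The database itself is approachable though: photo.db is plain SQLite, with a PhotoTable holding per-photo metadata and a TagTable whose photo_id_list column holds a comma-separated list of encoded photo IDs. Something like this sketch (column names as I read them from the Shotwell source – verify against your own database with .schema in the sqlite3 shell) pulls out the basics:

```python
import sqlite3


def read_shotwell(db_path: str):
    """Pull photo paths, titles, comments and tags out of a Shotwell photo.db.

    Column names here are taken from Shotwell's PhotoTable/TagTable as I
    understand them; treat this as a starting point, not gospel.
    """
    conn = sqlite3.connect(db_path)
    photos = conn.execute(
        "SELECT id, filename, title, comment, rating FROM PhotoTable"
    ).fetchall()
    # photo_id_list entries look like "thumb0000000000000001," - the photo's
    # id as zero-padded hex with a type prefix, comma-terminated.
    tags = conn.execute(
        "SELECT name, photo_id_list FROM TagTable"
    ).fetchall()
    conn.close()
    return photos, tags
```

Albums, events and so on live in their own tables in the same file, so the whole export problem reduces to a handful of SELECTs plus decoding those ID lists.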
On the Mac side of things, this was a good excuse to start looking at Swift. Knowing I’d also need to read a SQLite database directly (rather than use any higher-level abstraction), I armed myself with the following resources:
From here, I could work on getting the first half going, the ability to view my Shotwell database on the Mac (which is what I posted a screenshot of back in Feb 2022).
But also, I had to work out what I was doing on the other end of things, how would I import photos? It turns out there’s an API!
A bit of SwiftUI code:
import SwiftUI
import AppKit
import Photos
struct ContentView: View {
    @State var favorite_checked : Bool = false
    @State var hidden_checked : Bool = false
    var body: some View {
        VStack() {
            Text("Select a photo for import")
            Toggle("Favorite", isOn: $favorite_checked)
            Toggle("Hidden", isOn: $hidden_checked)
            Button("Import Photo")
            {
                let panel = NSOpenPanel()
                panel.allowsMultipleSelection = false
                panel.canChooseDirectories = false
                if panel.runModal() == .OK {
                    let photo_url = panel.url!
                    print("selected: " + String(photo_url.absoluteString))
                    addAsset(url: photo_url, isFavorite: favorite_checked, isHidden: hidden_checked)
                }
            }
            .padding()
        }
    }
}
struct ContentView_Previews: PreviewProvider {
    static var previews: some View {
        ContentView()
    }
}
Combined with a bit of code to do the import (which does look a bunch like the examples in the docs):
import SwiftUI
import Photos
import AppKit
@main
struct SinglePhotoImporterApp: App {
    var body: some Scene {
        WindowGroup {
            ContentView()
        }
    }
}
func addAsset(url: URL, isFavorite: Bool, isHidden: Bool) {
    // Add the asset at the given file URL to the photo library.
    PHPhotoLibrary.shared().performChanges({
        let addedImage = PHAssetChangeRequest.creationRequestForAssetFromImage(atFileURL: url)
        addedImage?.isHidden = isHidden
        addedImage?.isFavorite = isFavorite
    }, completionHandler: { success, error in
        if !success {
            print("Error creating the asset: \(String(describing: error))")
        } else {
            print("Imported!")
        }
    })
}
This all meant I could import a single photo. However, there were some limitations.
There’s the PHAssetCollectionChangeRequest to do things to Albums, so it would solve that problem, but I couldn’t for the life of me work out how to add/edit Titles and Descriptions.
It was so close!
So what did I need to do in order to import titles and descriptions? It turns out you can do that via AppleScript. Yes, that thing that launched in 1993 and has somehow survived the transition from m68k-based Macs to PowerPC-based Macs to Intel-based Macs to ARM-based Macs.
The Photos dictionary for AppleScript
So, to make it easier to debug what was going on, I started adding code to my ShotwellImporter tool that would generate snippets of AppleScript I could run to check it was doing the right thing… but then I very quickly ran into a problem: it appears the AppleScript interpreter on modern macOS has limits you’d be more familiar with in 1993 than 2023. Scripts would just error out before even running (allegedly I was out of dictionary space).
But there’s a new option! Everything you can do with AppleScript you can now do with JavaScript – it’s just even less documented than AppleScript is! But it does work! I got to the point where I could generate JavaScript that imported photos, into all the relevant albums, and set title and descriptions.
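The generated scripts look roughly like the output of this sketch – a hypothetical Python stand-in for the generation step (my actual tool is Swift). The Photos scripting names used here (“name” for the title, “description” for the caption, an “import … into” command) are how I read its scripting dictionary; double-check them in Script Editor before trusting them:

```python
import json


def jxa_import_snippet(path: str, title: str, description: str, album: str) -> str:
    """Generate a JavaScript-for-Automation snippet that imports one photo
    into Photos.app, files it into an album, and sets title + description.

    json.dumps is (ab)used here to produce safely quoted/escaped JavaScript
    string literals from arbitrary titles and paths.
    """
    return "\n".join([
        "var photos = Application('Photos');",
        f"var album = photos.albums.byName({json.dumps(album)});",
        f"var items = photos.import([Path({json.dumps(path)})], {{into: album}});",
        f"items[0].name = {json.dumps(title)};",
        f"items[0].description = {json.dumps(description)};",
    ])
```

Generating text like this and feeding it to osascript kept the per-script size small, which neatly sidestepped the interpreter limits that killed the AppleScript attempt.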
In my last post, I wrote about how I taught sesdev (originally a tool for deploying Ceph clusters on virtual machines) to deploy k3s, because I wanted a little sandbox in which I could break things and learn more about Kubernetes. It’s nice to be able to do a toy deployment locally, on a bunch of VMs, on my own hardware, in my home office, rather than paying to do it on someone else’s computer. Given the k3s thing worked, I figured the next step was to teach sesdev how to deploy Longhorn, so I could break that and learn more about that too.
Install nfs-client, open-iscsi and e2fsprogs packages on all nodes.
Make an ext4 filesystem on /dev/vdb on all the nodes that have extra disks, then mount that on /var/lib/longhorn.
Use kubectl label node -l 'node-role.kubernetes.io/master!=true' node.longhorn.io/create-default-disk=true to ensure Longhorn does its storage thing only on the nodes that aren’t the k3s master.
Install Longhorn with Helm, because that will install the latest version by default vs. using kubectl where you always explicitly need to specify the version.
Create an ingress so the UI is exposed… from all nodes, via HTTP, with no authentication. Remember: this is a sandbox – please don’t do this sort of thing in production!
So, now I can do this:
> sesdev create k3s --deploy-longhorn
=== Creating deployment "k3s-longhorn" with the following configuration ===
Deployment-wide parameters (applicable to all VMs in deployment):
- deployment ID: k3s-longhorn
- number of VMs: 5
- version: k3s
- OS: tumbleweed
- public network: 10.20.78.0/24
Proceed with deployment (y=yes, n=no, d=show details) ? [y]: y
=== Running shell command ===
vagrant up --no-destroy-on-error --provision
Bringing machine 'master' up with 'libvirt' provider…
Bringing machine 'node1' up with 'libvirt' provider…
Bringing machine 'node2' up with 'libvirt' provider…
Bringing machine 'node3' up with 'libvirt' provider…
Bringing machine 'node4' up with 'libvirt' provider…
[... lots more log noise here - this takes several minutes... ]
=== Deployment Finished ===
You can login into the cluster with:
$ sesdev ssh k3s-longhorn
Longhorn will now be deploying, which may take some time.
After logging into the cluster, try these:
# kubectl get pods -n longhorn-system --watch
# kubectl get pods -n longhorn-system
The Longhorn UI will be accessible via any cluster IP address
(see the kubectl -n longhorn-system get ingress output above).
Note that no authentication is required.
…and, after another minute or two, I can access the Longhorn UI and try creating some volumes. There’s a brief period while the UI pod is still starting where it just says “404 page not found”, and even after the UI is up there are still other pods coming online, so the Volume screen in the Longhorn UI shows an error: “failed to get the parameters: failed to get target node ID: cannot find a node that is ready and has the default engine image longhornio/longhorn-engine:v1.4.1 deployed“. Rest assured this goes away in due course (it’s not impossible I’m suffering here from rural Tasmanian internet lag pulling container images). Anyway, with my five nodes – four of which have an 8GB virtual disk for use by Longhorn – I end up with a bit less than 22GB of storage available:
21.5 GiB isn’t much, but remember this is a toy deployment running in VMs on my desktop Linux box
Now for the fun part. Longhorn is a distributed storage solution, so I thought it would be interesting to see how it handled a couple of types of failure. The following tests are somewhat arbitrary (I’m really just kicking the tyres randomly at this stage) but Longhorn did, I think, behave pretty well given what I did to it.
Volumes in Longhorn consist of replicas stored as sparse files on a regular filesystem on each storage node. The Longhorn documentation recommends using a dedicated disk rather than just having /var/lib/longhorn backed by the root filesystem, so that’s what sesdev does: /var/lib/longhorn is an ext4 filesystem mounted on /dev/vdb. Now, what happens to Longhorn if that underlying block device suffers some kind of horrible failure? To test that, I used the Longhorn UI to create a 2GB volume, then attached that to the master node:
The Longhorn UI helpfully tells me the volume replicas are on node3, node4 and node1
Then, I ssh’d to the master node and with my 2GB Longhorn volume attached, made a filesystem on it and created a little file:
> sesdev ssh k3s-longhorn
Have a lot of fun...
master:~ # cat /proc/partitions
major minor #blocks name
253 0 44040192 vda
253 1 2048 vda1
253 2 20480 vda2
253 3 44016623 vda3
8 0 2097152 sda
master:~ # mkfs /dev/sda
mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done
Creating filesystem with 524288 4k blocks and 131072 inodes
Filesystem UUID: 3709b21c-b9a2-41c1-a6dd-e449bdeb275b
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912
Allocating group tables: done
Writing inode tables: done
Writing superblocks and filesystem accounting information: done
master:~ # mount /dev/sda /mnt
master:~ # echo foo > /mnt/foo
master:~ # cat /mnt/foo
foo
Then I went and trashed the block device backing one of the replicas:
> sesdev ssh k3s-longhorn node3
Have a lot of fun...
node3:~ # ls /var/lib/longhorn
engine-binaries longhorn-disk.cfg lost+found replicas unix-domain-socket
node3:~ # dd if=/dev/urandom of=/dev/vdb bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.486205 s, 216 MB/s
node3:~ # ls /var/lib/longhorn
node3:~ # dmesg|tail -n1
[ 6544.197183] EXT4-fs error (device vdb): ext4_map_blocks:607: inode #393220: block 1607168: comm longhorn: lblock 0 mapped to illegal pblock 1607168 (length 1)
At this point, the Longhorn UI still showed the volume as green (healthy, ready, scheduled). Then, back on the master node, I tried creating another file:
master:~ # echo bar > /mnt/bar
master:~ # cat /mnt/bar
bar
That’s fine so far, but suddenly the Longhorn UI noticed that something very bad had happened:
The volume is still usable, but one of the replicas has failed
Ultimately node3 was rebooted and ended up stalled with the console requesting the root password for maintenance:
Failed to mount /var/lib/longhorn – Can’t find ext4 filesystem
Meanwhile, Longhorn went and rebuilt a third replica on node2:
All green again!
…and the volume remained usable the entire time:
master:~ # echo baz > /mnt/baz
master:~ # ls /mnt
bar baz foo lost+found
That’s perfect!
Looking at the Node screen we could see that node3 was still down:
There may be disk size errors with down nodes (4.87 TiB looks a lot like integer overflow to me)
That’s OK, I was able to fix node3. I logged in on the console and ran mkfs.ext4 /dev/vdb, then brought the node back up again. The disk remained unschedulable, because Longhorn was still expecting the ‘old’ disk to be there (I assume based on the UUID stored in /var/lib/longhorn/longhorn-disk.cfg) and of course the ‘new’ disk is empty. So I used the Longhorn UI to disable scheduling for that ‘old’ disk, then deleted it. Shortly after, Longhorn recognised the ‘new’ disk mounted at /var/lib/longhorn and everything was back to green across the board.
So Longhorn recovered well from the backing store of one replica going bad. Next I thought I’d try to break it from the other end by running a volume out of space. What follows is possibly not a fair test, because what I did was create a single Longhorn volume larger than the underlying disks, then filled that up. In normal usage, I assume one would ensure there’s plenty of backing storage available to service multiple volumes, that individual volumes wouldn’t generally be expected to get more than a certain percentage full, and that some sort of monitoring and/or alerting would be in place to warn of disk pressure.
With four nodes, each with a single 8GB disk, and Longhorn apparently reserving 2.33GB by default on each disk, that means no Longhorn volume can physically store more than a bit over 5.5GB of data (see the Size column in the previous screenshot). Given that the default setting for Storage Over Provisioning Percentage is 200, we’re actually allowed to allocate up to a bit under 11GB.
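The arithmetic, spelled out (numbers as reported by my deployment’s Longhorn UI; the rounding in the prose above is mine):

```python
disk_size = 8.0           # GiB per storage node disk
reserved = 2.33           # GiB Longhorn reserves on each disk by default
over_provisioning = 200   # default Storage Over Provisioning Percentage

# Each replica lives wholly on one node's disk, so the real data a volume
# can hold is bounded by the schedulable space on a single disk...
usable_per_disk = disk_size - reserved

# ...while the size you're *allowed* to allocate is that, scaled by the
# over-provisioning percentage.
allocatable = usable_per_disk * over_provisioning / 100

print(f"physical limit per volume: ~{usable_per_disk:.2f} GiB")
print(f"allocatable with 200% over-provisioning: ~{allocatable:.2f} GiB")
```

Which is why a 10GB volume is creatable here despite there being nowhere near 10GB of real space for any one replica.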
So I went and created a 10GB volume, attached that to the master node, created a filesystem on it, and wrote a whole lot of zeros to it:
…there was a lot of unpleasantness on the master node’s console…
So many I/O errors!
…the replicas became unschedulable due to lack of space…
This doesn’t look good
…and finally the volume faulted:
This really doesn’t look good
Now what?
It turns out that Longhorn will actually recover if we’re able to somehow expand the disks that store the replicas. This is probably a good argument for backing Longhorn with an LVM volume on each node in real world deployments, because then you could just add another disk and extend the volume onto it. In my case though, given it’s all VMs and virtual block devices, I can actually just enlarge those devices. For each node then, I:
Shut it down
Ran qemu-img resize /var/lib/libvirt/images/k3s-longhorn_$NODE-vdb.qcow2 +8G
Started it back up again and ran resize2fs /dev/vdb to take advantage of the extra disk space.
After doing that to node1, Longhorn realised there was enough space there and brought node1’s replica of my 10GB volume back online. It also summarily discarded the other two replicas from the still-full disks on node2 and node3, which didn’t yet have enough free space to be useful:
One usable replica is better than three unusable replicas
As I repeated the virtual disk expansion on the other nodes, Longhorn happily went off and recreated the missing replicas:
Finally I could re-attach the volume to the master node, and have a look to see how many of my zeros were actually written to the volume:
master:~ # cat /proc/partitions
major minor #blocks name
254 0 44040192 vda
254 1 2048 vda1
254 2 20480 vda2
254 3 44016623 vda3
8 0 10485760 sda
master:~ # mount /dev/sda /mnt
master:~ # ls -l /mnt
total 7839764
-rw-r--r-- 1 root root 8027897856 May 3 04:41 big-lot-of-zeros
drwx------ 2 root root 16384 May 3 04:34 lost+found
Recall that dd claimed to have written 9039773696 bytes before it stalled when the volume faulted, so I guess that last gigabyte of zeros is lost in the aether. But recall also that this isn’t really a fair test – one overprovisioned volume deliberately filled to breaking point as fast as possible, vs. a production deployment with (presumably) multiple volumes that don’t fill quite so fast, and where one is hopefully paying at least a little bit of attention to disk pressure as time goes by.
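For the record, the gap between what dd claimed and what survived:

```python
written_by_dd = 9_039_773_696   # bytes dd reported written before the fault
on_disk = 8_027_897_856         # bytes in big-lot-of-zeros per ls -l

lost_bytes = written_by_dd - on_disk
lost_gib = lost_bytes / 2**30
print(f"~{lost_gib:.2f} GiB of zeros never made it")
```

…so “that last gigabyte” is about 0.94 GiB, close enough.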
It’s worth noting that in a situation where there are multiple Longhorn volumes, assuming one disk or LVM volume per node, the replicas will all share the same underlying disks, and once those disks are full it seems all the Longhorn volumes backed by them will fault. Given multiple Longhorn volumes, one solution – rather than expanding the underlying disks – is simply to delete a volume or two if you can stand to lose the data, or maybe delete some snapshots (I didn’t try the latter yet). Once there’s enough free space, the remaining volumes will come back online. If you’re really worried about this failure mode, you could always just disable overprovisioning in the first place – whether this makes sense or not will really depend on your workloads and their data usage patterns.
All in all, like I said earlier, I think Longhorn behaved pretty well given what I did to it. Some more information in the event log could perhaps be beneficial though. In the UI I can see warnings from longhorn-node-controller e.g. “the disk default-disk-1cdbc4e904539d26(/var/lib/longhorn/) on the node node1 has 3879731200 available, but requires reserved 2505089433, minimal 25% to schedule more replicas” and warnings from longhorn-engine-controller e.g. “Detected replica overprovisioned-r-73d18ad6 (10.42.3.19:10000) in error“, but I couldn’t find anything really obvious like “Dude, your disks are totally full!”
Later, I found more detail in the engine manager logs after generating a support bundle ([…] level=error msg=”I/O error” error=”tcp://10.42.4.34:10000: write /host/var/lib/longhorn/replicas/overprovisioned-c3b9b547/volume-head-003.img: no space left on device”) so the error information is available – maybe it’s just a matter of learning where to look for it.
We – that is to say the storage team at SUSE – have a tool we’ve been using for the past few years to help with development and testing of Ceph on SUSE Linux. It’s called sesdev because it was created largely for SES (SUSE Enterprise Storage) development. It’s essentially a wrapper around vagrant and libvirt that will spin up clusters of VMs running openSUSE or SLES, then deploy Ceph on them. You would never use such clusters in production, but it’s really nice to be able to easily spin up a cluster for testing purposes that behaves something like a real cluster would, then throw it away when you’re done.
I’ve recently been trying to spend more time playing with Kubernetes, which means I wanted to be able to spin up clusters of VMs running openSUSE or SLES, then deploy Kubernetes on them, then throw the clusters away when I was done, or when I broke something horribly and wanted to start over. Yes, I know there’s a bunch of other tools for doing toy Kubernetes deployments (minikube comes to mind), but given I already had sesdev and was pretty familiar with it, I thought it’d be worthwhile seeing if I could teach it to deploy k3s, a particularly lightweight version of Kubernetes. Turns out that wasn’t too difficult, so now I can do this:
> sesdev create k3s
=== Creating deployment "k3s" with the following configuration ===
Deployment-wide parameters (applicable to all VMs in deployment):
deployment ID: k3s
number of VMs: 5
version: k3s
OS: tumbleweed
public network: 10.20.190.0/24
Proceed with deployment (y=yes, n=no, d=show details) ? [y]: y
=== Running shell command ===
vagrant up --no-destroy-on-error --provision
Bringing machine 'master' up with 'libvirt' provider...
Bringing machine 'node1' up with 'libvirt' provider...
Bringing machine 'node2' up with 'libvirt' provider...
Bringing machine 'node3' up with 'libvirt' provider...
Bringing machine 'node4' up with 'libvirt' provider...
[...wait a few minutes (there's lots more log output here in real life)...]
=== Deployment Finished ===
You can login into the cluster with:
$ sesdev ssh k3s
…and then I can do this:
> sesdev ssh k3s
Last login: Fri Mar 24 11:50:15 CET 2023 from 10.20.190.204 on ssh
Have a lot of fun…
master:~ # kubectl get nodes
NAME     STATUS   ROLES                  AGE     VERSION
master   Ready    control-plane,master   5m16s   v1.25.7+k3s1
node2    Ready    <none>                 2m17s   v1.25.7+k3s1
node1    Ready    <none>                 2m15s   v1.25.7+k3s1
node3    Ready    <none>                 2m16s   v1.25.7+k3s1
node4    Ready    <none>                 2m16s   v1.25.7+k3s1
master:~ # kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system local-path-provisioner-79f67d76f8-rpj4d 1/1 Running 0 5m9s
kube-system metrics-server-5f9f776df5-rsqhb 1/1 Running 0 5m9s
kube-system coredns-597584b69b-xh4p7 1/1 Running 0 5m9s
kube-system helm-install-traefik-crd-zz2ld 0/1 Completed 0 5m10s
kube-system helm-install-traefik-ckdsr 0/1 Completed 1 5m10s
kube-system svclb-traefik-952808e4-5txd7 2/2 Running 0 3m55s
kube-system traefik-66c46d954f-pgnv8 1/1 Running 0 3m55s
kube-system svclb-traefik-952808e4-dkkp6 2/2 Running 0 2m25s
kube-system svclb-traefik-952808e4-7wk6l 2/2 Running 0 2m13s
kube-system svclb-traefik-952808e4-chmbx 2/2 Running 0 2m14s
kube-system svclb-traefik-952808e4-k7hrw 2/2 Running 0 2m14s
…and then I can make a mess with kubectl apply, helm, etc.
One thing that sesdev knows how to do is deploy VMs with extra virtual disks. This functionality is there for Ceph deployments, but there’s no reason we can’t turn it on when deploying k3s:
> sesdev create k3s --num-disks=2
> sesdev ssh k3s
master:~ # for node in \
$(kubectl get nodes -o 'jsonpath={.items[*].metadata.name}') ;
do echo $node ; ssh $node cat /proc/partitions ; done
master
major minor #blocks name
253 0 44040192 vda
253 1 2048 vda1
253 2 20480 vda2
253 3 44016623 vda3
node3
major minor #blocks name
253 0 44040192 vda
253 1 2048 vda1
253 2 20480 vda2
253 3 44016623 vda3
253 16 8388608 vdb
253 32 8388608 vdc
node2
major minor #blocks name
253 0 44040192 vda
253 1 2048 vda1
253 2 20480 vda2
253 3 44016623 vda3
253 16 8388608 vdb
253 32 8388608 vdc
node4
major minor #blocks name
253 0 44040192 vda
253 1 2048 vda1
253 2 20480 vda2
253 3 44016623 vda3
253 16 8388608 vdb
253 32 8388608 vdc
node1
major minor #blocks name
253 0 44040192 vda
253 1 2048 vda1
253 2 20480 vda2
253 3 44016623 vda3
253 16 8388608 vdb
253 32 8388608 vdc
As you can see, this gives all the worker nodes an extra two 8GB virtual disks. I suspect this may make sesdev an interesting tool for testing other Kubernetes-based storage systems such as Longhorn, but I haven’t tried that yet.
I recently bought an Energica Experia - the latest, largest and longest-range of Energica's electric motorbike models.
The decision to do this rather than build my own was complicated, and I'm
going to mostly skip over the detail of that. At some time I might put it in
another blog post. But for now it's enough to say that I'd accidentally
cooked the motor in my Mark I, the work on the Mark II was going to take ages,
and I was in the relatively fortunate situation of being able to afford the
Experia if I sold my existing Triumph Tiger Sport and the parts for the Mark
II.
For other complicated reasons I was planning to be in Sydney after the weekend
that Bruce at Zen Motorcycles told
me the bike would be arriving. Rather than have it freighted down, and since
I would have room for my riding gear in our car, I decided to pick it up and
ride it back on the Monday. In reconnoitering the route, we discovered that
by pure coincidence Zen Motorcycles is on Euston Road in Alexandria, only
200 metres away from the entrance to WestConnex and the M8. So with one
traffic light I could be out of Sydney.
I will admit to being more than a little excited that morning. Electric
vehicles are still, in 2023, a rare enough commodity that waiting lists can be
months long; I ordered this bike in October 2022 and it arrived in March 2023.
So I'd had plenty of time to build my expectations. And likewise the thought
of riding a brand new bike - literally one of the first of its kind in the
country (it is the thirty-second Experia ever made!) - was a little daunting.
I obtained PDF copies of the manual and familiarised myself with turning the
cruise control on and off, as well as checking and setting the regen braking
levels. Didn't want to stuff anything up on the way home.
There is that weird feeling in those situations of things being both very
ordinary and completely unique. I met Bruce, we chatted, I saw the other
Experia models in the store, met Ed - who had come down to chat with Bruce,
and just happened to be the guy who rode a Harley Davidson Livewire from
Perth to Sydney and then from Sydney to Cape Tribulation and back. He shared
stories from his trip and tips on hypermiling. I signed paperwork, picked up
the keys, put on my gear, prepared myself.
Even now I still get a bit choked up just thinking of that moment. Seeing
that bike there, physically real, in front of me - after those months of
anticipation - made the excitement real as well.
So finally, after making sure I wasn't floating, and making sure I had my
ear plugs in and helmet on the right way round, I got on. Felt the bike's
weight. Turned it on. Prepared myself. Took off. My partner followed
behind, through the lights, onto the M8 toward Canberra. I gave her the
thumbs up.
We planned to stop for lunch at Mittagong, while the NRMA still offers the
free charger at the RSL there. One lady was charging her Nissan Leaf on the
ChaDeMo side; shortly after I plugged in a guy arrived in his Volvo XC40
Recharge. He had the bigger battery and would take longer; I just needed a
ten minute top up to get me to Marulan.
I got to Marulan and plugged in; a guy came thinking he needed to tell the
petrol motorbike not to park in the electric vehicle bay, but then realised
that the plug was going into my bike. Kate headed off, having charged up as
well, and I waited another ten minutes or so to get a bit more charge. Then
I rode back.
I stopped only once more - at Mac's Reef Road. I turned off and did a
U-turn, then waited for the traffic to clear before trying the bike's
acceleration. Believe me when I say this bike will absolutely do 0-100km/hr
in under four seconds! It is not a light bike, but when you pull on the power
it gets up and goes.
Here is my basic review, given that experience and then having ridden it for
about ten weeks around town.
The absolute best feature of the Energica Experia is that it is perfectly
comfortable riding around town. Ease on the throttle and it gently takes off
at the traffic lights and keeps pace with the traffic. Ease off, and it
gently comes to rest with regenerative braking and a light touch on the rear
brake after stopping to hold it still. If you want to take off faster, wind
the throttle on more. It is not temperamental or twitchy, and you have no
annoying gears and clutch to balance.
In fact, I feel much more confident lane filtering, because before I would
have to have the clutch ready and be prepared to give the Tiger Sport lots of
throttle lest I accidentally stall it in front of an irate line of traffic.
With the Experia, I can simply wait peacefully - using no power - and then
when the light goes green I simply twist on the throttle and I am away ahead
of even the most aggressive car driver.
It is amazingly empowering.
I'm not going to bore you with the stats - you can probably look them up
yourself if you care. The main thing to me is that it has DC fast charging,
and watching 75kW go into a 22.5kWh battery is just a little bit terrifying
as well as incredibly cool. The stated range of 250km on a charge at highway
speeds is absolutely correct, from my experience riding it down from Sydney.
And that plus the fast charging means that I think it is going to be quite
reasonable to tour on this bike, stopping off at fast or even mid-level
chargers - even a boring 22kW charger can fill the battery up in an hour.
The touring group I travel with stops often enough that if those stops can be
top ups, I will not hold anyone up.
Some time in the near future I hope to have a nice fine day where I can take
it out on the Cotter Loop. This is an 80km stretch of road that goes west of
Canberra into the foothills of the Brindabella Ranges, out past the Deep
Space Tracking Station and Tidbinbilla Nature Reserve. It's a great
combination of curving country roads and hilly terrain, and reasonably well
maintained as well. I did that on the Tiger Sport, with a GoPro, before I
sold it - and if I can ever convince PiTiVi to actually compile the video
from it I will put that hour's ride up on a platform somewhere.
I want to do that as much to show off Canberra's scenery as to show off the
bike.
And if the CATL battery capacity improvement comes through to the rest of the
industry, and we get bikes that can do 400km to 500km on a charge, then
electric motorbike touring really will be no different to petrol motorbike
touring. The Experia is definitely at the forefront of that change, and
such touring is already possible on this bike.
Rustup (the community package manager for the Rust language) was starting to really suffer: CI times were up at around one hour.
We’ve made some strides in bringing this down.
Caching factory for test scenarios
The first thing, which achieved about a 30% reduction in test time, was to stop recreating all the test context every time.
Rustup tests the download/installation/upgrade of distributions of Rust. To avoid downloading gigabytes in the test suite, the suite creates mocks of the published Rust artifacts. These mocks are GPG signed and compressed with multiple compression methods, both of which are quite heavyweight operations to perform – and not actually the interesting code under test to execute.
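To get a feel for why that setup is heavyweight, here is an illustrative sketch (not rustup's actual mock code; the file name is made up) of what producing one mock artifact involves – the same payload compressed several ways, plus a detached signature:

```shell
#!/bin/sh
# Illustrative sketch: each mock "dist" artifact gets compressed with
# multiple methods and GPG-signed, which is the expensive, uninteresting
# work that caching the test context avoids repeating per test.
payload=mock-channel.toml            # hypothetical mock manifest name
printf 'channel = "nightly"\n' > "$payload"

gzip -kf "$payload"                  # produces mock-channel.toml.gz
xz   -kf "$payload"                  # produces mock-channel.toml.xz

# The real suite also GPG-signs each artifact, roughly:
#   gpg --batch --detach-sign --armor "$payload"
# (left commented here since it needs a keyring set up)
ls "$payload".gz "$payload".xz
```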
Previously, every test was entirely hermetic, and usually the server state was also unmodified.
There were two cases where the state was modified. One, a small number of tests testing error conditions such as GPG signature failures. And two, quite a number of tests that were testing temporal behaviour: for instance, install nightly at time A, then with a newer server state, perform a rustup update and check a new version is downloaded and installed.
We’re partway through this migration, but compare these two tests:
The former version mutates the date with set_current_dist_date; the new version uses two scenarios, one for the earlier time, and one for the later time. This permits the server state to be constructed only once. On a per-test basis it can move as much as 50% of the time out of the test.
Single binary for the integration test suite
The next major gain was moving from having 14 separate integration test binaries to just one. This reduces the link cost of linking the test binaries, all of which link in the same library. It also permits us to see unused functions in our test support library, which helps with cleaning up cruft rather than having it accumulate.
Hard linking rather than copying ‘rustup-init’
Part of the test suite for each test is setting up an installed rustup environment. Why not start from scratch every time? Well, we obviously have tests that do that, but most tests are focused on steps beyond the new-user case. Setting up an installed rustup environment has a few steps, but particular ones are copying a binary of rustup into the test sandbox, and hard linking it under various names: cargo, rustc, rustup etc.
A debug build of rustup is ~20MB. Running 400 tests means about 8GB of IO; on some platforms most of that IO won’t hit disk, on others it will.
In review now is a PR that changes the initial copy to a hardlink: we hardlink the rustup-init built by cargo into each test, and then hardlink that to the various binaries. That saves 8GB of IO, which isn’t much from some perspectives, but it adds pressure on the page cache, and is wasted work. One wrinkle is a very low max-links limit on NTFS of 1023; to mitigate that we count the links made to rustup-init and generate a new inode for the original to avoid failures happening.
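The approach can be sketched roughly like this (the paths, names, and stand-in binary are illustrative, not rustup's actual test code):

```shell
#!/bin/sh
# Sketch of the hard-link strategy: link one rustup-init binary into each
# test sandbox, rotating to a fresh inode before NTFS's hard-link ceiling
# (1023) is reached. All paths here are placeholders.
SRC="${1:-./rustup-init}"
SANDBOX="${2:-./sandbox}"
MAX_LINKS=1000                       # stay safely under 1023

[ -f "$SRC" ] || printf 'fake binary\n' > "$SRC"   # stand-in for the real build

# Current hard-link count of the source inode (GNU stat, BSD fallback).
links=$(stat -c %h "$SRC" 2>/dev/null || stat -f %l "$SRC")
if [ "${links:-0}" -ge "$MAX_LINKS" ]; then
    # Rotate: replace the source with a fresh copy (new inode, same path).
    cp "$SRC" "$SRC.tmp" && mv "$SRC.tmp" "$SRC"
fi

mkdir -p "$SANDBOX"
ln -f "$SRC" "$SANDBOX/rustup-init"
# ...and the proxy names are further hard links of that copy:
for name in cargo rustc rustup; do
    ln -f "$SANDBOX/rustup-init" "$SANDBOX/$name"
done
```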
Future work
In GitHub Actions this lowers our test time to 19m for Linux and 24m for Windows, which is a lot better but not great.
I plan on experimenting with separate actions for building release artifacts and doing CI tests – at the moment we have the same action do both, but they don’t share artifacts in the cache in any meaningful way, so we can probably gain parallelism there, as well as turning off release builds entirely for CI.
We should finish the cached test context work and use it everywhere.
We’re also looking at having fewer integration tests and more narrow, close-to-the-code tests.
I have long said “Long Malaysians, Short Malaysia” in conversation to many. Maybe it took me a while to tweet it, but this was the first example: Dec 29, 2021. I’ve tweeted it a lot more since.
Malaysia has a 10th Prime Minister, but in general, it is a very precarious partnership. Consider it, same shit, different day?
5/n: Otherwise, there will be no change.
So change via “purported democracy” is never going to happen with a country like Malaysia, rotten to the core. It is a crazy dream.
I just have to get off the Malaysian news diet. Malaysians elsewhere are generally very successful. Malaysians suffering through their daily doldrums, well, they just need to wake up, see the light, and succeed.
In the end, as much as people paraphrase, ask not what the country can do for you; legitimately, this is your life, and you should be taking good care of yourself and your loved ones. You succeed despite it all. Politics and the state happen regardless.
Me, personally? Ideas abound for how to get Malaysians who see the light to succeed elsewhere. And if I read and get angry at something (tweet rage?), I’m going to pop RM50 into an investment account, which should help me get off this poor habit. I’ll probably also just cut subscriptions to Malaysian news things… Less exposure is actually better for you. I can’t believe it has taken me this long to realise this.
I did poorly blogging last year. Oops. I think to myself when I read, This Thing Still On?, I really have to do better in 2023. Maybe the catalyst is the fact that Twitter is becoming a shit show. I doubt people will leave the platform in droves, per se, but I think we are coming back to the need for decentralised blogs again.
I have 477 days until I turn 40. I ditched the Hobonichi Techo sometime in 2022, and just focused on the Field Notes, and this year, I’ve got a Monocle x Leuchtturm1917 + Field Notes combo (though it seems my subscription lapsed Winter 2022, I should really burn down the existing collection, and resubscribe).
2022 was pretty amazing. Lots of work. Lots of fun. 256 days on the road (what a number), 339,551km travelled, 49 cities, 20 countries.
Getting back into doing, and not being afraid of experimenting in public, is what 2023 is all about. The Year of The Rabbit is upon us tomorrow, hence I don’t mind a slightly late Hello 2023 :)
Get back into the habit of doing. And publishing by learning and doing. No fear. Not that I wasn’t doing, but it’s time to be prolific with what’s been going on.
I like using Catalyst Cloud to host some of my personal sites. In the past I used to use CAcert for my TLS certificates, but more recently I've been using Let's Encrypt for my TLS certificates as they're trusted in all browsers. Currently the LoadBalancer as a Service (LBaaS) in Catalyst Cloud doesn't have built in support for Let's Encrypt. I could use an apache2/nginx proxy and handle the TLS termination there and have that manage the Let's Encrypt lifecycle, but really, I'd rather use LBaaS.
So I thought I'd set about working out how to get Dehydrated (the Let's Encrypt client I've been using) to drive LBaaS (known as Octavia). I figured this would be of interest to other people using Octavia with OpenStack in general, not just Catalyst Cloud.
There are a few things you need to do. These instructions are specific to Debian:
1. Install and configure Dehydrated to create the certificates for the domain(s) you want.
2. Create the LoadBalancer (use the API, ClickOps, whatever), just forward port 80 for now (see sample Apache configs below).
3. Save the sample hook.sh below to /etc/dehydrated/hook.sh, you'll probably need to customise it, mine is a bit more complicated!
4. Insert the UUID of your LoadBalancer in hook.sh where LB_LISTENER is set.
5. Create /etc/dehydrated/catalystcloud/password as described in hook.sh
6. Save the OpenRC file from the Catalyst Cloud dashboard as /etc/dehydrated/catalystcloud/openrc.sh
7. Install jq, openssl and the openstack tools, on Debian this is:
   apt install barbican
You should be able to rename the latest certs directory /var/lib/dehydrated/certs/$DOMAIN and then run dehydrated -c to have it reissue and then deploy a cert.
As we're using HTTP-01 Challenge Type here, you need to have the LoadBalancer forwarding port 80 to your website to allow for the challenge response. It is good practice to have a redirect to HTTPS, here's an example virtual host for Apache:
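A minimal redirect vhost along those lines might look like the following (www.example.com is a placeholder, and it assumes mod_rewrite is enabled):

```apache
<VirtualHost *:80>
    ServerName www.example.com
    # Let the ACME HTTP-01 challenge through over plain HTTP
    # (the Alias for /.well-known/acme-challenge lives in letsencrypt.conf),
    # and redirect everything else to HTTPS.
    RewriteEngine On
    RewriteCond %{REQUEST_URI} !^/\.well-known/acme-challenge/
    RewriteRule ^(.*)$ https://www.example.com$1 [R=301,L]
</VirtualHost>
```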
You also need this in /etc/apache2/conf-enabled/letsencrypt.conf:
Alias /.well-known/acme-challenge /var/lib/dehydrated/acme-challenges
<Directory /var/lib/dehydrated/acme-challenges>
Options None
AllowOverride None
# Apache 2.x
<IfModule !mod_authz_core.c>
Order allow,deny
Allow from all
</IfModule>
# Apache 2.4
<IfModule mod_authz_core.c>
Require all granted
</IfModule>
</Directory>
And that should be all that you need to do. Now, when Dehydrated updates your certificate, it should update your LoadBalancer as well!
Sample hook.sh:
deploy_cert() {
local DOMAIN="${1}" KEYFILE="${2}" CERTFILE="${3}" FULLCHAINFILE="${4}" \
CHAINFILE="${5}" TIMESTAMP="${6}"
shift 6
# File contents should be:
# export OS_PASSWORD='your password in here'
. /etc/dehydrated/catalystcloud/password
# OpenRC file from the Catalyst Cloud dashboard
. /etc/dehydrated/catalystcloud/openrc.sh --no-token
# UUID of the LoadBalancer to be managed
LB_LISTENER='xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
# Barbican uses P12 files, we need to make one.
P12=$(readlink -f $KEYFILE \
| sed -E 's/privkey-([0-9]+)\.pem/barbican-\1.p12/')
openssl pkcs12 -export -inkey $KEYFILE -in $CERTFILE -certfile \
$FULLCHAINFILE -passout pass: -out $P12
# Keep track of existing certs for this domain (hopefully no more than 100)
EXISTING_URIS=$(openstack secret list --limit 100 \
-c Name -c 'Secret href' -f json \
| jq -r ".[]|select(.Name | startswith(\"$DOMAIN\"))|.\"Secret href\"")
# Upload the new cert
NOW=$(date +"%s")
openstack secret store --name $DOMAIN-$TIMESTAMP-$NOW -e base64 \
-t "application/octet-stream" --payload="$(base64 < $P12)"
NEW_URI=$(openstack secret list --name $DOMAIN-$TIMESTAMP-$NOW \
-c 'Secret href' -f value) \
|| unset NEW_URI
# Change LoadBalancer to use new cert - if the old one was the default,
# change the default. If the old one was in the SNI list, update the
# SNI list.
if [ -n "$EXISTING_URIS" ]; then
DEFAULT_CONTAINER=$(openstack loadbalancer listener show $LB_LISTENER \
-c default_tls_container_ref -f value)
for URI in $EXISTING_URIS; do
if [ "x$URI" = "x$DEFAULT_CONTAINER" ]; then
openstack loadbalancer listener set $LB_LISTENER \
--default-tls-container-ref $NEW_URI
fi
done
SNI_CONTAINERS=$(openstack loadbalancer listener show $LB_LISTENER \
-c sni_container_refs -f value | sed "s/'//g" | sed 's/^\[//' \
| sed 's/\]$//' | sed "s/,//g")
for URI in $EXISTING_URIS; do
if echo $SNI_CONTAINERS | grep -q $URI; then
SNI_CONTAINERS=$(echo $SNI_CONTAINERS | sed "s,$URI,$NEW_URI,")
openstack loadbalancer listener set $LB_LISTENER \
--sni-container-refs $SNI_CONTAINERS
fi
done
# Remove old certs
for URI in $EXISTING_URIS; do
openstack secret delete $URI
done
fi
}
HANDLER="$1"; shift
#if [[ "${HANDLER}" =~ ^(deploy_challenge|clean_challenge|sync_cert|deploy_cert|deploy_ocsp|unchanged_cert|invalid_challenge|request_failure|generate_csr|startup_hook|exit_hook)$ ]]; then
if [[ "${HANDLER}" =~ ^(deploy_cert)$ ]]; then
"$HANDLER" "$@"
fi
What is HCX? VMware HCX is an application mobility platform designed for simplifying application migration, workload rebalancing and business continuity across datacenters and clouds. VMware HCX was formerly known as Hybrid Cloud Extension and NSX Hybrid Connect.
GCVE HCX GCVE deploys the Enterprise version of HCX as part of the cost of the solution.
HCX Enterprise has the following benefits:
- Hybrid Interconnect
- WAN Optimisation
- Bulk Migration, Live Migration and HCX Replication Assisted vMotion
- Cloud to cloud migration
- Disaster Protection
- KVM & Hyper-V to vSphere migrations
- Traffic Engineering
- Mobility Groups
- Mobility Optimised Networking
- Changeover scheduling
Definitions
Cold Migration
We have seen a lot of Google Cloud VMware Engine over the last few months and for the entire time we have used click-ops to provision new infrastructure, networks and VMs. Now we are going to the next level and we will be using Terraform to manage our infrastructure as code so that it is version controlled and predictable.
Installing Terraform The first part of getting this working is installing Terraform on your local machine.
Picking up where we left off last month, let’s dive into disaster recovery and how to use Site Recovery Manager and Google Backup & Protect to DR into and within the cloud with GCVE.
But before we do, a quick advertisement:
If you are in Brisbane, Australia, I suggest coming to the awesome Google Infrastructure Group (GIG) which focuses on GCVE where on 04 July 2022 I will be presenting on Terraform in GCVE.
Let’s pick up where we left off from last months article and start setting up some of the features of GCVE, starting with Advanced Autoscaling.
What is Advanced Auto-Scaling? Advanced Autoscaling automatically expands or shrinks a private cloud based on CPU, memory and storage utilisation metrics.
GCVE monitors the cluster based on the metrics defined in the autoscale policy and decides to add or remove nodes automatically. Remember: GCVE is physical Dell PowerEdge servers, not a container/VM running in Docker or on a hypervisor like VMware.
We’ve done this a number of times over the last decade, from OSDC to LCA. The idea is to provide a free psychologist or counsellor at an in-person conference. Attendees can do an anonymous booking by taking a stickynote (with the timeslot) from a signup sheet, and thus get a free appointment.
Many people find it difficult taking the first (very important) step towards getting professional help, and we’ve received good feedback that this approach indeed assists.
So far we’ve always focused on open source conferences. Now we’re moving into information security! First BrisSEC 2022 (Friday 29 April at the Hilton in Brisbane, QLD) and then AusCERT 2022 (10-13 May at the Star Hotel, Gold Coast QLD). The awesome and geek friendly Dr Carla Rogers will be at both events.
How does this get funded? Well, we’ve crowdfunded some, nudged sponsors, but mostly it gets picked up by the conference organisers (aka indirectly by the sponsors).
If you’re a conference organiser, or would like a particular upcoming conference to offer this service, do drop us a line and we’re happy to chase it up for you and help the organisers to make it happen. We know how to run that now.
In-person is best. But for virtual conferences, sure contact us as well.
The hack day didn’t go as well as I hoped, but didn’t go too badly. There was smaller attendance than hoped and the discussion was mostly about things other than FLOSS. But everyone who attended had fun and learned interesting things so generally I think it counts as a success. There was discussion on topics including military hardware, viruses (particularly Covid), rocketry, and literature. During the discussion one error in a Wikipedia page was discussed and hopefully we can get that fixed.
I think that everyone who attended will be interested in more such meetings. Overall I think this is a reasonable start to the Hack Day meetings, when I previously ran such meetings they often ended up being more social events than serious hacking events and that’s OK too.
One conclusion that we came to regarding meetings is that they should always be well announced in email and that the iCal file isn’t useful for everyone. Discussion continues on the best methods of announcing meetings but I anticipate that better email will get more attendance.
What is GCVE? Google Cloud VMware Engine, or GCVE, is a fully managed VMware hypervisor and associated management and networking components, (vSphere, NSX-T, vSAN and HCX) built on top of Google’s highly performant and scalable infrastructure with fully redundant and dedicated 100Gbps networking that provides 99.99% availability.
The solution is integrated into Google Cloud Platform, so businesses benefit from having full access to GCP services, native VPC networking, Cloud VPN or Interconnect as well as all the normal security features you expect from GCP.
The March 2022 meeting went reasonably well. Everyone seemed to have fun and learn useful things about computers. After 2 hours my Internet connection dropped out which stopped the people who were using VMs from doing the tutorial. Fortunately most people seemed ready for a break so we ended the meeting. The early and abrupt ending of the meeting was a disappointment but it wasn’t too bad, the meeting would probably only have gone for another half hour otherwise.
The BigBlueButton system was shown to be effective for training when one person got confused with the Debian package configuration options for Postfix and they were able to share the window with everyone else to get advice. I was also confused by that stage.
Future Meetings
The main feature of the meeting was training in setting up a mailserver with Postfix, here are the lecture notes for it [1]. The consensus at the end of the meeting was that people wanted more of that for the April meeting. So for the April meeting I will add to the Postfix Training to include SpamAssassin, SPF, DKIM, and DMARC. For the start of the next meeting instead of providing bare Debian installations for the VMs I’ll provide a basic Postfix/Dovecot setup so people can get straight into SpamAssassin etc.
For the May meeting training on SE Linux was requested.
Social Media
Towards the end of the meeting we discussed Matrix and federated social media. LUV has a Matrix server and I can give accounts to anyone who’s involved in FOSS in the Australia and New Zealand area. For Mastodon the NZOSS Mastodon server [2] seems like a good option. I have an account there to try Mastodon, my Mastodon address is @etbe@mastodon.nzoss.nz .
We are going to make Matrix a primary communication method for the Flounder group, the room is #flounder:luv.asn.au . My Matrix address is @etbe:luv.asn.au .
We also have a new URL for the blog and events. See the right sidebar for the link to the iCal file which can be connected to Google Calendar and most online calendaring systems.
We just had the first Flounder meeting which went well. Had some interesting discussion of storage technology, I learnt a few new things. Some people did the ZFS training and BTRFS training and we had lots of interesting discussion.
Andrew Pam gave a summary of new things in Linux and talked about the sites lwn.net, gamingonlinux.com, and cnx-software.com that he uses to find Linux news. One thing he talked about is the latest developments with SteamDeck which is driving Linux support in Steam games. The site protondb.com tracks Linux support in Steam games.
We had some discussion of BPF, for an introduction to that technology see the BPF lecture from LCA 2022.
Next Meeting
The next meeting (Saturday 5th of March 1PM Melbourne time) will focus on running your own mail server which is always of interest to people who are interested in system administration and which is probably of more interest than usual because of Google forcing companies with “a legacy G Suite subscription” to transition to a more expensive “Business family” offering.
I “recently” wrote about obtaining a new (to me, actually quite old) computer over in The Apple Power Macintosh 7200/120 PC Compatible (Part 1). This post is a bit of a detour, but may help others understand why some images they download from the internet don’t work.
Disk partitioning is (of course) a way to divide up a single disk into multiple volumes (partitions) for different uses. While the idea is similar, computer platforms over the ages have done this in a variety of different ways, with varying formats on disk, and varying limitations. The ones that you’re most likely to be familiar with are the MBR partitioning scheme (from the IBM PC), and the GPT partitioning scheme (common for UEFI systems such as the modern PC and Mac). One you’re less likely to be familiar with is the Apple Partition Map scheme.
The way all IBM PCs and compatibles worked from the introduction of MS-DOS 2.0 in 1983 until some time after 2005 was the Master Boot Record partitioning scheme. It was outrageously simple: of the first 512 byte sector of a disk, the first 446 bytes were for the bootstrapping code (the “boot sector”), the last 2 bytes were the magic signature telling the BIOS this disk was bootable, and the other 64 bytes were four entries of 16 bytes, each describing a disk partition. The Wikipedia page is a good overview of what it all looks like. Since “four partitions should be enough for anybody” wasn’t going to last, DOS 3.2 introduced “extended partitions” which was just using one of those 4 partitions as another similar data structure that could point to more partitions.
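Those offsets are easy to poke at directly. As a sketch (disk.img is a placeholder path), dumping the pieces of an MBR from a raw disk image looks like:

```shell
#!/bin/sh
# Dump the MBR structures described above from a raw disk image:
# bytes 0-445 are boot code, 446-509 are four 16-byte partition
# entries, and 510-511 hold the 0x55 0xAA boot signature.
IMG="${1:-disk.img}"

sig=$(dd if="$IMG" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
echo "boot signature: $sig"          # bootable disks show 55aa

for i in 0 1 2 3; do
    off=$((446 + i * 16))
    entry=$(dd if="$IMG" bs=1 skip="$off" count=16 2>/dev/null | od -An -tx1 | tr -d '\n')
    echo "partition $i:$entry"
done
```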
In the 1980s (similar to today), the Macintosh was, of course, different. The Apple Partition Map is significantly more flexible than the MBR on PCs. For a start, you could have more than four partitions! You could actually have a lot more than four partitions, as the Apple Partition Map uses a single 512-byte sector for each partition, and the partition map is itself a partition. Instead of being block 0 (like the MBR is), it actually starts at block 1, and is contiguous (the Driver Descriptor Record is what’s at block 0). So, once created, it’s hard to extend. Typically it’d be created as 64×512-byte entries, for 32KB… which it turns out is actually about enough for anyone.
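The Apple structures can be spotted the same way: the Driver Descriptor Record at block 0 starts with the two-byte signature “ER”, and each 512-byte partition map entry from block 1 onward begins with “PM”. A sketch, again with a placeholder image path:

```shell
#!/bin/sh
# Check the Apple Partition Map signatures on a raw disk image:
# block 0 is the Driver Descriptor Record (signature "ER"),
# block 1 is the first 512-byte partition map entry (signature "PM").
IMG="${1:-disk.img}"

ddr=$(dd if="$IMG" bs=1 skip=0   count=2 2>/dev/null)
pm=$( dd if="$IMG" bs=1 skip=512 count=2 2>/dev/null)
echo "block 0 signature: $ddr (expect ER)"
echo "block 1 signature: $pm (expect PM)"
```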
The Inside Macintosh reference on the SCSI Manager goes through more detail as to these structures. If you’re wondering what language all the coding examples are in, it’s Pascal – which was fairly popular for writing Macintosh applications in back in the day.
But the actual partition map isn’t the “interesting” part of all this (and yes, the quotation marks are significant here), because Macs are pretty darn finicky about which disks they’ll boot from. That gets interesting if you’re trying to find a CD-ROM image on the internet to boot from, and then use to install an operating system.
I never programmed a 1980s Macintosh actually in the 1980s. It was sometime in the early 1990s that I first experienced Microsoft Basic for the Macintosh. I’d previously (unknowingly at the time as it was branded Commodore) experienced Microsoft BASIC on the Commodore 16, Commodore 64, and even the Apple ][, but the Macintosh version was something else. It let you do some pretty neat things such as construct a GUI with largely the same amount of effort as it took to construct a Text based UI on the micros I was familiar with.
Okay, to be fair, I’d also dabbled in Microsoft QBasic, which came bundled with MS-DOS of the era and let you do a whole bunch of graphics – so you could theoretically construct a GUI with it. Something I did attempt to do. Constructing a GUI on the Mac was so much easier.
Of course, Microsoft Basic wasn’t the preferred way to program on the Macintosh. At that time it was largely Pascal, with C being something that also existed – but it was Pascal you were going to see in Inside Macintosh. It was probably somewhat fortuitous that I’d poked at Pascal a bit as an alternative to look at in high school computing classes. I can only remember using TurboPascal on DOS systems and never actually writing Pascal on the Macintosh.
By the middle part of the 1990s though, I was firmly incompetently writing C on the Mac. No doubt the quality of my code increased after I’d done some university courses actually covering the language; before that, the only practical way I had to attempt to write anything useful was looking at Inside Macintosh examples in Pascal, plus “C for Dummies”, which was very not-Macintosh. Writing C on UNIX/Linux was a lot easier – everything was made for it, including Actual Documentation!
Anyway, in the early 2000s I ran MacOS X for a bit on my white iBook G3, and did a (very) small amount of GUI / Project Builder (the precursor to Xcode) related development – instead largely focusing on command line / X11 things. The latest coolness was to use Objective-C to program applications (unless you were bringing over your Classic MacOS Carbon based application, in which case you could still write C). Enter some (incompetent) Objective-C coding!
Then Apple went to x86, so the hardware ceased being interesting, and I had no reason to poke at it even as a side effect of having hardware that could run the software stack. Enter a long-ass time of Debian, Ubuntu, and Fedora on laptops.
Come 2022 though, and (for reasons I should really write up), I’m poking at a Mac again and it’s now Swift as the preferred way to write apps. So, I’m (incompetently) hacking away at Swift code. I have to admit, it’s pretty nice. I’ve managed to be somewhat productive in a relatively short amount of time, and all the affordances in the language are geared towards the kind of safety that is a PITA when coding in C.
So this is my WIP utility to be able to import photos from a Shotwell database into the macOS Photos app:
There’s a lot of rough edges and unknowns left, including how to actually do the import (it looks like there’s going to be Swift code doing AppleScript things as the PhotoKit API is inadequate). But hey, some incompetent hacking in not too much time has a kind-of photo browser thing going on that feels pretty snappy.
Recently I read Michael Snoyman’s post on combining Axum, Hyper, Tonic and Tower. While his solution worked, it irked me – it seemed like there should be a much tighter solution possible.
I can deep dive into the code in a later post perhaps, but I think there are four points of difference. One, since the post was written Axum has started boxing its routes: so the enum dispatch approach taken, which delivers low overhead, actually has no benefit today.
Two, while writing out the entire type by hand has some benefits, async code is much more pithy.
Three, the code in the post is entirely generic, except for the routing function itself.
And four, the outer Service<AddrStream> is an unnecessary layer to abstract over: given the similar constraints – the inner Service must take Request<..> – it is possible to just not use a couple of helpers and instead work directly with Service<Request...>.
So, onto a pithier version.
First, the app server code itself.
use std::{convert::Infallible, net::SocketAddr};
use axum::routing::get;
use hyper::{server::conn::AddrStream, service::make_service_fn};
use hyper::{Body, Request};
use tonic::async_trait;
use demo::echo_server::{Echo, EchoServer};
use demo::{EchoReply, EchoRequest};
struct MyEcho;
#[async_trait]
impl Echo for MyEcho {
async fn echo(
&self,
request: tonic::Request<EchoRequest>,
) -> Result<tonic::Response<EchoReply>, tonic::Status> {
Ok(tonic::Response::new(EchoReply {
message: format!("Echoing back: {}", request.get_ref().message),
}))
}
}
#[tokio::main]
async fn main() {
let addr = SocketAddr::from(([0, 0, 0, 0], 3000));
let axum_service = axum::Router::new().route("/", get(|| async { "Hello world!" }));
let grpc_service = tonic::transport::Server::builder()
.add_service(EchoServer::new(MyEcho))
.into_service();
let both_service =
demo_router::Router::new(axum_service, grpc_service, |req: &Request<Body>| {
Ok::<bool, Infallible>(
req.headers().get("content-type").map(|x| x.as_bytes())
== Some(b"application/grpc"),
)
});
let make_service = make_service_fn(move |_conn: &AddrStream| {
let both_service = both_service.clone();
async { Ok::<_, Infallible>(both_service) }
});
let server = hyper::Server::bind(&addr).serve(make_service);
if let Err(e) = server.await {
eprintln!("server error: {}", e);
}
}
Note the Router: it takes the two services and a Fn to determine which to use on any given request. Then we just drop that composed service into make_service_fn and we’re done.
Next up we have the Router implementation. This is generic across any two Service<Request<...>> types as long as they are both Into<Bytes> for their Data, and Into<Box<dyn Error>> for errors.
Interesting things here – I use boxed_unsync to abstract over the body concrete type, and I implement the future using async code rather than as a separate struct. It becomes much smaller even after a few bits of extra type constraining.
One thing that flummoxed me for a little while was the need to capture the future for the underlying response outside of the async block. Failing to do so provokes a 'static requirement which was tricky to debug. Fortunately there is already a bug filed on making this easier to diagnose in rustc. The underlying problem is that if you create the async block and then dereference self inside it, the impl type behind self has to live for an arbitrary time. Whereas by capturing the future immediately, only the impl of the future has to live an arbitrary time, and that doesn’t then require changing the signature of the function.
This is almost worth turning into a crate – I couldn’t see an existing one when I looked, though it does end up rather small – < 100 lines. What do you all think?
The first meeting will start at 1PM Australian Eastern time (Melbourne/Sydney) which is +1100 on Saturday the 5th of February.
I will start the video chat an hour early in case someone makes a timezone mistake and gets there an hour before it starts. If anyone else joins early we will have random chat until the start time (deliberately avoiding topics worthy of the main meeting). The link http://b.coker.com.au will redirect to the meeting URL on the day.
The first scheduled talk is a summary and discussion of free software related news. Anyone who knows of something new that excites them is welcome to speak about it.
The main event is discussion of storage technology and hands-on training on BTRFS and ZFS for those who are interested. Here are the ZFS training notes and here are the BTRFS training notes. Feel free to do the training exercises on your own VM before the meeting if you wish.
Then discussion of the future of the group and the use of FOSS social media. While social media is never going to be compulsory some people will want to use it to communicate and we could run some servers for software that is considered good (lots of server capacity is available).
Finally we have to plan future meetings and decide on which communication methods are desired.
The BBB instance to be used for the video conference is sponsored by NZOSS and Catalyst Cloud.
The main aim is to provide educational benefits to free software users, via an online meeting with a scope larger than one country, that can’t be obtained by watching YouTube videos etc. When the pandemic ends we will keep running this, as there are benefits to be obtained from a meeting of a wide geographic scope that can’t be obtained by meetings in a single city. People from other countries are welcome to attend but they aren’t the focus of the meeting.
Until we get a better DNS name, the address http://b.coker.com.au will redirect to the BBB instance used for online meetings (the meeting address isn’t yet set up, so it redirects to the blog). The aim is that there will always be a short URL for the meeting, so anyone whose device loses contact can quickly type the URL into their backup device.
The first meeting will be on the 5th of Feb 2022 at 1PM Melbourne time +1100. When we get a proper domain I’ll publish a URL for an iCal file with entries for all meetings. I will also find some suitable way for meeting times to be localised (I’m sure there’s a WordPress plugin for that).
For the hands-on part of the meetings there will be virtual machine images you can download to run on your own system (tested with KVM, should work with other VM systems) and the possibility of logging in to a running VM. The demonstration VMs will have public IPv6 addresses and will also be available through different ports on a single IPv4 address, having IPv6 on your workstation will be convenient for you but you can survive without it.
Linux Australia has a list of LUGs in Australia; is there a similar list for NZ? One thing I’d like to see is a list of links to iCal files for all the meetings, and also an iCal aggregator covering all iCal feeds of online meetings. I’ll host it myself if necessary, but it’s probably best to do it via Linux Australia (Linux Australasia?) if possible.
I’m attending the https://linux.conf.au/ conference online this weekend, which is always a good opportunity for some sideline hacking.
I found something boneheaded doing that today.
There have been a few times while developing the OpenHMD Rift driver where I’ve noticed something strange and followed the thread until it made sense. Sometimes that leads to improvements in the driver, sometimes not.
In this case, I wanted to generate a graph of how long the computer vision processing takes – from the moment each camera frame is captured until poses are generated for each device.
To do that, I have some logging branches that output JSON events to log files, and I write scripts to process those. I used that data and produced:
Two things caught my eye in this graph. The first is the way the baseline latency (pink lines) increases from ~20ms to ~58ms. The 2nd is the quantisation effect, where pose latencies are clearly moving in discrete steps.
Neither of those should be happening.
Camera frames are being captured from the CV1 sensors every 19.2ms, and it takes 17-18ms for them to be delivered across the USB. Depending on how many IR sources the cameras can see, figuring out the device poses can take a different amount of time, but the baseline should always hover around 17-18ms because the fast “device tracking locked” case takes as little as 1ms.
Did you see me mention 19.2ms as the interframe period? Guess what the spacing on those quantisation levels in the graph is? I recognised it as implying that something in the processing is tied to frame timing when it should not be.
OpenHMD Rift CV1 tracking timing
This 2nd graph helped me pinpoint what exactly was going on. This graph is cut from the part of the session where the latency has jumped up. What it shows is a ~1 frame delay between when the frame is received (frame-arrival-finish-local-ts) and when the initial analysis even starts!
That could imply that the analysis thread is just busy processing the previous frame and hasn’t started working on the new one yet – but the graph says that fast analysis is typically done in 1-10ms at most. It should rarely be busy when the next frame arrives.
This is where I found the boneheaded code – a rookie mistake I made when putting the image analysis threads in place early on in the driver development, and never noticed since.
There are 3 threads involved:
USB service thread, reading video frame packets and assembling pixels in framebuffers
Fast analysis thread, that checks tracking lock is still acquired
Long analysis thread, which does brute-force pose searching to reacquire / match unknown IR sources to device LEDs
These 3 threads communicate using frame worker queues passing frames between each other. Each analysis thread does this pseudocode:
while driver_running:
    Pop a frame from the queue
    Process the frame
    Sleep for new frame notification
The problem is in the 3rd line. If the driver is ever still processing a frame in line 2 when a new frame arrives – say because the computer got really busy – the thread sleeps anyway, and won’t wake up until the frame after that arrives. At that point there’ll be 2 frames in the queue, but it still only processes one – so the analysis gains a 1 frame latency from that point on. If it happens a second time, it falls behind by another frame! Beyond that it starts reclaiming frames from the queues to keep the video capture thread fed – but it only reclaims one frame at a time, so the latency remains!
The fix is simple:
while driver_running:
    Pop a frame from the queue
    Process the frame
    if queue_is_empty():
        Sleep for new frame notification
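In runnable form, the fixed loop looks something like this hypothetical Python worker (names and structure are illustrative; the real driver is C):

```python
import queue
import threading

def analysis_worker(frames: queue.Queue, new_frame: threading.Event,
                    process, running: threading.Event):
    """Analysis loop with the fix: only sleep for the new-frame
    notification once the queue has actually been drained."""
    while running.is_set():
        try:
            frame = frames.get(timeout=0.1)
        except queue.Empty:
            continue  # no frame yet; re-check the running flag
        process(frame)
        if frames.empty():
            # Queue drained: now it's safe to wait for a notification.
            new_frame.wait(timeout=0.1)
            new_frame.clear()
```

Because the wait is skipped whenever the queue is non-empty, a slow iteration can never turn into a permanent one-frame backlog.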
Doing that for both the fast and long analysis threads changed the profile of the pose latency graph completely.
Pose latency and inter-pose spacing after fix
This is a massive win! To be clear, this has been causing problems in the driver for at least 18 months but was never obvious from the logs alone. A single good graph is worth a thousand logs.
What does this mean in practice?
The way the fusion filter I’ve built works, in between pose updates from the cameras, the position and orientation of each device are predicted / updated using the accelerometer and gyro readings. Particularly for position, using the IMU for prediction drifts fairly quickly. The longer the driver spends ‘coasting’ on the IMU, the less accurate the position tracking is. So, the sooner the driver can get a correction from the camera to the fusion filter the less drift we’ll get – especially under fast motion. Particularly for the hand controllers that get waved around.
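The ‘coasting’ between camera corrections is essentially dead reckoning. As a much-simplified sketch of the idea (this ignores orientation, gravity compensation and sensor bias, which the real filter handles):

```python
def predict_position(pos, vel, accel, dt):
    """Constant-acceleration dead reckoning for one IMU step.
    Simplified illustration, not the actual fusion filter code."""
    new_pos = [p + v * dt + 0.5 * a * dt * dt
               for p, v, a in zip(pos, vel, accel)]
    new_vel = [v + a * dt for v, a in zip(vel, accel)]
    return new_pos, new_vel
```

Position error grows with the square of the coasting time (any bias in the accelerometer gets integrated twice), which is why getting camera corrections to the filter sooner matters so much.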
Before: Left Controller pose delays by sensor
After: Left Controller pose delays by sensor
Poses are now being updated up to 40ms earlier and the baseline is consistent with the USB transfer delay.
You can also visibly see the effect of the JPEG decoding support I added over Christmas. The ‘red’ camera is directly connected to USB3, while the ‘khaki’ camera is feeding JPEG frames over USB2 that then need to be decoded, adding a few ms delay.
The latency reduction is nicely visible in the pose graphs, where the ‘drop shadow’ effect of pose updates tailing fusion predictions largely disappears and there are fewer large gaps in the pose observations when long analysis happens (visible as straight lines jumping from point to point in the trace):
Before: Left Controller poses
After: Left Controller poses
Yes, the blog is still on. January 2004 I moved to WordPress, and it is still here in January 2022. I didn’t write much last year (neither here, nor when experimenting with the Hey blog). I didn’t post anything to Instagram last year either from what I can tell, just a lot of stories.
On August 16 2021, I realised I was 1,000 days from May 12 2024, which is when I turn 40. As of today, that leaves 850 days. Did I squander the last 150 days? I’m back to writing almost daily in the Hobonichi Techo (I think last year and the year before were mostly washouts; I barely scribbled anything offline).
I got a new Apple Watch Series 7 yesterday. I can say I used the Series 4 well (79% battery life), purchased in the UK when I broke my Series 0 in Edinburgh airport.
TripIt stats for last year claimed 95 days on the road. This is of course, a massive joke, but I’m glad I did get to visit London, Lisbon, New York, San Francisco, Los Angeles without issue. I spent a lot of time in Kuantan, a bunch of Langkawi trips, and also, I stayed for many months at the Grand Hyatt Kuala Lumpur during the May lockdowns (I practically stayed there all lockdown).
With 850 days to go till I’m 40, I have plenty I would like to achieve. I think I’ll write a lot more here. And elsewhere. Get back into the habit of doing. And publishing by learning and doing. No fear. Not that I wasn’t doing, but it’s time to be prolific with what’s been going on.
Once again time has passed, and another update on Oculus Rift support feels due! As always, it feels like I’ve been busy with work and not found enough time for Rift CV1 hacking. Nevertheless, looking back over the history since I last wrote, there’s quite a lot to tell!
In general, the controller tracking is now really good most of the time. Like, wildly-swing-your-arms-and-not-lose-track levels (most of the time). The problems I’m hunting now are intermittent and hard to identify in the moment while using the headset – hence my enthusiasm over the last updates for implementing stream recording and a simulation setup. I’ll get back to that.
Outlier Detection
Since I last wrote, the tracking improvements have mostly come from identifying and rejecting incorrect measurements. That is, if I have 2 sensors active and 1 sensor says the left controller is in one place, but the 2nd sensor says it’s somewhere else, we’ll reject one of those – choosing the pose that best matches what we already know about the controller: the last known position, the gravity direction the IMU is detecting, and the last known orientation. The tracker will now also reject observations for a time if (for example) the reported orientation is outside the range we expect. The IMU gyroscope can track the orientation of a device for quite a while, so it can be relied on to provide strong pose priors once we’ve integrated a few camera observations to get the yaw correct.
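A stripped-down sketch of that kind of gating – a plain distance check against the filter’s prior. The real driver weighs orientation and gravity as well, and the names and threshold here are mine, purely for illustration:

```python
import math

def choose_pose(candidates, prior_pos, max_jump=0.15):
    """Pick the observed position closest to the filter's prior,
    rejecting everything if even the best candidate has jumped
    implausibly far (distances in metres; threshold illustrative)."""
    best = None
    best_dist = math.inf
    for pos in candidates:
        dist = math.dist(pos, prior_pos)
        if dist < best_dist:
            best, best_dist = pos, dist
    if best is None or best_dist > max_jump:
        return None  # reject all observations this frame
    return best
```

Returning None is the “reject observations for a time” branch: the filter keeps coasting on the IMU until a plausible measurement shows up again.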
It works really well, but I think improving this area is still where most future refinements will come. That and avoiding incorrect pose extractions in the first place.
Plot of headset tracking – orientation and position
The above plot is a sample of headset tracking, showing the extracted poses from the computer vision vs the pose priors / tracking from the Kalman filter. As you can see, there are excursions in both position and orientation detected from the video, but these are largely ignored by the filter, producing a steadier result.
Left Touch controller tracking – orientation and position
This plot shows the left controller being tracked during a Beat Saber session. The controller tracking plot is quite different, because controllers move a lot more than the headset, and have fewer LEDs to track against. There are larger gaps here in the timeline while the vision re-acquires the device – and in those gaps you can see the Kalman filter interpolating using IMU input only (sometimes well, sometimes less so).
Improved Pose Priors
Another nice improvement is a change to the way the search for a tracked device is made in a video frame. Before starting to look for a particular device, the driver now always fetches the latest estimate of the device’s position from the fusion filter. Previously, it would use the estimate of the device pose as it was when the camera exposure happened – but between then and the moment we start analysis, more IMU observations and other camera observations might arrive and be integrated into the filter, updating the estimate of where the device was in the frame.
This is the bit where I think the Kalman filter is particularly clever: estimates of the device position at an earlier or later exposure can improve and refine the filter’s estimate of where the device was when the camera captured the frame we’re currently analysing! So clever. That mechanism (lagged state tracking) is what allows the filter to integrate past tracking observations once the analysis is done – so even if the video frame search takes 150ms (for example), it will correct the filter’s estimate of where the device was 150ms in the past, which ripples through and corrects the estimate of where the device is now.
LED visibility model
To improve device identification, I measured the actual angle from which the LEDs are visible (about 75 degrees off axis) and measured their size. The pose matching now has a better idea of which LEDs should be visible for a proposed orientation, and what pixel size we expect them to have at a particular distance.
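That visibility test amounts to a simple cone check. A sketch in Python (unit vectors assumed, and the 75 degree figure from the measurement above; not the actual OpenHMD code):

```python
import math

def led_visible(led_normal, to_camera, max_angle_deg=75.0):
    """True if the direction from the LED toward the camera is within
    max_angle_deg of the LED's emission axis (both unit vectors)."""
    # Dot product gives the cosine of the angle between the vectors.
    cos_angle = sum(n * v for n, v in zip(led_normal, to_camera))
    return cos_angle >= math.cos(math.radians(max_angle_deg))
```

For a proposed device orientation, running each modelled LED through a check like this predicts which blobs the camera should actually see, so a pose that claims a back-facing LED is visible can be penalised.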
Better Smoothing
I fixed a bug in the output pose smoothing filter where it would glitch as you turned completely around and crossed the point where the angle jumps from +pi to -pi or vice versa.
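The standard fix for that class of bug is to smooth the wrapped angular difference rather than the raw angles. A sketch of the idea (illustrative Python, not the actual OpenHMD smoothing filter):

```python
import math

def smooth_angle(prev, new, alpha=0.2):
    """Exponential smoothing that is safe across the +pi/-pi boundary."""
    # Wrap the difference into (-pi, pi] so that crossing the boundary
    # doesn't look like a near-2*pi jump in the opposite direction.
    diff = math.atan2(math.sin(new - prev), math.cos(new - prev))
    out = prev + alpha * diff
    # Re-wrap the result back into (-pi, pi].
    return math.atan2(math.sin(out), math.cos(out))
```

Naively smoothing prev=3.1 toward new=-3.1 would swing the output through zero; wrapping the difference first keeps it moving the short way around the circle.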
Improved Display Distortion Correction
I got a wide-angle hi-res webcam and took photos of a checkerboard pattern through the lens of my headset, then used OpenCV and panotools to calculate new distortion and chromatic aberration parameters for the display. For me, this has been a great improvement. I’m waiting to hear if that’s true for everyone, or if I’ve just fixed it for my headset.
Persistent Config Cache
Config blocks! A long time ago, I prototyped code to create a persistent OpenHMD configuration file store in ~/.config/openhmd. The rift-kalman-filter branch now uses that to store the configuration blocks that it reads from the controllers. The first time a controller is seen, it will load the JSON calibration block as before, but it will now store it in that directory – removing a multiple second radio read process on every subsequent startup.
Persistent Room Configuration
To go along with that, I have an experimental rift-room-config branch that creates a rift-room-config.json file and stores the camera positions after the first startup. I haven’t pushed that to the rift-kalman-filter branch yet, because I’m a bit worried it’ll cause surprising problems for people. If the initial estimate of the headset pose is wrong, the code will back-project the wrong positions for the cameras, which will get written to the file and cause every subsequent run of OpenHMD to generate bad tracking until the file is removed. The goal is to have a loop that monitors whether the camera positions seem stable based on the tracking reports, and to use averaging and resetting to correct them if not – or at least to warn the user that they should re-run some (non-existent) setup utility.
Video Capture + Processing
The final big ticket item was a rewrite of how the USB video frame capture thread collects pixels and passes them to the analysis threads. This now does less work in the USB thread, so misses fewer frames, and also I made it so that every frame is now searched for LEDs and blob identities tracked with motion vectors, even when no further analysis will be done on that frame. That means that when we’re running late, it better preserves LED blob identities until the analysis threads can catch up – increasing the chances of having known LEDs to directly find device positions and avoid searching. This rewrite also opened up a path to easily support JPEG decode – which is needed to support Rift Sensors connected on USB 2.0 ports.
Session Simulator
I mentioned the recording simulator continues to progress. Since the tracking problems are now getting really tricky to figure out, this tool is becoming increasingly important. So far, I have code in OpenHMD to record all video and tracking data to a .mkv file. Then, there’s a simulator tool that loads those recordings. Currently it is capable of extracting the data back out of the recording, parsing the JSON and decoding the video, and presenting it to a partially implemented simulator that then runs the same blob analysis and tracking OpenHMD does. The end goal is a Godot based visualiser for this simulation, and to be able to step back and forth through time examining what happened at critical moments so I can improve the tracking for those situations.
To make recordings, there’s the rift-debug-gstreamer-record branch of OpenHMD. If you have GStreamer and the right plugins (gst-plugins-good) installed, and you set env vars like this, each run of OpenHMD will generate a recording in the target directory (make sure the target dir exists):
The next things that are calling to me are to improve the room configuration estimation and storage as mentioned above – to detect when the poses a camera is reporting don’t make sense because it’s been bumped or moved.
I’d also like to add back in tracking of the LEDs on the back of the headset headband, to support 360 tracking. I disabled those because they cause me trouble – the headband is adjustable relative to the headset, so the LEDs don’t appear where the 3D model says they should be, and that causes jitter and pose mismatches. They need special handling.
One last thing I’m finding exciting is a new person taking an interest in Rift S and starting to look at inside-out tracking for that. That’s just happened in the last few days, so not much to report yet – but I’ll be happy to have someone looking at that while I’m still busy over here in CV1 land!
As always, if you have any questions, comments or testing feedback – hit me up at thaytan@noraisin.net or on @thaytan Twitter/IRC.
Thank you to the kind people signed up as Github Sponsors for this project!
For a long time, computer manufacturers have tried to differentiate themselves and their products from their competitors with fancy names using odd capitalisation and spelling. But as an author, using these names does a disservice to the reader: how are they to know that DEC is pronounced as if it were written Dec ("deck")?
It's time we pushed back, and wrote for our readers, not for corporations.
It's time to use standard English rules for these Corporate Fancy Names. Proper names begin with a capital, unlike "ciscoSystems®" (so bad that Cisco itself moved away from it). Words are separated by spaces, so "Cisco Systems". Abbreviations and acronyms are written in lower case if they are pronounced as a word, in upper case if each letter is pronounced: so "ram" and "IBM®".
So from here on in I'll be using the following:
Face Book. Formerly, "Facebook®".
Junos. Formerly JUNOS®.
ram. Formerly RAM.
Pan OS. Formerly PAN-OS®.
Unix. Formerly UNIX®.
I'd encourage you to try this in your own writing. It does look odd for the first time, but the result is undeniably more readable. If we are not writing to be understood by our audience then we are nothing more than an unpaid member of some corporation's marketing team.
I gave the talk On The Use and Misuse of Decorators as part of PyConline AU 2021, the second in an annoyingly long sequence of not-in-person PyCon AU events. Here are some code samples that you might be interested in:
Simple @property implementation
This shows a demo of @property-style getters. Setters are left as an exercise :)
def demo_property(f):
    f.is_a_property = True
    return f

class HasProperties:
    def __getattribute__(self, name):
        ret = super().__getattribute__(name)
        if hasattr(ret, "is_a_property"):
            return ret()
        else:
            return ret

class Demo(HasProperties):
    @demo_property
    def is_a_property(self):
        return "I'm a property"

    def is_a_function(self):
        return "I'm a function"

a = Demo()
print(a.is_a_function())
print(a.is_a_property)
@run (The Scoped Block)
@run is a decorator that will run the body of the decorated function, and then store the result of that function in place of the function’s name. It makes it easier to assign the results of complex statements to a variable, and get the advantages of functions having less leaky scopes than if or loop blocks.
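The decorator itself is tiny. A sketch matching that description (the body of the decorated function here is just an example):

```python
def run(f):
    """Immediately call the decorated function and bind its *result*
    (not the function) to the function's name."""
    return f()

@run
def total():
    # Loop variables like acc and i stay inside this scope
    # instead of leaking into the enclosing namespace.
    acc = 0
    for i in range(1, 5):
        acc += i
    return acc

print(total)  # total is now the integer 10, not a callable
```

After the decorator runs, `total` is the computed value, and the temporaries used to build it have already gone out of scope.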
A while ago, I wrote a post about how to build and test my Oculus CV1 tracking code in SteamVR using the SteamVR-OpenHMD driver. I have updated those instructions and moved them to https://noraisin.net/diary/?page_id=1048 – so use those if you’d like to try things out.
The pandemic continues to sap my time for OpenHMD improvements. Since my last post, I have been working on various refinements. The biggest visible improvements are:
Adding velocity and acceleration API to OpenHMD.
Rewriting the pose transformation code that maps from the IMU-centric tracking space to the device pose needed by SteamVR / apps.
Adding velocity and acceleration reporting is needed in VR apps that support throwing things. It means that throwing objects and using gravity-grab to fetch objects works in Half-Life: Alyx, making it playable now.
The rewrite to the pose transformation code fixed problems where the rotation of controller models in VR didn’t match the rotation applied in the real world. Controllers would appear attached to the wrong part of the hand, and rotate around the wrong axis. Movements feel more natural now.
Ongoing work – record and replay
My focus going forward is on fixing glitches that are caused by tracking losses or outliers. Those problems happen when the computer vision code either fails to match what the cameras see to the device LED models, or when it matches incorrectly.
Tracking failure leads to the headset view or controllers ‘flying away’ suddenly. Incorrect matching leads to controllers jumping and jittering to the wrong pose, or swapping hands. Either condition is very annoying.
Unfortunately, as the tracking has improved the remaining problems get harder to understand and there is less low-hanging fruit for improvement. Further, when the computer vision runs at 52Hz, it’s impossible to diagnose the reasons for a glitch in real time.
I’ve built a branch of OpenHMD that uses GStreamer to record the CV1 camera video, plus IMU and tracking logs into a video file.
To go with those recordings, I’ve been working on a replay and simulation tool, that uses the Godot game engine to visualise the tracking session. The goal is to show, frame-by-frame, where OpenHMD thought the cameras, headset and controllers were at each point in the session, and to be able to step back and forth through the recording.
Right now, I’m working on the simulation portion of the replay, that will use the tracking logs to recreate all the poses.
GKE in Production - Part 2
This tutorial is part of a series on creating, running and managing Kubernetes on GCP the way I do in my day job. In this episode, we cover how to set up an nginx ingress controller to handle incoming requests.
Note: there may be some things I have skimmed over. If so, or if you see a glaring hole in my configuration, please drop me a line via the contact page linked at the top of the site.
I’ve been asked more than once what it was like at the beginning of Ubuntu, before it was a company, when an email from someone I’d never heard of came into my mailbox.
We’re coming up on 20 years now since Ubuntu was founded, and I had cause to do some spelunking into IMAP archives recently… while there I took the opportunity to grab the very first email I received.
The Ubuntu long shot succeeded wildly. Of course, we liked to joke about how spammy those emails were: cold-calling a raft of Debian developers with job offers, some of them were closer to phishing attacks :). This very early one – I was the second employee (though I started at 4 days a week to transition my clients gradually) – was less so.
I think it’s interesting though to note how explicit a gamble this was framed as: a time-limited experiment, funded for a year. As the company scaled, this very rapidly became a hiring problem, and the horizon had to be pushed out to 2 years to get folk to join.
And of course, while we started with arch in earnest, we rapidly hit significant usability problems. Some were solvable with porcelain and shallow non-architectural changes, and we initially built patches, and then the bazaar VCS project, to tackle those. But others were not: for instance, I recall exceeding the 32K hard link limit on ext3 due to a single long history during a VCS conversion. The sum of these challenges led us to create the bzr project, a ground-up rethink of our version control needs: architecture, implementation and user experience. While ultimately git has conquered all, bzr had – still has, in fact – extremely loyal advocates, due to its laser-sharp focus on usability.
Anyhow, here it is: one of the original no-name-here-yet, aka Ubuntu, introductory emails (with permission from Mark, of course). When I clicked through to the website Mark provided there was a link there to a fantastical website about a space tourist… not what I had expected to be reading in Adelaide during LCA 2004.
From: Mark Shuttleworth <xxx@xxx> To: Robert Collins <xxx@xxx> Date: Thu, 15 Jan 2004, 04:30
Tom Lord gave me your email address, I believe he’s already sent you the email that I sent him so I’m sure you have some background.
In short, I am going to fund some open source development for a year. This is part of a new project that I will be getting off the ground in the coming weeks. I don’t know where it will lead, it’s flying in the face of a stiff breeze but I think at the end of the day it will at least fund a few very good open source developers for a full year to work on the projects they like most.
One of the pieces of the puzzle is high end source code management. I’ll be looking to build an infrastructure that will manage source code for between 100 and 8000 open source projects (yes, there’s a big difference between the two, I don’t know at which end of the spectrum we will be at the end of the year but our infrastructure will have to at least be capable of scaling to the latter within two years) with upwards of 2000 developers, drawing code from a variety of sources, playing with it and spitting it out regularly in nice packages.
Arch and Subversion seem to be the two leading contenders for “next generation open source sccm”. I’d be interested in your thoughts on the two of them, and how they stack up. I’m looking to hire one person who will lead that part of the effort. They’ll work alone from home, and be responsible for two things. First, extending the tool (arch or svn) in ways that help the project. Such extensions will be released under an open source licence, and hopefully embraced by the tools maintainers and included in the mainline code for the tool. And second, they will be responsible for our large-scale implementation of SCCM, using that tool, and building the management scripts and other infrastructure to support such a large, and hopefully highly automated, set of repositories.
Would you be interested in this position? What attributes and experience do you think would make you a great person to have on the team? What would your salary expectation be, as a monthly figure, for a one year contract full time?
I’m currently on your continent, well, just off it. On Lizard Island, up North. Am headed today for Brisbane, then on the 17th to Launceston via Melbourne. If you happen to be on any of those stops, would you be interested in meeting up to discuss it further?
If you’re curious you can find out a bit more about me at www.markshuttleworth.com. This project is much lower key than some of what you’ll find there. It’s a very long shot indeed. But if at worst all that happens is a bunch of open source work gets funded at my expense I’ll feel it was money well spent.
Cheers, Mark
===== — “Good judgement comes from experience, and often experience comes from bad judgement” – Rita Mae Brown
I have always liked cryptography, and public-key cryptography in particular. When Pretty Good Privacy (PGP) first came out in 1991, I not only started using it, but also looked at the documentation and the code to see how it worked. I created my own implementation in C using very small keys, just to understand it better.
Cryptography has been running a race against both faster and cheaper computing power. And these days, with banking and most other aspects of our lives entirely relying on secure communications, it’s a very juicy target for bad actors.
About 5 years ago, the US National Institute of Standards and Technology (NIST) initiated a search for cryptographic algorithms that should withstand a near-future world where quantum computers with a significant number of qubits are a reality. There have been a number of rounds; mid-2020 saw round 3 and the announcement of the finalists.
This submission caught my eye some time ago: Classic McEliece, and out of the four finalists it’s the only one that is not lattice-based [wikipedia link].
Tiny side-track: you may wonder where the McEliece name comes from? From mathematician Robert McEliece (1942–2019), who developed his cryptosystem in 1978. So it’s not just named after him, he designed it. For various reasons that have nothing to do with the mathematical solidity of the ideas, it didn’t get used at the time. He did plenty of other cool things, too. From his Caltech obituary:
He made fundamental contributions to the theory and design of channel codes for communication systems—including the interplanetary telecommunication systems that were used by the Voyager, Galileo, Mars Pathfinder, Cassini, and Mars Exploration Rover missions.
Back to lattices: there are both unknowns (aspects that have not been studied in exhaustive depth) and recent mathematical attacks, both of which create uncertainty – in the crypto sphere as well as for business and politics. Given how long it takes for crypto schemes to get widely adopted, these uncertainties are quite relevant, particularly since cyber security is a hot topic.
Lattices are definitely interesting, but given what we know so far, it is my feeling that systems based on lattices are more likely to be proven breakable than Classic McEliece, which comes to this finalists’ table with a 40+ year track record of in-depth analysis. Mind that all finalists are of course solid at this stage – but NIST’s thoughts on expected developments and breakthroughs are what is likely to decide the winner. NIST is not looking for shiny; they are looking for very, very solid in all possible ways.
Prof Buchanan recently published implementations for the finalists, and did some benchmarks where we can directly compare them against each other.
We can see that Classic McEliece’s key generation is CPU intensive, but is that really a problem? The large size of its public key may be more of a factor (disadvantage); however, I think the small ciphertext more than offsets that disadvantage.
As we’re nearing the end of the NIST process, in my opinion, fast encryption/decryption and a small ciphertext, combined with the long track record of in-depth analysis, may still see Classic McEliece come out the winner.
GKE in Production - Part 1 This tutorial is part of a series on creating, running and managing Kubernetes on GCP the way I do in my day job.
Note: There may be some things I have skimmed over; if so, or if you see a glaring hole in my configuration, please drop me a line via the contact page linked at the top of the site.
What we will build In this first tutorial, we will be building a standard GKE cluster on Google Cloud Platform and deploying the hello world container to confirm everything is working.
Living in California, I’ve (sadly) grown accustomed to needing to keep track of our local air quality index (AQI) ratings, particularly as we live close to places where large wildfires happen every other year.
Last year, Josh and I bought a PurpleAir outdoor air quality meter, which has been great. We contribute our data to a collection of very local air quality meters, which is important, since the hilly nature of the North Bay means that the nearest government air quality ratings can be significantly different to what we experience here in Petaluma.
I recently went looking to pull my PurpleAir sensor data into my Home Assistant setup. Unfortunately, the PurpleAir API does not return the AQI metric for air quality, only the raw PM2.5/PM5/PM10 numbers. After some searching, I found a nice template sensor solution on the Home Assistant forums, which I’ve modernised by adding the AQI as a sub-sensor, and adding unique ID fields to each useful sensor, so that you can assign them to a location.
You’ll end up with sensors for the raw PM2.5 value, the PM2.5 AQI value, the US EPA air quality category, temperature, relative humidity and air pressure.
How to use this
First up, visit the PurpleAir Map, find the sensor you care about, click “get this widget”, and then “JSON”. That will give you the URL to set as the resource key in purpleair.yaml.
Adding the configuration
In HomeAssistant, add the following line to your configuration.yaml:
sensor: !include purpleair.yaml
and then add the following contents to purpleair.yaml
- platform: rest
  name: 'PurpleAir'
  # Substitute in the URL of the sensor you care about. To find the URL, go
  # to purpleair.com/map, find your sensor, click on it, click on "Get This
  # Widget" then click on "JSON".
  resource: https://www.purpleair.com/json?key={KEY_GOES_HERE}&show={SENSOR_ID}
  # Only query once a minute to avoid rate limits:
  scan_interval: 60
  # Set this sensor to be the AQI value.
  #
  # Code translated from JavaScript found at:
  # https://docs.google.com/document/d/15ijz94dXJ-YAZLi9iZ_RaBwrZ4KtYeCy08goGBwnbCU/edit#
  value_template: >
    {{ value_json["results"][0]["Label"] }}
  unit_of_measurement: ""
  # The value of the sensor can't be longer than 255 characters, but the
  # attributes can. Store away all the data for use by the templates below.
  json_attributes:
    - results
- platform: template
  sensors:
    purpleair_aqi:
      unique_id: 'purpleair_SENSORID_aqi_pm25'
      friendly_name: 'PurpleAir PM2.5 AQI'
      value_template: >
        {% macro calcAQI(Cp, Ih, Il, BPh, BPl) -%}
          {{ (((Ih - Il)/(BPh - BPl)) * (Cp - BPl) + Il)|round|float }}
        {%- endmacro %}
        {% if (states('sensor.purpleair_pm25')|float) > 1000 %}
          invalid
        {% elif (states('sensor.purpleair_pm25')|float) > 350.5 %}
          {{ calcAQI((states('sensor.purpleair_pm25')|float), 500.0, 401.0, 500.0, 350.5) }}
        {% elif (states('sensor.purpleair_pm25')|float) > 250.5 %}
          {{ calcAQI((states('sensor.purpleair_pm25')|float), 400.0, 301.0, 350.4, 250.5) }}
        {% elif (states('sensor.purpleair_pm25')|float) > 150.5 %}
          {{ calcAQI((states('sensor.purpleair_pm25')|float), 300.0, 201.0, 250.4, 150.5) }}
        {% elif (states('sensor.purpleair_pm25')|float) > 55.5 %}
          {{ calcAQI((states('sensor.purpleair_pm25')|float), 200.0, 151.0, 150.4, 55.5) }}
        {% elif (states('sensor.purpleair_pm25')|float) > 35.5 %}
          {{ calcAQI((states('sensor.purpleair_pm25')|float), 150.0, 101.0, 55.4, 35.5) }}
        {% elif (states('sensor.purpleair_pm25')|float) > 12.1 %}
          {{ calcAQI((states('sensor.purpleair_pm25')|float), 100.0, 51.0, 35.4, 12.1) }}
        {% elif (states('sensor.purpleair_pm25')|float) >= 0.0 %}
          {{ calcAQI((states('sensor.purpleair_pm25')|float), 50.0, 0.0, 12.0, 0.0) }}
        {% else %}
          invalid
        {% endif %}
      unit_of_measurement: "bit"
    purpleair_description:
      unique_id: 'purpleair_SENSORID_description'
      friendly_name: 'PurpleAir AQI Description'
      value_template: >
        {% if (states('sensor.purpleair_aqi')|float) >= 401.0 %}
          Hazardous
        {% elif (states('sensor.purpleair_aqi')|float) >= 301.0 %}
          Hazardous
        {% elif (states('sensor.purpleair_aqi')|float) >= 201.0 %}
          Very Unhealthy
        {% elif (states('sensor.purpleair_aqi')|float) >= 151.0 %}
          Unhealthy
        {% elif (states('sensor.purpleair_aqi')|float) >= 101.0 %}
          Unhealthy for Sensitive Groups
        {% elif (states('sensor.purpleair_aqi')|float) >= 51.0 %}
          Moderate
        {% elif (states('sensor.purpleair_aqi')|float) >= 0.0 %}
          Good
        {% else %}
          undefined
        {% endif %}
      entity_id: sensor.purpleair
    purpleair_pm25:
      unique_id: 'purpleair_SENSORID_pm25'
      friendly_name: 'PurpleAir PM2.5'
      value_template: "{{ state_attr('sensor.purpleair','results')[0]['PM2_5Value'] }}"
      unit_of_measurement: "μg/m3"
      entity_id: sensor.purpleair
    purpleair_temp:
      unique_id: 'purpleair_SENSORID_temperature'
      friendly_name: 'PurpleAir Temperature'
      value_template: "{{ state_attr('sensor.purpleair','results')[0]['temp_f'] }}"
      unit_of_measurement: "°F"
      entity_id: sensor.purpleair
    purpleair_humidity:
      unique_id: 'purpleair_SENSORID_humidity'
      friendly_name: 'PurpleAir Humidity'
      value_template: "{{ state_attr('sensor.purpleair','results')[0]['humidity'] }}"
      unit_of_measurement: "%"
      entity_id: sensor.purpleair
    purpleair_pressure:
      unique_id: 'purpleair_SENSORID_pressure'
      friendly_name: 'PurpleAir Pressure'
      value_template: "{{ state_attr('sensor.purpleair','results')[0]['pressure'] }}"
      unit_of_measurement: "hPa"
      entity_id: sensor.purpleair
Quirks
I had difficulty getting the AQI to display as a numeric graph when I didn’t set a unit. I went with bit, and that worked just fine. 🤷‍♂️
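If you want to sanity-check the template’s AQI numbers outside Home Assistant, the same US EPA piecewise-linear conversion can be written in a few lines of Python. This mirrors the calcAQI macro, using the EPA PM2.5 breakpoint table; it’s a standalone checker, not part of the Home Assistant config:

```python
# US EPA PM2.5 breakpoints: (conc_low, conc_high, aqi_low, aqi_high)
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.0, 401, 500),
]

def pm25_to_aqi(concentration):
    """Convert a raw PM2.5 concentration (µg/m³) to the US EPA AQI."""
    if concentration < 0.0:
        return None
    for c_lo, c_hi, a_lo, a_hi in PM25_BREAKPOINTS:
        if concentration <= c_hi:
            # Linear interpolation within the band, same as the calcAQI macro.
            return round((a_hi - a_lo) / (c_hi - c_lo)
                         * (concentration - c_lo) + a_lo)
    return None  # beyond the table, like the template's "invalid"
```

For example, a raw reading of 35.4 µg/m³ sits exactly at the top of the “Moderate” band and converts to an AQI of 100.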
So, this idea has been brewing for a while now… try and watch all of Doctor Who. All of it. All 38 seasons. Today(ish), we started. First up, from 1963 (first aired not quite when intended due to the Kennedy assassination): An Unearthly Child. The first episode of the first serial.
A lot of iconic things are there from the start: the music, the Police Box, embarrassing moments of not quite remembering what time one is in, and normal humans accidentally finding their way into the TARDIS.
I first saw this way back when I was a child, when episodes were repeated on ABC TV in Australia for some anniversary of Doctor Who (I forget which one). Well, I saw all but the first episode, as the train home was delayed and stopped outside Caulfield for no reason for ages. Some things never change.
Of course, being a show from the early 1960s, there’s some rougher spots. We’re not about to have the picture of diversity, and there’s going to be casual racism and sexism. What will be interesting is noticing these things today, and contrasting with my memory of them at the time (at least for episodes I’ve seen before), and what I know of the attitudes of the time.
“This year-ometer is not calculating properly” is a very 2020 line though (technically from the second episode).
It’s been a while since my last post about tracking support for the Oculus Rift in February. There have been big improvements since then – it’s working really well a lot of the time now. It’s gone from “If I don’t make any sudden moves, I can finish an easy Beat Saber level” to “You can’t hide from me!” quality.
Equally, there are still enough glitches and corner cases that I think I’ll still be at this a while.
Here’s a video from 3 weeks ago of (not me) playing Beat Saber on Expert+ setting showing just how good things can be now:
Beat Saber – Skunkynator playing Expert+, Mar 16 2021
Strap in. Here’s what I’ve worked on in the last 6 weeks:
Pose Matching improvements
Most of the biggest improvements have come from improving the computer vision algorithm that’s matching the observed LEDs (blobs) in the camera frames to the 3D models of the devices.
I split the brute-force search algorithm into 2 phases. It now does a first pass looking for ‘obvious’ matches. In that pass, it does a shallow graph search of blobs and their nearest few neighbours against LEDs and their nearest neighbours, looking for a match using a “Strong” match metric. A match is considered strong if expected LEDs match observed blobs to within 1.5 pixels.
Coupled with checks on the expected orientation (matching the Gravity vector detected by the IMU) and the pose prior (expected position and orientation are within predicted error bounds) this short-circuit on the search is hit a lot of the time, and often completes within 1 frame duration.
In the remaining tricky cases, where a deeper graph search is required in order to recover the pose, the initial search reduces the number of LEDs and blobs under consideration, speeding up the remaining search.
I also added an LED size model to the mix – for a candidate pose, it tries to work out how large (in pixels) each LED should appear, and use that as a bound on matching blobs to LEDs. This helps reduce mismatches as devices move further from the camera.
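As a rough illustration (not the actual OpenHMD code), the “strong” match metric described above amounts to checking that every LED the candidate pose predicts lands within 1.5 pixels of some observed blob:

```python
# Illustrative sketch of the "strong match" test: a candidate pose is a
# strong match when every predicted LED projects within 1.5 pixels of an
# observed blob. Names and structure are stand-ins, not the real code.

STRONG_MATCH_PX = 1.5

def is_strong_match(projected_leds, blobs):
    """projected_leds and blobs are sequences of (x, y) pixel coordinates."""
    limit_sq = STRONG_MATCH_PX ** 2
    for lx, ly in projected_leds:
        # Compare squared distances to avoid a sqrt per blob.
        if not any((lx - bx) ** 2 + (ly - by) ** 2 <= limit_sq
                   for bx, by in blobs):
            return False
    return True
```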
LED labelling
When a brute-force search for pose recovery completes, the system now knows the identity of various blobs in the camera image. One way it avoids a search next time is to transfer the labels into future camera observations using optical-flow tracking on the visible blobs.
The problem is that, even sped up, the search can still take a few frame durations to complete. Previously LED labels would be transferred from frame to frame as they arrived, but there’s now a unique ID associated with each blob that allows the labels to be transferred even several frames later, once their identity is known.
IMU Gyro scale
One of the problems with reverse engineering is the guesswork around exactly what different values mean. I was looking into why the controller movement felt “swimmy” under fast motions, and one thing I found was that the interpretation of the gyroscope readings from the IMU was incorrect.
The touch controllers report IMU angular velocity readings directly as a 16-bit signed integer. Previously the code would take the reading and divide by 1024 and use the value as radians/second.
From teardowns of the controller, I know the IMU is an Invensense MPU-6500. From the datasheet, the reported value is actually in degrees per second and appears to be configured for the +/- 2000 °/s range. That yields a calculation of Gyro-rad/s = Gyro-°/s * (2000 / 32768) * (π/180) – or a divisor of 938.734.
The 1024 divisor was under-estimating rotation speed by about 10% – close enough to work until you start moving quickly.
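Worked through in a few lines of Python (just the arithmetic from above, nothing driver-specific):

```python
import math

# Touch controllers report angular velocity as 16-bit signed integers,
# configured for the +/- 2000 °/s full-scale range. Converting a raw
# reading to radians/second works out to a divisor of ~938.734.

GYRO_RANGE_DPS = 2000.0   # full-scale range in degrees/second
RAW_FULL_SCALE = 32768.0  # 16-bit signed integer range

def gyro_raw_to_rad_per_sec(raw):
    return raw * (GYRO_RANGE_DPS / RAW_FULL_SCALE) * (math.pi / 180.0)

# The equivalent single divisor (raw counts per rad/s):
divisor = RAW_FULL_SCALE / GYRO_RANGE_DPS * (180.0 / math.pi)  # ~938.734
```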
Limited interpolation
If we don’t find a device in the camera views, the fusion filter predicts motion using the IMU readings – but that quickly becomes inaccurate. In the worst case, the controllers fly off into the distance. To avoid that, I added a limit of 500ms for ‘coasting’. If we haven’t recovered the device pose by then, the position is frozen in place and only rotation is updated until the cameras find it again.
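A toy sketch of that coasting limit (names and the pose representation are illustrative, not the actual fusion filter code):

```python
# After 500 ms without a camera fix, freeze the position in place and only
# keep applying orientation updates from the IMU.

COAST_LIMIT_SECONDS = 0.5

def coasted_pose(last_observed_position, imu_predicted_pose, seconds_since_fix):
    """imu_predicted_pose is a (position, orientation) tuple."""
    position, orientation = imu_predicted_pose
    if seconds_since_fix > COAST_LIMIT_SECONDS:
        # Coasted too long on the IMU alone: the predicted position is no
        # longer trustworthy, so hold the last camera-observed position.
        position = last_observed_position
    return position, orientation
```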
Exponential filtering
I implemented a 1-Euro exponential smoothing filter on the output poses for each device. This is an idea from the Project Esky driver for Project North Star/Deck-X AR headsets, and almost completely eliminates jitter in the headset view and hand controllers shown to the user. The tradeoff is against introducing lag when the user moves quickly – but there are some tunables in the exponential filter to play with for minimising that. For now I’ve picked some values that seem to work reasonably.
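For anyone curious, here’s a minimal scalar sketch of a 1-Euro filter following the published formulation (Casiez et al.); the tunables and structure in the actual driver may differ:

```python
import math

# Minimal scalar 1-Euro filter: an exponential low-pass whose cutoff rises
# with the signal's speed, trading smoothing for lag only when moving fast.

class OneEuroFilter:
    def __init__(self, min_cutoff=1.0, beta=0.0, d_cutoff=1.0):
        self.min_cutoff = min_cutoff  # base smoothing cutoff (Hz)
        self.beta = beta              # speed coefficient: less lag when fast
        self.d_cutoff = d_cutoff      # cutoff for the derivative estimate (Hz)
        self.x_prev = None
        self.dx_prev = 0.0

    @staticmethod
    def _alpha(cutoff, dt):
        tau = 1.0 / (2.0 * math.pi * cutoff)
        return 1.0 / (1.0 + tau / dt)

    def __call__(self, x, dt):
        if self.x_prev is None:
            self.x_prev = x
            return x
        # Smooth an estimate of the signal's speed.
        dx = (x - self.x_prev) / dt
        a_d = self._alpha(self.d_cutoff, dt)
        dx_hat = a_d * dx + (1.0 - a_d) * self.dx_prev
        # Raise the cutoff (less smoothing, less lag) when moving quickly.
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff, dt)
        x_hat = a * x + (1.0 - a) * self.x_prev
        self.x_prev, self.dx_prev = x_hat, dx_hat
        return x_hat
```

With beta at 0 this is a plain exponential filter; raising beta is what cuts the lag on fast motions, which is exactly the jitter-versus-lag tradeoff mentioned above.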
Non-blocking radio
Communications with the touch controllers happens through USB radio command packets sent to the headset. The main use of radio commands in OpenHMD is to read the JSON configuration block for each controller that is programmed in at the factory. The configuration block provides the 3D model of LED positions as well as initial IMU bias values.
Unfortunately, reading the configuration block takes a couple of seconds on startup, and blocks everything while it’s happening. Oculus saw that problem and added a checksum in the controller firmware. You can read the checksum first and if it hasn’t changed use a local cache of the configuration block. Eventually, I’ll implement that caching mechanism for OpenHMD but in the meantime it still reads the configuration blocks on each startup.
As an interim improvement I rewrote the radio communication logic to use a state machine that is checked in the update loop – allowing radio communications to be interleaved without blocking the regular processing of events. It still interferes a bit, but no longer causes a full multi-second stall as each hand controller turns on.
Haptic feedback
The hand controllers have haptic feedback ‘rumble’ motors that really add to the immersiveness of VR by letting you sense collisions with objects. Until now, OpenHMD hasn’t had any support for applications to trigger haptic events. I spent a bit of time looking at USB packet traces with Philipp Zabel and we figured out the radio commands to turn the rumble motors on and off.
In the Rift CV1, the haptic motors have a mode where you schedule feedback events into a ringbuffer – effectively they operate like a low frequency audio device. However, that mode was removed for the Rift S (and presumably in the Quest devices) – and deprecated for the CV1.
With that in mind, I aimed for implementing the unbuffered mode, with explicit ‘motor on + frequency + amplitude’ and ‘motor off’ commands sent as needed. Thanks to already having rewritten the radio communications to use a state machine, adding haptic commands was fairly easy.
I’d say the biggest problem right now is unexpected tracking loss and incorrect pose extraction. In particular, my right controller will suddenly glitch and start jumping around. Looking at a video of the debug feed, it’s not obvious why that’s happening:
To fix cases like those, I plan to add code to log the raw video feed and the IMU information together so that I can replay the video analysis frame-by-frame and investigate glitches systematically. Those recordings will also work as a regression suite to test future changes.
Sensor fusion efficiency
The Kalman filter I have implemented works really nicely – it does the latency compensation, predicts motion and extracts sensor biases all in one place… but it has a big downside of being quite expensive in CPU. The Unscented Kalman Filter CPU cost grows at O(n^3) with the size of the state, and the state in this case is 43 dimensional – 22 base dimensions, and 7 per latency-compensation slot. Running 1000 updates per second for the HMD and 500 for each of the hand controllers adds up quickly.
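Putting rough numbers on that (illustrative arithmetic only – the 43-dimensional total implies three latency-compensation slots):

```python
# State size: 22 base dimensions plus 7 per latency-compensation slot.
base_dims = 22
dims_per_slot = 7
slots = 3  # implied by the 43-dimensional total quoted above
n = base_dims + dims_per_slot * slots  # 43

# Update rate: 1000/sec for the HMD plus 500/sec for each of two controllers.
updates_per_sec = 1000 + 2 * 500

# Relative work per second, in arbitrary O(n^3) units – this is what adds up.
relative_cost = updates_per_sec * n ** 3
```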
At some point, I want to find a better / cheaper approach to the problem that still provides low-latency motion predictions for the user while still providing the same benefits around latency compensation and bias extraction.
Lens Distortion
To generate a convincing illusion of objects at a distance in a headset that’s only a few centimetres deep, VR headsets use some interesting optics. The LCD/OLED panels displaying the output get distorted heavily before the image hits the user’s eyes. What the software generates needs to compensate by applying the right inverse distortion to the output video.
Everyone that tests the CV1 notices that the distortion is not quite correct. As you look around, the world warps and shifts annoyingly. Sooner or later that needs fixing. That’s done by taking photos of calibration patterns through the headset lenses and generating a distortion model.
Camera / USB failures
The camera feeds are captured using a custom user-space UVC driver implementation that knows how to set up the special synchronisation settings of the CV1 and DK2 cameras, and then repeatedly schedules isochronous USB packet transfers to receive the video.
Occasionally, some people experience a failure to re-schedule those transfers. The kernel rejects them with an out-of-memory error when it fails to set aside DMA memory (even though it may have been running fine for quite some time). It’s not clear why that happens – but the end result at the moment is that the USB traffic for that camera dies completely, and there’ll be no more tracking from that camera until the application is restarted.
Often once it starts happening, it will keep happening until the PC is rebooted and the kernel memory state is reset.
Occluded cases
Tracking generally works well when the cameras get a clear shot of each device, but there are cases like sighting down the barrel of a gun where we expect that the user will line up the controllers in front of one another, and in front of the headset. In that case, even though we probably have a good idea where each device is, it can be hard to figure out which LEDs belong to which device.
If we already have a good tracking lock on the devices, I think it should be possible to keep tracking even down to 1 or 2 LEDs being visible – but the pose assessment code will have to be aware that’s what is happening.
Upstreaming
April 14th marks 2 years since I first branched off OpenHMD master to start working on CV1 tracking. How hard can it be, I thought? I’ll knock this over in a few months.
Since then I’ve accumulated over 300 commits on top of OpenHMD master that eventually all need upstreaming in some way.
One thing people have expressed as a prerequisite for upstreaming is to try and remove the OpenCV dependency. The tracking relies on OpenCV to do camera distortion calculations, and for their PnP implementation. It should be possible to reimplement both of those directly in OpenHMD with a bit of work – possibly using the fast LambdaTwist P3P algorithm that Philipp Zabel wrote, that I’m already using for pose extraction in the brute-force search.
Others
I’ve picked the top issues to highlight here. https://github.com/thaytan/OpenHMD/issues has a list of all the other things that are still on the radar for fixing eventually.
Other Headsets
At some point soon, I plan to put a pin in the CV1 tracking and look at adapting it to more recent inside-out headsets like the Rift S and WMR headsets. I implemented 3DOF support for the Rift S last year, but getting to full positional tracking for that and other inside-out headsets means implementing a SLAM/VIO tracking algorithm to track the headset position.
Once the headset is tracking, the code I’m developing here for CV1 to find and track controllers will hopefully transfer across – the difference with inside-out tracking is that the cameras move around with the headset. Finding the controllers in the actual video feed should work much the same.
Sponsorship
This development happens mostly in my spare time and partly as open source contribution time at work at Centricular. I am accepting funding through Github Sponsorships to help me spend more time on it – I’d really like to keep helping Linux have top-notch support for VR/AR applications. Big thanks to the people that have helped get this far.
Why that particular date? It’s Vincent van Gogh’s birthday (1853), and there is a fairly strong argument that the Dutch painter suffered from bipolar (among other things).
The image on the side is Vincent’s drawing “Worn Out” (from 1882), and it seems to capture the feeling rather well – whether (hypo)manic, depressed, or mixed. It’s exhausting.
Bipolar is complicated, often undiagnosed or misdiagnosed, and when only treated with anti-depressants, it can trigger the (hypo)mania – essentially dragging that person into that state near-permanently.
Have you heard of Bipolar II?
Hypo-mania is the “lesser” form of mania that distinguishes Bipolar I (the classic “manic depressive” syndrome) from Bipolar II. It’s “lesser” only in the sense that the person doesn’t go so hyper they may think they can fly (Bipolar I is often identified when someone in a manic state gets admitted to hospital – good catch!). With Bipolar II, the hypo-mania may actually exhibit as anger. Anger in general, against nothing in particular but potentially everyone and everything around them. Or, if it’s a mixed episode, anger combined with strong negative thoughts. Either way, it does not look like classic mania. It is, however, exhausting and can be very debilitating.
Bipolar II people often present to a doctor while in a depressed state, and GPs (not being psychiatrists) may not do a full diagnosis. Note that D.A.S. and similar test sheets are screening tools; they are not diagnostic. A proper diagnosis is more complex than filling in a form with some questions (who would have thought!)
Call to action
If you have a diagnosis of depression, only from a GP, and are on medication for this, I would strongly recommend you also get a referral to a psychiatrist to confirm that diagnosis.
Our friends at the awesome Black Dog Institute have excellent information on bipolar, as well as a quick self-test – if that shows some likelihood of bipolar, go get that referral and follow up ASAP.
I will be writing more about the topic in the coming time.
This post documented an older method of building SteamVR-OpenHMD. I’ve moved those instructions to a page here. That version will be kept up to date with any future changes, so go there.
I’ve had a few people ask how to test my OpenHMD development branch of Rift CV1 positional tracking in SteamVR. Here’s what I do:
It is important to configure in release mode, as the Kalman filtering code is generally too slow for real time in debug mode (it has to run 2000 times per second)
Please note – only Rift sensors on USB 3.0 ports will work right now. Supporting cameras on USB 2.0 requires someone implementing JPEG format streaming and decoding.
It can be helpful to test OpenHMD is working by running the simple example. Check that it’s finding camera sensors at startup, and that the position seems to change when you move the headset:
Calibrate your expectations for how well tracking is working right now! Hint: It’s very experimental
Start SteamVR. Hopefully it should detect your headset and the light(s) on your Rift Sensor(s) should power on.
Meson
I prefer the Meson build system here. There’s also a cmake build for SteamVR-OpenHMD you can use instead, but I haven’t tested it in a while and it sometimes breaks as I work on my development branch.
I spent some time this weekend implementing a couple of my ideas for improving the way the tracking code in OpenHMD filters and rejects (or accepts) possible poses when trying to match visible LEDs to the 3D models for each device.
In general, the tracking proceeds in several steps (in parallel for each of the 3 devices being tracked):
Do a brute-force search to match LEDs to 3D models, then (if matched)
Assign labels to each LED blob in the video frame saying what LED they are.
Send an update to the fusion filter about the position / orientation of the device
Then, as each video frame arrives:
Use motion flow between video frames to track the movement of each visible LED
Use the IMU + vision fusion filter to predict the position/orientation (pose) of each device, and calculate which LEDs are expected to be visible and where.
Try and match up and refine the poses using the predicted pose prior and labelled LEDs. In the best case, the LEDs are exactly where the fusion predicts they’ll be. More often, the orientation is mostly correct, but the position has drifted and needs correcting. In the worst case, we send the frame back to step 1 and do a brute-force search to reacquire an object.
The goal is to always assign the correct LEDs to the correct device (so you don’t end up with the right controller in your left hand), and to avoid going back to the expensive brute-force search to re-acquire devices as much as possible.
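The fast-path / fallback control flow of the steps above can be sketched like this (a toy model – the “pose” is a single number and the functions are stand-ins, not the real OpenHMD API):

```python
# Toy sketch of the per-frame tracking decision: try the cheap refinement
# against the predicted pose prior first, and fall back to the expensive
# brute-force search only when refinement fails.

REFINE_TOLERANCE = 0.1  # refinement only succeeds when the prior is close

def refine_pose(true_pose, prior):
    """Fast path: succeeds only when the predicted prior is near the truth."""
    if abs(true_pose - prior) <= REFINE_TOLERANCE:
        return true_pose
    return None

def brute_force_search(true_pose):
    """Slow path: full LED-to-model search; always reacquires in this model."""
    return true_pose

def track_frame(true_pose, prior):
    """Returns (pose, needed_brute_force) for one video frame."""
    pose = refine_pose(true_pose, prior)
    if pose is not None:
        return pose, False
    return brute_force_search(true_pose), True
```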
What I’ve been working on this week is steps 1 and 3 – initial acquisition of correct poses, and fast validation / refinement of the pose in each video frame, and I’ve implemented two new strategies for that.
Gravity Vector matching
The first new strategy is to reject candidate poses that don’t closely match the known direction of gravity for each device. I had a previous implementation of that idea which turned out to be wrong, so I’ve re-worked it and it helps a lot with device acquisition.
The IMU accelerometer and gyro can usually tell us which way up the device is (roll and pitch) but not which way it is facing (yaw). The measure for ‘known gravity’ comes from the fusion Kalman filter covariance matrix – how certain the filter is about the orientation of the device. If that variance is small, this new strategy is used to reject possible poses that don’t have the same idea of gravity (while permitting rotations around the Y axis), with the filter variance as a tolerance.
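The core of the check can be sketched like this: rotate the world "down" vector into each device frame using the candidate orientation and the filter’s orientation, and compare the angle between the two results. Rotations about the world up axis cancel out of that comparison, which is exactly the yaw-freedom the text describes. This is a minimal illustration in Python, not the OpenHMD implementation; the quaternion convention, the choice of -Y as "down", and the mapping from filter variance to a tolerance are all assumptions here:

```python
import math

# Minimal sketch of the gravity-matching test described above.
# Quaternions are (w, x, y, z); world "down" is assumed to be -Y.
# In the real code the tolerance would be derived from the Kalman
# filter's orientation covariance.

def quat_mul(a, b):
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def rotate_into_device_frame(q, v):
    """Rotate world vector v into the device frame: q* . v . q"""
    qc = (q[0], -q[1], -q[2], -q[3])
    p = quat_mul(quat_mul(qc, (0.0,) + v), q)
    return p[1:]

def gravity_angle(q_candidate, q_filter):
    """Angle between the gravity directions implied by two orientations.
    Yaw (rotation about world up) leaves this angle unchanged."""
    down = (0.0, -1.0, 0.0)
    gc = rotate_into_device_frame(q_candidate, down)
    gf = rotate_into_device_frame(q_filter, down)
    dot = max(-1.0, min(1.0, sum(a*b for a, b in zip(gc, gf))))
    return math.acos(dot)

def pose_matches_gravity(q_candidate, q_filter, tolerance_rad):
    return gravity_angle(q_candidate, q_filter) <= tolerance_rad
```

For example, a candidate pose that is yaw-rotated 90° from the filter estimate passes with any tolerance, while one pitched 30° away only passes if the filter’s orientation uncertainty allows at least that much.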
Partial tracking matches
The second strategy is based on tracking with fewer LED correspondences once a tracking lock is acquired. Initial acquisition of the device pose relies on some heuristics for how many LEDs must match the 3D model. The general heuristic threshold I settled on for now is that 2/3rds of the expected LEDs must be visible to acquire a cold lock.
With the new strategy, if the pose prior has a good idea where the device is and which way it’s facing, it allows matching on far fewer LED correspondences. The idea is to keep tracking a device even down to just a couple of LEDs, and hope that more become visible soon.
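The threshold logic amounts to something like the following sketch. The 2/3rds cold-lock fraction is from the text above; the warm-lock minimum of 2 LEDs is an illustrative stand-in for "just a couple of LEDs", not a value from the OpenHMD source:

```python
import math

# Sketch of the LED-correspondence thresholds described above.
# WARM_LOCK_MIN_LEDS is an assumption for illustration only.
WARM_LOCK_MIN_LEDS = 2

def required_led_matches(num_expected_visible, have_strong_prior):
    """How many LED correspondences a candidate pose needs to be accepted."""
    if have_strong_prior:
        # Pose prior is trusted: keep tracking on very few LEDs
        # and hope more become visible soon.
        return min(WARM_LOCK_MIN_LEDS, num_expected_visible)
    # Cold acquisition: demand 2/3rds of the expected LEDs.
    return math.ceil(2 * num_expected_visible / 3)
```

So with, say, 9 LEDs expected to be visible, a cold lock needs 6 matches, but a device with a good pose prior can hang on with only 2.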
While this definitely seems to help, I think the approach can use more work.
Status
With these two new approaches, tracking is improved but still quite erratic. Tracking of the headset itself is quite good now and for me rarely loses tracking lock. The controllers are better, but have a tendency to “fly off my hands” unexpectedly, especially after fast motions.
I have ideas for more tracking heuristics to implement, and I expect a continuous cycle of refinement on the existing strategies and new ones for some time to come.
For now, here’s a video of me playing Beat Saber using tonight’s code. The video shows the debug stream that OpenHMD can generate via Pipewire, showing the camera feed plus overlays of device predictions, LED device assignments and tracked device positions. Red is the headset, Green is the right controller, Blue is the left controller.
Initial tracking is completely wrong – I see some things to fix there. For example, when the controllers go offline due to inactivity, the code keeps trying to match LEDs to them, and there are also some problems with how it relabels LEDs that were given incorrect assignments.
After that, there are periods of good tracking with random tracking losses on the controllers – those show the problem cases to concentrate on.