Planet Linux Australia

,

Michael StillA gotcha with the Walrus operator

In New python syntax I was previously unaware of, I discussed some new operators I’d recently discovered. One of them is called the Walrus operator, which lets you write code like this:

list = ['a', 'b', 'c']
def get_one():
    if not list:
        return None
    return list.pop()

while one := get_one():
   print(one)

See where we do the assignment inside the while? That code returns:

c
b
a

Which is as expected. However, the Walrus operator is strict about needing a None returned to end the iteration. I had code which was more like this:

list = [('a', 1), ('b', 2), ('c', 3)]
def get_one():
    if not list:
        return None, None
    return list.pop()

while one := get_one():
    print(one)

And the while loop never terminates. It just prints (None, None) over and over. So there you go.

,

Silvia PfeifferSWAY at RFWS using Coviu

A SWAY session by Joanne of Royal Far West School. http://sway.org.au/ via https://coviu.com/ SWAY is an oral language and literacy program based on Aboriginal knowledge, culture and stories. It has been developed by Educators, Aboriginal Education Officers and Speech Pathologists at the Royal Far West School in Manly, NSW.

Category: Array
Uploaded by: Silvia Pfeiffer
Hosted: youtube

The post SWAY at RFWS using Coviu first appeared on ginger's thoughts.

Silvia PfeifferSilvia Pfeiffer Live Stream

Silvia PfeifferPARADISEC catalog for Users

This screencast shows how a user of the PARADISEC catalog logs in and explores the collections, items and files that the archive contains.

Category: 2
Uploaded by: Silvia Pfeiffer
Hosted: youtube

The post PARADISEC catalog for Users first appeared on ginger's thoughts.

Silvia PfeifferPARADISEC catalog for Collectors

Screencast of how to use the PARADISEC catalog for managing and publishing collections.

Category: 2
Uploaded by: Silvia Pfeiffer
Hosted: youtube

The post PARADISEC catalog for Collectors first appeared on ginger's thoughts.

Silvia PfeifferPARADISEC catalog for Administrators

Screencast of how a PARADISEC administrator uses the PARADISEC catalog for managing the consistency of metadata and staying on top of uploaded files.

Category: 2
Uploaded by: Silvia Pfeiffer
Hosted: youtube

The post PARADISEC catalog for Administrators first appeared on ginger's thoughts.

,

Simon LyallEverything Open 2024 – Day 3 talks

Keynote: Intelligent Interfaces: Challenges and Opportunities by Aaron Quigley

  • Eye Tracking of the user
    • DiffDisplays – Eye tracking and when you looked away from a screen it frooze it. When you looked back it gave you a summary/diff of what you missed
    • Bought this down to the widget level, a widget got notification when user looking or away and could decide what to do
  • Change Blindness (different from attention blindness)
    • When phone far away simplify phone interface, more detail when closer
    • People don’t see details of displays slowly fading in and out as distance from display changed
  • Phone on table, screen up or screen down
    • SpeCam – Facedown screen can have light and detect what it is sitting on. Guess material it is sitting on
    • Accuracy same/better than a proper spectrometer
  • MicroCam – Phone placed with screen face up
    • Placement aware computing
  • OmniSense
    • 360 Camera
    • Track what the user’s whole body is doing
    • Tracks what is happening all around the user. Danger sensors, context aware output
  • BreathIn control. Breath pattern to control phone
    • User camera in a watch potion to detect handle gestures (looking at top/back of hand)
  • RotoSwype – Smart ring to do gesture keyboard input
  • RadarCat – Radar + Categorization
    • More Socially acceptable that cameras everywhere and always on
    • Used to detect material
    • Complex pattern of reflection and absorption that returns lots of information
    • Trained on 661 feature and 512 bins
    • Radar signal can ever detect different colours. Different dyes interact differently
    • Can detect if people are wearing gloves
    • Application – Scales at self-checkout supermarket to detect what is being weighed
    • Radar in shoe can recognise the surface and layers below (carpet on weed etc)

Passwordless Linux – Passkey and External IdP support in FreeIPA by Fraser Tweedale

  • Passwords
    • Users are diligent (weak reuse)
    • Using passwords securely imposes friction and cognitive load
    • Phishable
  • Objectives – Reduce password picking risks, phishing, friction,frequency of login
  • Alternatives
    • 2FA, Smartcard, Passkeys / WebAuthn, Web SSO Providers
  • 2FA
    • HOTP / TOTP etc
    • phishable
  • Smart Cards
    • Phishing Resistant
  • Passkeys
    • Better versions of MFA Cards
    • Phishing resistant
    • “passkey” term is a little vague
  • Web SSO
    • SAML, OAuth2
    • Using an existing account to authenticate
    • Some privacy concern
    • Keycloak, Redhat SSO, Okta, Facebook
    • Great on the web, harder in other context
  • What about our workstations?
    • pam has hooks for most of the above (Web SSO less common) or pam_sss does all
  • FreeIPA / Red Hat Identity Management
  • DEMO

Locknote: Who gets to work in STEM? And who is being left out? by Rae Johnston

  • Poor diversity affects the development of AI
  • False identification much higher by facial recognition for non-white people
  • Feed the AI more data sets?
  • Bias might not even be noticed if the developers are not diverse
  • Only around 25% of STEM people are Women
  • Only 15% of UK scientist came from Working Class backgrounds (35% of the population)
  • 11% of Australians don’t have access to affordable Internet or don’t use it.
  • The digital divide is narrowing but getting deeper. Increasing harder to function if you are not online
  • Male STEM graduates are 1.8x more likely to be in jobs that required the array than women. Mush worse for indigenous people

Lightning Talks

  • Creating test networks with Network Namespace
    • ip netns add test-lan
  • Rerap Micron
  • Haystack Storage System
    • Time-bases key/value store
  • AgOpenGPS
    • Self Steering System for Tractors
  • Common Network Myths
    • End to end packet loss is the only thing that matters
    • Single broadcast domain is a SPOF, broadcast storms etc
    • Ping and ICMP is your friend. Please allow ping
    • Don’t force 1500 MTU
    • Asymmetric routing is normal
    • non-standard port number doesn’t make you secure
  • radio:console for remote radio
  • WASM
    • FileSender – Share large datasets over the Internet

Share

,

Simon LyallEverything Open 2024 – Day 2 talks

Keynote: How Adversaries Use AI by Jana Dekanovska

  • Adversary
    • Nation States
    • Ecrime
    • Hactivism
  • Trends
    • High Profile Ecrime attacks – Ransomware -> Data extortion
    • Malware-Free Attacks – Phish, Social engineering to get in rather than malware
    • Cloud Consciousness
    • Espionage – Focuses in Eastern Europe and Middle East
    • Vulnerability Exploitation – Not just zero days, Takes while to learn to leverage vuls
    • Cloud Consciousness – Adversary knows they are in the cloud, have to operate in it.
  • Generative AI
    • Code Generation
    • Social Engineer – Help people sound like Native Speakers, improve wording
    • Prompt Injection
  • Big Four States sponsoring attacks – China, North Korea, Iran, Russia
  • North Korea – Often after money
  • Russia, Iran – Concentrating on local adversaries
  • China
    • 1m personal in Cyber Security
    • Get as much data as possible
  • Elections
    • Won’t be hacking into voting systems
    • Will be generating news, stories, content and targeting populations
  • Crime Operations
    • GenAI helps efficiency and Speed of attacks
    • Average Breakout time faster from 10h in 2018 to 1h now
    • Members from around the world, at leats one from Australia
    • Using ChatGPT to help out during intrusions to understand what they are seeing
    • Using ChatGPT to generate scripts

Consistent Eventually Replication Database by William Brown

  • Sites go down. Lets have multiple sites for our database
  • CAP Theorem
  • PostgresSQL Database
    • Active Primary + Standby
    • Always Consistent
    • Promote passive to active in event of outage
    • Availability
    • But not partition tolerant
  • etcd
    • Nodes elect active node which handles writes. Passive nodes go offline then others are still happy
    • If active node fails then new active node elected and handles writes
    • Not availbale. Since if only one node then it will go to sleep cause it doesn’t know state of other nodes (dead or just unreachable)
  • Active Directory
    • If node disconnected then it will just keep serving old data
    • reads and writes always services even if they are out of contact with other nodes
    • Not consistent
  • Kanidm
    • identity management database
    • Want availability and partition tolerance
    • Because we want disconnected nodes to still handle reads and writes (eg for branch office that is off internet)
    • Also want to be able to scale very high, single node can’t handle all the writes
  • Building and Design
    • Simultaneous writes have to happen on multiple servers, what happens if writes overlap. Changes to same record on different servers
    • ” What would Postgres do? “
    • Have nanosecond timestamps. Apply events nicely in order, only worry about conflicts. Use Lamport Clock (which only goes forward)
    • What happens if the timestamps match?
    • Servers get a uuid, timestamp gets uuid added to it so one server is slightly newer
    • Both servers can go though process in isolation and get the same outputted database content
  • Lots more stuff but I got lost
    • Attribute State + CRDT
  • Most of your code will be doing weird paths. And they must all be tested.
  • Complaint that academic papers are very hard to read. Difficult to translate into code.

Next Generation Authorisation – a developers guide to Cedar by Ricardo Sueiras

  • Authorisation is hard
  • Ceder
    • DSL around authorisation
    • Policy Language
    • Evaluation and Authorisation Engine
    • Easy to Analise
  • Authorisation Language

Managing the Madness of Cloud Logging by Alistair Chapman

  • The use case
  • All vendors put their logs in weird places and in weird sorts of ways. All differently
  • Different defaults for different events
  • Inconsistent event formats –
  • Changes must be proactive – You have to turn on before you need it
  • Configuration isn’t static – VEndor can change around the format with little worning
  • Very easy to access the platform APIs from a VM.
  • Easy to get on a VM if you have access to the Cloud platform
  • Platform Security Tools
    • Has access to all logs and can correlate events
    • Doesn’t work well if you are not 100% using their product. ie Multi-cloud
    • Can cost a lot, requires agents to be deployed
  • Integrating with your own SIEM platform
    • Hard to push logs out to external sources sometimes
    • Can get all 3 into splunk, loki, elastic
    • You have to duplicate with the cloud provider has already done
  • Assess your requirements
    • How much do you need live correlation vs reviewing after something happened
    • Need to plan ahead
    • OSCF, OTel, ECS – Standards. Pick one and use for everything
    • Try log everything. Audit events, Performance metrics, Billing
    • But obvious lots of logs cost logs of money
    • Make it actionable – Discoverability and correlation. Automation
  • Taming log Chaos
    • Learn from Incidents – What sort of thing happens, what did you need availbale
    • Test assumptions – eg How trusted is “internal”
    • Log your logging – How would you know it is not working
    • Document everything – Make it easier to detect deviations from norm
    • Have processes/standards for the teams generating the events (eg what tags to use)
  • Prioritise common mistakes
    • Opportunity for learning
    • Don’t forget to train the humans
  • Think Holistically
    • App security is more than just code
    • Automation and tooling will help but not solve anything
    • If you don’t have a security plan… Make one
  • Common problems
    • Devs will often post key to github
    • github has a feature to block common keys, must be enabled
  • Summary
    • The logs you gather must be actionable
    • Get familiar with the logs, and verify they actually work they way you think
    • Put the logs in one place if you can
    • Plan for the worst
    • Don’t let the logs overwhelm you. But don’t leave important events unlogged
    • The fewer platforms you use the easier it is

Share

,

Simon LyallEverything Open 2024 – Day 1 talks

Developing in the open, building a product with our users by Toby Bellwood

  • The Lagoon Story
    • At amazee.io . Is Lagoon Lead
    • What is Lagoon
    • Application to Kubernetes (docker build for customer, converts to k8s)
    • Docker based
    • Based on git workflows. Mostly Drupal, WordPress, PHP and NodeJS apps
    • Presets for the extra stuff like monitoring etc
  • Why
    • Cause Developers are too busy to do all that extra stuff
    • and it means Ops prefer if it was all automated away (the right way)
  • 8 full-time team members
    • Knows a lot about application, not so much about the users (apart from Amazee.io)
    • Users: Hosting providers, Agencies, Developers
    • The Adopter: Someone using it for something else, weird use cases
    • Agencies: Need things to go out quickly, want automation, like documentation to be good. Often will need weird technologies cause customers wants that.
    • Developers: Just want it stabele. Only worried about one project at at time. Often OS minded
  • User Mindset
    • Building own tools using application
    • Do walking tours of the system, recorded zoom session
    • Use developer tools
    • Discord, Slack, Office Hours, Events, Easy Access to the team
  • Balance priorities
    • eg stuff customers will use even those Amazee won’t use
  • Engaging Upstream
    • Try to be a good participant, What they would want their customers to be
    • Encourage our teams to “contribute first”. Usually works well
  • Empowering the Team
    • Contribute under your own name
    • Participate in communities
  • How to stay Open Source forever?
    • Widening the Core Contributor Group
    • Learn from others in the Community. But most companies are not open sourcing the main component of their business.
    • Unsuccessful CNCF Sandbox project

Presenting n3n – A simple Peer to Peer VPN by Hamish Coleman

  • How to compares to other VPNs?
    • Peer to peer
    • NAT piecing
    • Not all packets need to go via the server
    • Distributed ethernet switch – gives extra features
    • Userspace except for tuntap driver which is pretty common
    • Low deployment requirements, easy to install in multiple environments
    • Relatively simple security, not super secure
  • History
    • Based off n2n (developed by the people who did ntop)
    • But they changed the license in October 2023
    • Decided to fork into a new project
    • First release of n3n in April 2024
  • Big change was they introduced a CLA (contributor licensing agreement)
  • CLAs have problems
    • Legal document
    • Needs real day, contributor hostile, asymmetry of power
    • Can lead to surprise relicencing
  • Alternatives to a CLA
  • Preserving Git history
    • Developer’s Certificate of Origin
    • Or it could be a CLA
  • Handling Changes
    • Don’t surprise your Volunteers
    • Don’t ignore your Volunteers
    • Do discuss with you Volunteers and bring them along
  • Alternatives
    • Wireguard – No NAT piercing
    • OpenVPN – Mostly client to Server. Also Too configurable
  • Why prefer
    • One simple access method (Speaker uses 4x OS)
    • A single access method
    • p2p avoid latency delays because local instances to talk directly
  • Goals
    • Protocol compatibility with n2n
    • Don’t break user visible APIs
    • Incrementally clean and improve codebase
  • How it works now
    • Supernode – Central co-ordination point, public IP, Some access control, Last-resort for packet forwarding
    • Communities – Nodes join, form a virtual segment
  • IP addresses
    • Can just run a DHCP server inside the network
  • Design
    • Tries to create a full mesh of nodes
    • Multiple Supernodes for metadata
  • Added a few features from n2n
    • INI file, Help text, Tidied up the CLI options and reduced options
    • Tried to make the defaults work better
  • Built in web server
    • Status page, jsonRPC, Socket interfaces, Monitoring/Stats
  • Current State of fork
    • Still young. Another contributor
    • Only soft announced. Growing base of awareness
  • Plans
    • IPv6
    • Optimise encryption/compression
    • Improve packaging and submit to distros
    • Test coverage
    • Better NAT piercing
    • Continue improve config experience
    • Selectable tuntap drivers
    • Mobile phone support hoped for but probably some distance away
  • Speaker’s uses for software
    • Manage mothers computer
    • Management interface for various servers around the world
    • LAN Gaming using Windows 98 machines
    • Connect back to home network to avoid region blockinghttps://github.com/n42n/n3n
  • https://github.com/n42n/n3n

From the stone age to silicon: The Dwarf Axe guide to the evolution of technology by Steven Ellis

  • What is a “Dwarf Axe” ?
    • Snowflakes vs Dwarf Axes
    • It’s an Axe that handled down and consistently delivers a service
    • Both the head ( software ) and the handle ( hardware ) are maintained and upgraded separately and must be maintained. Treated like the same platform even though it is quite different from what it was originally. Delivers the same services though
  • Keeps a fairly similar services. Same box on a organisation diagram
  • Home IT
    • Phones handed down to family members. Often not getting security patches anymore
  • Enterprise IT
    • Systems kept long past their expected lifetime
    • Maintained via virtualisation
  • What is wrong with a Big Axe?
    • Too Big to Fail
    • Billion dollar projects fail.
  • Alternatives
    • Virtual Machines – Running on Axe somewhere,
    • Containers – Something big to orchestrate the containers
    • Microservices – Also needs orchestration
  • Redesign the Axe
    • The cloud – It’s just someone else Axe
  • Options
    • Everything as a service. 3rd party services
  • Re-use has an end-of-life
    • Modern hardware should have better )and longer) hardware support
  • Ephemeral Abstraction
    • Run anywhere
    • Scale out not up
    • Avoid single points of failure
    • Focus on the service (not the infra or the platform)
    • Use Open tools and approaches
  • Define your SOE
    • Not just your OS

Share

yifeiEverything Open 2024 Quick Notes :: Day 1

sched_ext - Write your own Linux thread scheduler in BPF #

  • BPF made creating new scheduler simpler

    • with strong safety guarantee to not break the system, the side effects of bad scheduler are confined.
    • run a binary to enable your scheduler, stop the binary to revert to default
  • Scheduling problem is now more complicated due to increasing complexity of workload/CPU design

  • BPF provides reliable access to critical data structures inside the kernel

Exploring mobile linux security with PinePhone Pro: OP-TEE sec enclave, Virtualization and beyond #

  • This is my talk ;)
  • See the readings page for slides/demos and more.

Presenting n3n - A simple Peer to Peer VPN #

  • Forked from n2n to avoid CLA

    • Protocol level compatibility with n2n is maintained
  • Peer-to-peer VPN at network layer, acting like a distributed virtual switch

    • Layer 2 over Layer 3
    • Only route packets through a server/supernode when required, p2p by default
    • Better latency due to being p2p
  • NAT piecing

  • Written in C, should have good cross-platform supports (more testing wanted on *BSD)

    • Relatively small codebase for a VPN
  • TunTap interface support is expected from the OS side, shouldn’t be a problem for common Unix-likes

    • Modern macOS is dropping support for TunTap, need to use NetworkExtension?
  • Packaging and distro submission are still WIP

    • Framework for a debian package exists but not in an upstreamable shape
    • OpenBSD?
  • Future roadmap

    • n3n over IPv6
    • Code cleanup
    • Multiple network driver support (e.g. something other than TunTap)
    • Better NAR piecing
    • Mobile support?
  • Useful for

    • LAN gaming with old/modern systems
    • Remote access
  • Simpler than wireguard/openvpn but offers OK security (not for security-critical apps?)

  • Easier to configure, use INI style config files

Running your own Mailserver #

  • 90% of all incoming mails are low-effort spams.
  • Setup DMARC/SPF records

Lions OS #

  • seL4 is bad at usability, Lions OS intends to solve this

  • Still in early stage of development

  • Composable components for build custom OS for a single task

    • Runs on seL4 Microkernel
    • For things like IoT, embedded, cars etc…
  • Focus on simplicity

  • 0.1.0 just released, still in its early stage

  • high performance

  • Only for Arm64/aarch64 now, riscv64 in future?

  • Device Driver Model

  • Multi Language Support

  • A reference system called Kitty exists

    • A Linux running inside VMM is used for framebuffer, but any OS should do

,

Russell CokerSoftware Needed for Work

When I first started studying computer science setting up a programming project was easy, write source code files and a Makefile and that was it. IRC was the only IM system and email was the only other communications system that was used much. Writing Makefiles is difficult but products like the Borland Turbo series of IDEs did all that for you so you could just start typing code and press a function key to compile and run (F5 from memory).

Over the years the requirements and expectations of computer use have grown significantly. The typical office worker is now doing many more things with computers than serious programmers used to do. Running an IM system, an online document editing system, and a series of web apps is standard for companies nowadays. Developers have to do all that in addition to tools for version control, continuous integration, bug reporting, and feature tracking. The development process is also more complex with extra steps for reproducible builds, automated tests, and code coverage metrics for the tests. I wonder how many programmers who started in the 90s would have done something else if faced with Github as their introduction.

How much of this is good? Having the ability to send instant messages all around the world is great. Having dozens of different ways of doing so is awful. When a company uses multiple IM systems such as MS-Teams and Slack and forces some of it’s employees to use them both it’s getting ridiculous. Having different friend groups on different IM systems is anti-social networking. In the EU the Digital Markets Act [1] forces some degree of interoperability between different IM systems and as it’s impossible to know who’s actually in the EU that will end up being world-wide.

In corporations document management often involves multiple ways of storing things, you have Google Docs, MS Office online, hosted Wikis like Confluence, and more. Large companies tend to use several such systems which means that people need to learn multiple systems to be able to work and they also need to know which systems are used by the various groups that they communicate with. Microsoft deserves some sort of award for the range of ways they have for managing documents, Sharepoint, OneDrive, Office Online, attachments to Teams rooms, and probably lots more.

During WW2 the predecessor to the CIA produced an excellent manual for simple sabotage [2]. If something like that was written today the section General Interference with Organisations and Production would surely have something about using as many incompatible programs and web sites as possible in the work flow. The proliferation of software required for work is a form of denial of service attack against corporations.

The efficiency of companies doesn’t really bother me. It sucks that companies are creating a demoralising workplace that is unpleasant for workers. But the upside is that the biggest companies are the ones doing the worst things and are also the most afflicted by these problems. It’s almost like the Bureau of Sabotage in some of Frank Herbert’s fiction [3].

The thing that concerns me is the effect of multiple standards on free software development. We have IRC the most traditional IM support system which is getting replaced by Matrix but we also have some projects using Telegram, and Jabber hasn’t gone away. I’m sure there are others too. There are also multiple options for version control (although github seems to dominate the market), forums, bug trackers, etc. Reporting bugs or getting support in free software often requires interacting with several of them. Developing free software usually involves dealing with the bug tracking and documentation systems of the distribution you use as well as the upstream developers of the software. If the problem you have is related to compatibility between two different pieces of free software then you can end up dealing with even more bug tracking systems.

There are real benefits to some of the newer programs to track bugs, write documentation, etc. There is also going to be a cost in changing which gives an incentive for the older projects to keep using what has worked well enough for them in the past,

How can we improve things? Use only the latest tools? Prioritise ease of use? Aim more for the entry level contributors?

,

Michael StillDebugging

One of the other architects at work was running a reading group for our North American comrades, and I felt left out so I figured I may as well just pick up the book to see what the deal was. This book is a bit old, and was written at the time to try and be funny, but to be honest I don’t think the humour has aged well and it makes the book jarring to read. Overall I’d describe the book as having been written in the style of a long form chatty blog post, which is a bit unusual.

90% of the readers of this book will be looking for advice on how to debug software systems, but the book frequently uses hardwaare systems as examples. That speaks to the author’s background, but its not super helpful for modern audiences living in a software defined world.

The book is also a bit dated in terms of terminology and expectations of the work environment — for example, the discussion of repeatable testing doesn’t mention automated regression testing at all, and the only mention of automated tests is fleeting at best. Another example is that the author recommends that you read the manual for whatever system you are testing from cover to cover before starting. Really? When was the last time you saw a real world system with a manual? Certainly for modern software systems this is basically unheard of. There might be a user guide, but engineering documentation is a step we seem to always skip these days.

The awkward bit is the nine rules are actually pretty good advice and have stood the test of time. They are:

  • Understand the system
  • Make it fail (stimulate the failure, but be careful not to simulate it)
  • Quit thinking and look (don’t guess at possible causes, go and seek evidence via instrumentation and direct measurement)
  • Divide and conquer
  • Change one thing at a time
  • Keep an audit trail
  • Check the plug
  • Get a fresh view
  • If you didn’t fix it, it ain’t fixed

I think in summary this book would have been a good blog post, but hasn’t aged well as an actual book. Sorry Dave, better luck next time.

Debugging Book Cover Debugging
David J. Agans
Business & Economics
Amacom Books
September 30, 2006
183

Written in a frank but engaging style, this guide provides simple, foolproof principles guaranteed to help find any hardware or software bug quickly. It is applicable for any system in any circumstance.

yifeiLinks and Further Readings for My Everything Open 2024 Talk

Here you can find a list of links related to my topic which I find useful or just interesting.

Meta #

Info page https://2024.everythingopen.au/schedule/presentation/24/

Slides EO2024.Slides.exploring.mobile.linux.security.odp

Recording XXX to be processed

VerityMobile GitHub :: ZhanYF/veritymobile

Demo #

Access Measurements from Linux Userland

Sign in to GitLab with fTPM-backed FIDO token

fTPM-backed SSH Identity

Disposable Web Session

OP-TEE #

Docs Index and high level introduction #

https://optee.readthedocs.io/en/latest/general/about.html

GlobalPlatform API #

https://optee.readthedocs.io/en/latest/architecture/globalplatform_api.html#globalplatform-api

Talks and Demos about OP-TEE #

https://optee.readthedocs.io/en/latest/general/presentations.html

Other TEEs #

Android Trusty #

https://source.android.com/docs/security/features/trusty

Apple Secure Enclave #

https://support.apple.com/en-sg/guide/security/sec59b0b31ff/web

TPM and Desktop/Mobile Linux #

What Can You Do with a TPM by Michael Peters #

This also covers Measured Boot and Secure Boot

https://next.redhat.com/2021/05/13/what-can-you-do-with-a-tpm/

A WebAuthn/U2F token protected by a TPM (Go/Linux) by Peter Sanford #

https://github.com/psanford/tpm-fido

Setup TPM-backed SSH identity #

https://www.ledger.com/blog/ssh-with-tpm

Secure Boot on embedded devices #

Secure boot in embedded Linux systems by Thomas Perrot #

https://bootlin.com/pub/conferences/2021/lee/perrot-secure-boot/perrot-secure-boot.pdf

Shadow-box #

Shadow-box for ARM using OP-TEE #

Highlevel description #

https://www.blackhat.com/asia-18/briefings.html#shadow-box-v2-the-practical-and-omnipotent-sandbox-for-arm

Source code and build instructions #

https://github.com/kkamagui/shadow-box-for-arm https://github.com/kkamagui/manifest

Older version of Shadow-box for x86 #

https://github.com/kkamagui/shadow-box-for-x86

RK3399 #

Enabling Secure Boot on RockChip SoCs by Artur Kowalski #

https://blog.3mdeb.com/2021/2021-12-03-rockchip-secure-boot/

Virtualization #

Firecracker #

https://github.com/firecracker-microvm/firecracker

firectl(1) #

https://github.com/firecracker-microvm/firectl

Run general purpose arm64 VMs with KVM on RK3399 #

https://segments.zhan.science/posts/kvm_on_pinehone_pro/

,

Russell CokerML Training License

Last year a Debian Developer blogged about writing Haskell code to give a bad result for LLMs that were trained on it. I forgot who wrote the post and I’d appreciate the URL if anyone has it.

I respect such technical work to enforce one’s legal rights when they aren’t respected by corporations, but I have a different approach.

As an aside the Fosdem lecture “Fortify AI against regulation, litigation and lobotomies” is interesting on this topic [1], it’s what inspired me to write about this.

For what I write I am at this time happy to allow it to be used as part of a large training data set (consider this blog post a licence grant that applies until such time as I edit this post to change it). But only if aggregated with so much other data that my content is only a tiny portion of the data set by any metric. So I don’t want someone to make a programming LLM that has my code as the only C code or a political data set that has my blog posts as the only left-wing content. If someone wants to train an LLM on only my content to make a Russell-simulator then I don’t license my work for that purpose but also as it’s small enough that anyone with a bit of skill could do it on a weekend I can’t stop it. I would be really interested in seeing the results if someone from the FOSS community wanted to make a Russell-simulator and would probably issue them a license for such work if asked.

If my work comprises more than 0.1% of the content in a particular measure (theme, programming language, political position, etc) in a training data set then I don’t permit that without prior discussion.

Finally if someone wants to make a FOSS training data set to be used for FOSS LLM systems (maybe under the AGPL or some similar license) then I’ll allow my writing to be used as part of that.

,

Matt PalmerHow I Tripped Over the Debian Weak Keys Vulnerability

Those of you who haven’t been in IT for far, far too long might not know that next month will be the 16th(!) anniversary of the disclosure of what was, at the time, a fairly earth-shattering revelation: that for about 18 months, the Debian OpenSSL package was generating entirely predictable private keys.

The recent xz-stential threat (thanks to @nixCraft for making me aware of that one), has got me thinking about my own serendipitous interaction with a major vulnerability. Given that the statute of limitations has (probably) run out, I thought I’d share it as a tale of how “huh, that’s weird” can be a powerful threat-hunting tool – but only if you’ve got the time to keep pulling at the thread.

Prelude to an Adventure

Our story begins back in March 2008. I was working at Engine Yard (EY), a now largely-forgotten Rails-focused hosting company, which pioneered several advances in Rails application deployment. Probably EY’s greatest claim to lasting fame is that they helped launch a little code hosting platform you might have heard of, by providing them free infrastructure when they were little more than a glimmer in the Internet’s eye.

I am, of course, talking about everyone’s favourite Microsoft product: GitHub.

Since GitHub was in the right place, at the right time, with a compelling product offering, they quickly started to gain traction, and grow their userbase. With growth comes challenges, amongst them the one we’re focusing on today: SSH login times. Then, as now, GitHub provided SSH access to the git repos they hosted, by SSHing to git@github.com with publickey authentication. They were using the standard way that everyone manages SSH keys: the ~/.ssh/authorized_keys file, and that became a problem as the number of keys started to grow.

The way that SSH uses this file is that, when a user connects and asks for publickey authentication, SSH opens the ~/.ssh/authorized_keys file and scans all of the keys listed in it, looking for a key which matches the key that the user presented. This linear search is normally not a huge problem, because nobody in their right mind puts more than a few keys in their ~/.ssh/authorized_keys, right?

2008-era GitHub giving monkey puppet side-eye to the idea that nobody stores many keys in an authorized_keys file

Of course, as a popular, rapidly-growing service, GitHub was gaining users at a fair clip, to the point that the one big file that stored all the SSH keys was starting to visibly impact SSH login times. This problem was also not going to get any better by itself. Something Had To Be Done.

EY management was keen on making sure GitHub ran well, and so despite it not really being a hosting problem, they were willing to help fix this problem. For some reason, the late, great, Ezra Zygmuntowitz pointed GitHub in my direction, and let me take the time to really get into the problem with the GitHub team. After examining a variety of different possible solutions, we came to the conclusion that the least-worst option was to patch OpenSSH to lookup keys in a MySQL database, indexed on the key fingerprint.

We didn’t take this decision on a whim – it wasn’t a case of “yeah, sure, let’s just hack around with OpenSSH, what could possibly go wrong?”. We knew it was potentially catastrophic if things went sideways, so you can imagine how much worse the other options available were. Ensuring that this wouldn’t compromise security was a lot of the effort that went into the change. In the end, though, we rolled it out in early April, and lo! SSH logins were fast, and we were pretty sure we wouldn’t have to worry about this problem for a long time to come.

Normally, you’d think “patching OpenSSH to make mass SSH logins super fast” would be a good story on its own. But no, this is just the opening scene.

Chekov’s Gun Makes its Appearance

Fast forward a little under a month, to the first few days of May 2008. I get a message from one of the GitHub team, saying that somehow users were able to access other users’ repos over SSH. Naturally, as we’d recently rolled out the OpenSSH patch, which touched this very thing, the code I’d written was suspect number one, so I was called in to help.

The lineup scene from the movie The Usual Suspects
They're called The Usual Suspects for a reason, but sometimes, it really is Keyser Söze

Eventually, after more than a little debugging, we discovered that, somehow, there were two users with keys that had the same key fingerprint. This absolutely shouldn’t happen – it’s a bit like winning the lottery twice in a row1 – unless the users had somehow shared their keys with each other, of course. Still, it was worth investigating, just in case it was a web application bug, so the GitHub team reached out to the users impacted, to try and figure out what was going on.

The users professed no knowledge of each other, neither admitted to publicising their key, and couldn’t offer any explanation as to how the other person could possibly have gotten their key.

Then things went from “weird” to “what the…?”. Because another pair of users showed up, sharing a key fingerprint – but it was a different shared key fingerprint. The odds now have gone from “winning the lottery multiple times in a row” to as close to “this literally cannot happen” as makes no difference.

Milhouse from The Simpsons says that We're Through The Looking Glass Here, People

Once we were really, really confident that the OpenSSH patch wasn’t the cause of the problem, my involvement in the problem basically ended. I wasn’t a GitHub employee, and EY had plenty of other customers who needed my help, so I wasn’t able to stay deeply involved in the on-going investigation of The Mystery of the Duplicate Keys.

However, the GitHub team did keep talking to the users involved, and managed to determine the only apparent common factor was that all the users claimed to be using Debian or Ubuntu systems, which was where their SSH keys would have been generated.

That was as far as the investigation had really gotten, when along came May 13, 2008.

Chekov’s Gun Goes Off

With the publication of DSA-1571-1, everything suddenly became clear. Through a well-meaning but ultimately disasterous cleanup of OpenSSL’s randomness generation code, the Debian maintainer had inadvertently reduced the number of possible keys that could be generated by a given user from “bazillions” to a little over 32,000. With so many people signing up to GitHub – some of them no doubt following best practice and freshly generating a separate key – it’s unsurprising that some collisions occurred.

You can imagine the sense of “oooooooh, so that’s what’s going on!” that rippled out once the issue was understood. I was mostly glad that we had conclusive evidence that my OpenSSH patch wasn’t at fault, little knowing how much more contact I was to have with Debian weak keys in the future, running a huge store of known-compromised keys and using them to find misbehaving Certificate Authorities, amongst other things.

Lessons Learned

While I’ve not found a description of exactly when and how Luciano Bello discovered the vulnerability that became CVE-2008-0166, I presume he first came across it some time before it was disclosed – likely before GitHub tripped over it. The stable Debian release that included the vulnerable code had been released a year earlier, so there was plenty of time for Luciano to have discovered key collisions and go “hmm, I wonder what’s going on here?”, then keep digging until the solution presented itself.

The thought “hmm, that’s odd”, followed by intense investigation, leading to the discovery of a major flaw is also what ultimately brought down the recent XZ backdoor. The critical part of that sequence is the ability to do that intense investigation, though.

When I reflect on my brush with the Debian weak keys vulnerability, what sticks out to me is the fact that I didn’t do the deep investigation. I wonder if Luciano hadn’t found it, how long it might have been before it was found. The GitHub team would have continued investigating, presumably, and perhaps they (or I) would have eventually dug deep enough to find it. But we were all super busy – myself, working support tickets at EY, and GitHub feverishly building features and fighting the fires in their rapidly-growing service.

As it was, Luciano was able to take the time to dig in and find out what was happening, but just like the XZ backdoor, I feel like we, as an industry, got a bit lucky that someone with the skills, time, and energy was on hand at the right time to make a huge difference.

It’s a luxury to be able to take the time to really dig into a problem, and it’s a luxury that most of us rarely have. Perhaps an understated takeaway is that somehow we all need to wrestle back some time to follow our hunches and really dig into the things that make us go “hmm…”.

Support My Hunches

If you’d like to help me be able to do intense investigations of mysterious software phenomena, you can shout me a refreshing beverage on ko-fi.


  1. the odds are actually probably more like winning the lottery about twenty times in a row. The numbers involved are staggeringly huge, so it’s easiest to just approximate it as “really, really unlikely”. 

,

Simon LyallAudiobooks – March 2024

Surely You Can’t Be Serious: The True Story of Airplane! by David Zucker, Jim Abrahams, Jerry Zucker

Covers the career of the makers (ZAZ) and the long path to writing, pitching, pre-production and making of the classic movie. As well as reactions to it. 4/5

All these worlds by Dennis E Taylor

3rd book in Bobverse trilogy . Worth it if you liked the others. The Bobs deal with deaths of their human friends and most plots are wrapped up. 3/5

The Two-Penny Bar by Georges Simenon

Following a cold case Maigret gatecrashes the weekend gathering of a group of friends when one is unexpectedly murdered. Felt a little unrealistic at times. 3/5

The Car: The Rise and Fall of the Machine That Made the Modern World by Bryan Appleyard

A general history of the car with a bit of speculation about it’s future. A smooth and interesting read. 4/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average. in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all

Share

,

yifeiBootstrap OpenBSD/riscv64 with Qemu on Linux

OpenBSD supported riscv64 since the 7.0 release, but officially supported hardwares are still not easy to obtain. To have some fun with OpenBSD/riscv64 without access to such hardware, and to potentially bring up other hardware platforms by extending kernel supports, we can run a Qemu virtual machine on a common x86 system. What’s documented here is done on a X86_64 Debian host, but any system with sufficiently new Qemu available should work.

Obtaining Installation Media #

Prepare VM environment #

Install OpenBSD/riscv64 #

Pull Source Tree and Compile Kernel #

,

Russell CokerLinks March 2024

Bruce Schneier wrote an interesting blog post about his workshop on reimagining democracy and the unusual way he structured it [1]. It would be fun to have a security conference run like that!

Matthias write an informative blog post about Wayland “Wayland really breaks things… Just for now” which links to a blog debate about the utility of Wayland [2]. Wayland seems pretty good to me.

Cory Doctorow wrote an insightful article about the AI bubble comparing it to previous bubbles [3].

Charles Stross wrote an insightful analysis of the implications if the UK brought back military conscription [4]. Looks like the era of large armies is over.

Charles Stross wrote an informative blog post about the Worldcon in China, covering issues of vote rigging for location, government censorship vs awards, and business opportunities [5].

The Paris Review has an interesting article about speaking to the CIA’s Creative Writing Group [6]. It doesn’t explain why they have a creative writing group that has some sort of semi-official sanction.

LongNow has an insightful article about the threats to biodiversity in food crops and the threat that poses to humans [7].

Bruce Schneier and Albert Fox Cahn wrote an interesting article about the impacts of chatbots on human discourse [8]. If it makes people speak more precisely then that would be great for all Autistic people!

,

Michael StillNew python syntax I was previously unaware of

This post documents the new syntax features I learned about while reading cpython internals.

You can create more than one context manager on a single line.

So for example Shaken Fist contains code like this:

with open(path + '.new', 'w') as o:
    with open(path, 'r') as i:
        ...

That can now be written like this:

with open(path + '.new', 'w') as o, open(path, 'r') as i:
    ...

 

You can assign values in a while statement, but only one.

Instead of this:

d = f.read(8000)
while f:
...
d = f.read(8000)

You can write this:

while d := f.read(8000):
    ...

But unfortunately this doesn’t work:

while a, b := thing():
    ...

 

You can use underscores as commands in long numbers to make them easier to read.

For example, you can write 1000000 or 1_000_000 and they both mean the same thing.

 

You can refer to positional arguments by name, but you can also disable that.

I didn’t realise that this was valid python:

def foo(bar=None):
    print(bar)

foo(bar='banana')

You can turn it off with a forward slash in the argument list though, which should separate positional arguments from named arguments:

def foo(bar, /, extra=None):
    print(bar)
    print(extra)

foo('banana', extra='frog')

The above example will work, whereas this wont:

>>> foo(bar='banana', extra='frog')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() got some positional-only arguments passed as keyword arguments: 'bar'

 

yield from lets you “delegate” yielding values in an interator

This one is interesting. So if you find yourself writing code like this:

def get_things():
    while thing in find_one_category():
        yield thing

    while thing in find_another_category():
        yield thing

Then you can instead write that as:

def get_things():
    yield from find_one_category()
    yield from find_another_category()

Which is much nicer.

Michael Stillcpython internals

I have been paid money to write Python code since about 2006, so I figured it was probably time that I should understand some of the inner workings of Python. I therefore picked up two books on the topic, this one being the first of the two.

This book to be honest isn’t completely what I expected. Its very well written and quite interesting, but its more about the things you’d need to know to become a Python core developer, rather than the things you should know as a user of Python like how the Python dictionary implementation is built.

(If you want that specifically, this video is an excellent introduction).

The book starts with the assumption that you are going to want to customize Python and recompile it (which while educational is probably a truly terrible idea in the real world), so the first couple of chapters are devoted to setting up a development and compilation environment for Python on various platforms, which seems fair enough to me. The book then moves into a discussion of how the language grammar is defined and parsed, which is something I haven’t really thought about since my compilers course at university.

Interestingly, within the first couple of chapters of the book I had learnt about Python syntax constructs I was unaware of as someone who struggles a bit to keep you with the development of the language over time. So even that aspect has been quite useful. I’ll document some examples of new syntax I learnt in another blog post so as to keep this post about the book itself.

The book then moves on to describing at a high level how the Python bytecode runtime works. It then describes memory allocation. Next is multiprocessing including the exact mechanism that Python uses to fork a new process or thread — punch line, its more complicated than it would look if your mental model is how fork works in C. These topics are all dealt with in detail, which is likely helpful to total computer science newbies, but is sometimes a bit excruciating if you already know what a semaphore is.

The coverage of misuses of yield, coroutines, and async is quite good I think. Sub interpreters sound interesting too, although too experimental for use quite yet. There is extensive coverage of the various Python data types, including how the Python number type is slightly bonkers and how unicode works under the hood. There is in fact even some coverage of how dictionaries work, but the video above is better to be honest.

Overall I enjoyed this book and I definitely learned things. I wouldn’t recommend it to a Python newbie, but if you’re an experienced developer and want to understand of what life is like under the hood then this is a good place to start.

CPython Internals Book Cover CPython Internals
Anthony Shaw
May 5, 2021
396

Get your guided tour through the Python 3.9 interpreter: Unlock the inner workings of the Python language, compile the Python interpreter from source code, and participate in the development of CPython. Are there certain parts of Python that just seem like magic? This book explains the concepts, ideas, and technicalities of the Python interpreter in an approachable and hands-on fashion. Once you see how Python works at the interpreter level, you can optimize your applications and fully leverage the power of Python.

,

sthbrx - a POWER technical blogLifecycle of a kernel task

Introduction

CPU cores are limited in number. Right now my computer tells me it's running around 500 processes, and I definitely do not have that many cores. The operating system's ability to virtualise work as independent 'executable units' and distribute them across the limited CPU pool is one of the foundations of modern computing.

The Linux kernel calls these virtual execution units tasks1. Each task encapsulates all the information the kernel needs to swap it in and out of running on a CPU core. This includes register state, memory mappings, open files, and any other resource that needs to be tied to a particular task. Nearly every work item in the kernel, including kernel background jobs and userspace processes, is handled by this unified task concept. The kernel uses a scheduler to determine when and where to run tasks according to some parameters, such as maximising throughput, minimising latency, or whatever other characteristics the user desires.

In this article, we'll dive into the lifecycle of a task in the kernel. This is a PowerPC blog, so any architecture specific (often shortened to 'arch') references are referring to PowerPC. To make the most out of this you should also have a copy of the kernel source open alongside you, to get a sense of what else is happening in the locations we discuss below. This article hyper-focuses on specific details of setting up tasks, leaving out a lot of possibly related content. Call stacks are provided to help orient yourself in many cases.

Booting

The kernel starts up with no concept of tasks, it just runs from the location the bootloader started it (the __start function for PowerPC). The first idea of a task takes root in early_setup() where we initialise the PACA (I asked, but what this stands for is unclear). The PACA is used to hold a lot of core per-cpu information, such as the CPU index (for generic per-cpu variables) and a pointer to the active task.

__start()  // ASM implementation, defined in head_64.S
  __start_initialization_multiplatform()
    __after_prom_start()
      start_here_multiplatform()
        early_setup()   // switched to C here, defined in setup_64.c
          initialise_paca()
            new_paca->__current = &init_task;

We use the PACA to (among other things) hold a reference to the active task. The task we start with is the special init_task. To avoid ambiguity with the userspace init task we see later, I'll refer to init_task as the boot task from here onwards. This boot task is a statically defined instance of a task_struct that is the root of all future tasks. Its resources are likewise statically defined, typically named following the pattern init_*. We aren't taking advantage of the context switching capability of tasks this early in boot, we just need to look like we're a task for any initialisation code that cares. For now we continue to work as a single CPU core with a single task.

We continue on and reach start_kernel(), the generic entry point of the kernel once any arch specific bootstrapping is sufficiently complete. One of the first things we call here is setup_arch(), which continues any initialisation that still needs to occur. This is where we call smp_setup_pacas() to allocate a PACA for each CPU; these all get the boot task as well (all referencing the same init_task structure, not copies of it). Eventually they will be given their own independent tasks, but during most of boot we don't do anything on them so it doesn't matter for now.

The next point of interest back in start_kernel() is fork_init(). Here we create a task_struct allocator to serve any task creation requests. We also limit the number of tasks here, dynamically picking the limit based on the available memory, page size, and a fixed upper bound.

void __init fork_init(void) {
    // ...
    /* create a slab on which task_structs can be allocated */
    task_struct_whitelist(&useroffset, &usersize);
    task_struct_cachep = kmem_cache_create_usercopy("task_struct",
            arch_task_struct_size, align,
            SLAB_PANIC|SLAB_ACCOUNT,
            useroffset, usersize, NULL);
    // ...
}

At the end of start_kernel() we reach rest_init() (as in 'do the rest of the init'). In here we create our first two dynamically allocated tasks: the init task (not to be confused with init_task, which we are calling the boot task), and the kthreadd task (with the double 'd'). The init task is (eventually) the userspace init process. We create it first to get the PID value 1, which is relied on by a number of things in the kernel and in userspace2. The kthreadd task provides an asynchronous creation mechanism for kthreads: callers append their thread parameters to a dedicated list, and the kthreadd task spawns any entries on the list whenever it gets scheduled. Creating these tasks automatically puts them on the scheduler run queue, and they might even start automatically with preemption.

// init/main.c

noinline void __ref __noreturn rest_init(void)
{
    struct task_struct *tsk;
    int pid;

    rcu_scheduler_starting();
    /*
     * We need to spawn init first so that it obtains pid 1, however
     * the init task will end up wanting to create kthreads, which, if
     * we schedule it before we create kthreadd, will OOPS.
     */
    pid = user_mode_thread(kernel_init, NULL, CLONE_FS);
    /*
     * Pin init on the boot CPU. Task migration is not properly working
     * until sched_init_smp() has been run. It will set the allowed
     * CPUs for init to the non isolated CPUs.
     */
    rcu_read_lock();
    tsk = find_task_by_pid_ns(pid, &init_pid_ns);
    tsk->flags |= PF_NO_SETAFFINITY;
    set_cpus_allowed_ptr(tsk, cpumask_of(smp_processor_id()));
    rcu_read_unlock();

    numa_default_policy();
    pid = kernel_thread(kthreadd, NULL, NULL, CLONE_FS | CLONE_FILES);
    rcu_read_lock();
    kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns);
    rcu_read_unlock();

    /*
     * Enable might_sleep() and smp_processor_id() checks.
     * They cannot be enabled earlier because with CONFIG_PREEMPTION=y
     * kernel_thread() would trigger might_sleep() splats. With
     * CONFIG_PREEMPT_VOLUNTARY=y the init task might have scheduled
     * already, but it's stuck on the kthreadd_done completion.
     */
    system_state = SYSTEM_SCHEDULING;

    complete(&kthreadd_done);

    /*
     * The boot idle thread must execute schedule()
     * at least once to get things moving:
     */
    schedule_preempt_disabled();
    /* Call into cpu_idle with preempt disabled */
    cpu_startup_entry(CPUHP_ONLINE);
}

After this, the boot task calls cpu_startup_entry(), which transforms it into the idle task for the boot CPU and enters the idle loop. We're now almost fully task driven, and our journey picks back up inside of the init task.

Bonus tip: when looking at the kernel boot console, you can tell what print actions are performed by the boot task vs the init task. The init_task has PID 0, so lines start with T0. The init task has PID 1, so appears as T1.

[    0.039772][    T0] printk: legacy console [hvc0] enabled
...
[   28.272167][    T1] Run /init as init process

The init task

When we created the init task, we set the entry point to be the kernel_init() function. Execution simply begins from here3 once it gets woken up for the first time. The very first thing we do is wait4 for the kthreadd task to be created: if we were to try and create a kthread before this, when the kthread creation mechanism tries to wake up the kthreadd task it would be using an uninitialised pointer, causing an oops. To prevent this, the init task waits on a completion object that the boot task marks completed after creating kthreadd. We could technically avoid this synchronization altogether just by creating kthreadd first, but then the init task wouldn't have PID 1.

The rest of the init task wraps up the initialisation stage as a whole. Mostly it moves the system into the 'running' state after freeing any memory marked as for initialisation only (set by __init annotations). Once fully initialised and running, the init task attempts to execute the userspace init program.

    if (ramdisk_execute_command) {
        ret = run_init_process(ramdisk_execute_command);
        if (!ret)
            return 0;
        pr_err("Failed to execute %s (error %d)\n",
               ramdisk_execute_command, ret);
    }

    if (execute_command) {
        ret = run_init_process(execute_command);
        if (!ret)
            return 0;
        panic("Requested init %s failed (error %d).",
              execute_command, ret);
    }

    if (CONFIG_DEFAULT_INIT[0] != '\0') {
        ret = run_init_process(CONFIG_DEFAULT_INIT);
        if (ret)
            pr_err("Default init %s failed (error %d)\n",
                   CONFIG_DEFAULT_INIT, ret);
        else
            return 0;
    }

    if (!try_to_run_init_process("/sbin/init") ||
        !try_to_run_init_process("/etc/init") ||
        !try_to_run_init_process("/bin/init") ||
        !try_to_run_init_process("/bin/sh"))
        return 0;

    panic("No working init found.  Try passing init= option to kernel. "
          "See Linux Documentation/admin-guide/init.rst for guidance.");

What file the init process is loaded from is determined by a combination of the system's filesystem, kernel boot arguments, and some default fallbacks. The locations it will attempt, in order, are:

  1. Ramdisk file set by rdinit= boot command line parameter, with default path /init. An initcall run earlier searches the boot arguments for rdinit and initialises ramdisk_execute_command with it. If the ramdisk does not contain the requested file, then the kernel will attempt to automatically mount the root device and use it for the subsequent checks.
  2. File set by init= boot command line parameter. Like with rdinit, the execute_command variable is initialised by an early initcall looking for init in the boot arguments.
  3. /sbin/init
  4. /etc/init
  5. /bin/init
  6. /bin/sh

Should none of these work, the kernel just panics. Which seems fair.

Aside: secondary processors

Until now we've focused on the boot CPU. While the utility of a task still applies to a uniprocessor system (perhaps even more so than one with hardware parallelism), a nice benefit of encapsulating all the execution state into a data structure is the ability to load the task onto any other compatible processor on the system. But before we can start scheduling on other CPU cores, we need to bring them online and initialise them to a state ready for the scheduler.

On the pSeries platform, the secondary CPUs are held by the firmware until explicitly released by the guest. Early in boot, the boot CPU (not task! We don't have tasks yet) will iterate the list of held secondary processors and release them one by one to the __secondary_hold function. As each starts executing __secondary_hold, it writes a value to the __secondary_hold_acknowledge variable that the boot CPU is watching. The secondary processor then immediately starts spinning on __secondary_hold_spinloop, waiting for it to become non-zero, while the boot CPU moves on to the the next processor.

// Boot CPU releasing the coprocessors from firmware

__start()
  __start_initialization_multiplatform()
    __boot_from_prom()
      prom_init()    // switched to C here
        prom_hold_cpus()
          // secondary_hold is alias for __secondary_hold assembly function
          call_prom("start-cpu", ..., secondary_hold, ...);  // on each coprocessor

Once every coprocessor is confirmed to be spinning on __secondary_hold_spinloop, the boot CPU continues on with its boot sequence. Once we reach setup_arch() as above, the boot task invokes smp_release_cpus() early in start_kernel(), which writes the desired entry point address of the coprocessors to __secondary_hold_spinloop. All the spinning coprocessors now see this value, and jump to it. This function, generic_secondary_smp_init(), will set up the coprocessor's PACA value, perform some machine specific initialisation if cur_cpu_spec->cpu_restore is set,5 atomically decrement a spinning_secondaries variable, and start spinning once again until further notice. This time it is waiting on the PACA field cpu_start, so we can start coprocessors individually.

We leave the coprocessors here for a while, until the init task calls kernel_init_freeable(). This function is used for any initialisation required after kthreads are running, but before all the __init sections are dropped. The setup relevant to coprocessors is the call to smp_init(). Here we fork the current task (the init task) once for each coprocessor with idle_threads_init(). We then call bringup_nonboot_cpus() to make each coprocessor start scheduling.

The exact code paths here are both deep and indirect, so here's the interesting part of the call tree for the pSeries platform to help guide you through the code.

// In the init task

smp_init()
  idle_threads_init()     // create idle task for each coprocessor
  bringup_nonboot_cpus()  // make each coprocessor enter the idle loop
    cpuhp_bringup_mask()
      cpu_up()
        _cpu_up()
          cpuhp_up_callbacks()  // invokes the CPUHP_BRINGUP_CPU .startup.single function
            bringup_cpu()
              __cpu_up()
                cpu_idle_thread_init()  // sets CPU's task in PACA to its idle task
                smp_ops->prepare_cpu()  // on pSeries inits XIVE if in use
                smp_ops->kick_cpu()     // indirect call to smp_pSeries_kick_cpu()
                  smp_pSeries_kick_cpu()
                    paca_ptrs[nr]->cpu_start = 1  // the coprocessor was spinning on this value

Interestingly, the entry point declared when cloning the init task for the coprocessors is never used. This is because the coprocessors never get woken up from the hand-crafted init state the way new tasks normally would. Instead they are already executing a code path, and so when they next yield they will just clobber the entry point and other registers with their actually running task state.

Transitioning the init task to userspace

The last remaining job of the kernel side of the init task is to actually load in and execute the selected userspace program. It's not like we can just call the userspace entry point though: we need to be a little creative here.

As alluded to above, when we create tasks with clone_thread(), it doesn't set the provided entry point directly: it instead sets a small shim that is actually used when the new task eventually gets woken up. The particular shim it uses is determined by whether the task is a kthread or not.

Both kinds of shim expect the requested entry point to be passed via a specific non-volatile register and, in the case of a kthread, basically just invokes it after some minor bookkeeping. A kthread should never return directly, so it traps if this happens.

_GLOBAL(start_kernel_thread)
    bl  CFUNC(schedule_tail)
    mtctr   r14
    mr  r3,r15
#ifdef CONFIG_PPC64_ELF_ABI_V2
    mr  r12,r14
#endif
    bctrl
    /*
     * This must not return. We actually want to BUG here, not WARN,
     * because BUG will exit the process which is what the kernel thread
     * should have done, which may give some hope of continuing.
     */
100:    trap
    EMIT_BUG_ENTRY 100b,__FILE__,__LINE__,0

But the init task isn't a kthread. We passed a kernel entrypoint to copy_thread() but did not set the kthread flag, so copy_thread() inferred that this means the task will eventually run in userspace. This makes it use the ret_from_kernel_user_thread() shim.

_GLOBAL(ret_from_kernel_user_thread)
    bl  CFUNC(schedule_tail)
    mtctr   r14
    mr  r3,r15
#ifdef CONFIG_PPC64_ELF_ABI_V2
    mr  r12,r14
#endif
    bctrl
    li  r3,0
    /*
     * It does not matter whether this returns via the scv or sc path
     * because it returns as execve() and therefore has no calling ABI
     * (i.e., it sets registers according to the exec()ed entry point).
     */
    b   .Lsyscall_exit

We start off identically to a kthread, except here we expect the task to return. This is the key: when the init task wants to transition to userspace, it sets up the stack frame as if we were serving a syscall. It then returns, which runs the syscall exit procedure that culminates in an rfid to userspace.

The actual setting up of the syscall frame is handled by the (try_)run_init_process() function. The interesting call path goes like

run_init_process()
  kernel_execve()
    bprm_execve()
      exec_binprm()
        search_binary_handler()
          list_for_each_entry(fmt, &formats, lh)
            retval = fmt->load_binary(bprm);

The outer few calls mainly handle checking prerequisites and bookkeeping. The exec_binrpm() call also handles shebang redirection, allowing up to 5 levels of interpreter. At each level it invokes search_binary_handler(), which attempts to find a handler for the program file's format. Contrary to the name, the searcher will also immediately try to load the file if it finds an appropriate handler. It's this call to load_binary (dispatched to whatever handler was found) that sets up our userspace execution context, including the syscall return state.

All that's left to do here is return 0 all the way up the chain, which you'll see results in the init task returning to the shim that performs the syscall return sequence to userspace. The init task is now fully userspace.

Creating other tasks

It feels like we've spent a lot of time discussing the init task. What about all the other tasks?

It turns out that the creation of the init task is very similar to any other task. All tasks are clones of the task that created them (except the statically defined init_task). Note 'clone' is being used in a loose sense here: it's not an exact image of the parent. There's a configuration parameter that determines which components are shared, and which are made into independent copies. The implementation may also just decide to change some things that don't make sense to duplicate, such as the task ID to distinguish it from the parent.

As we saw earlier, kthreads are created indirectly through a global list and kthreadd daemon task that does the actual cloning. This has two benefits: allowing asynchronous task creation from atomic contexts, and ensuring all kthreads inherit a 'clean' task context, instead of whatever was active at the time.

Userspace task creation, beyond the init task, is driven by the userspace process invoking the fork() and clone() family of syscalls. Both of these are light wrappers over the kernel_clone() function, which we used earlier for the creation of the init task and kthreadd.

When a task runs a syscall in the exec() family, it doesn't create a new task. It instead hits the same code path as when we tried to run the userspace init program, where it loads in the context as defined by the program file into the current task and returns from the syscall (legitimately this time).

Context switching

The last piece of the puzzle (as far as this article will look at!) is how tasks are switched in and out, and some of the rules around when it can and can't happen. Once the init and kthreadd tasks are created, we call cpu_startup_entry(CPUHP_ONLINE). Any coprocessors have also been released to call this by now too. Their tasks are repurposed to 'idle tasks', which serve to run when no other tasks are available to run. They will spin on a check for pending work, entering an idle state each loop until they see pending tasks to run. They then call __schedule() in a loop (also conditional on pending tasks existing), and then return back to the idle loop once everything in the moment is handled.

The __schedule() function is the main guts of the scheduler, which until now has seemed like some nebulous controller that's governing when and where our tasks run. In reality it isn't one isolated part of the system, but a function that a task calls when it decides to yield to any other waiting tasks. It starts by deciding which pending task should run (a whole can of worms right there), and then executing context_switch() if it changes from the current task. context_switch() is the point where the current task starts to change. Specifically, you can trace the changing of current (i.e., the PACA being updated with a new task pointer) to the following path

context_switch()
  switch_to()
    __switch_to()
      _switch()
        do_switch_64
          std   r6,PACACURRENT(r13)

One interesting consequence of tasks calling context_switch() is that the previous task is 'suspended'6 right where it saves its registers and puts in the new task's values. When it is woken up again at some point in the future it resumes right where it left off. So when you are reading the __switch_to() implementation, you are actually looking at two different tasks in the same function.

But it gets even weirder: while tasks that put themselves to sleep here wake up inside of _switch(), new tasks being woken up for the first time start at a completely different location! So not only is the task changing, the _switch() call might not even return back to __switch_to()!

Conclusion

And there you have it, everything[citation needed] you could ever need to know when getting started with tasks. Will you need to know this specifically? Hard to say. But hopefully it at least provides some useful pointers for understanding the execution model of the kernel.

Questions

The following are some questions you might have (read: I had).

Do we change the task struct when serving a syscall?

No, the task struct stays the same. The task struct declares it represents a userspace task, but it stays as the active task when serving syscalls or similar actions on behalf of its userspace execution.

Thanks to address space quadrants we don't even need to change the active memory mapping: upon entry to the kernel we automatically start using the PID 0 mapping.

Where does a task get allocated a PID?

Software PIDs are allocated when spawning a new process. However, if the process shares memory mappings with another (such as threads can), it may not be allocated a new hardware PID. Referring to the PID used for virtual memory translations, the hardware PID is actually a property of the memory mapping struct (mm_struct). You can find a hardware PID being allocated when a new mm_struct is created, which may or may not occur depending on the task clone parameters.

How can the fork and exec syscalls be hooked into for arch specific handling?

Fork (and clone) will always invoke copy_thread(). The exec call will invoke start_thread() when loading a binary file. Any other kind of file (script, binfmt-misc) will eventually require some form of binary file to load/bootstrap it, so start_thread() should work for your purposes. You can also use arch_setup_new_exec() for a cleaner hook into exec.

The task context of the calls is fairly predictable: current in copy_thread() refers to the parent because we are still in the middle of copying it. For start_thread(), current refers to the task that is going to be the new program because it is just configuring itself.

Where do exceptions/interrupts fit in?

When a hardware interrupt triggers it just stops whatever it was doing and dumps us at the corresponding exception handler. Our current value still points to whatever task is active (restoring the PACA is done very early). If we were in userspace (MSRPR was 1) we consider ourselves to be in 'process context'. This is, in some sense, the default state in the kernel. We are able to sleep (i.e., invoke the scheduler and swap ourselves out), take locks, and generally do anything you might like to do in the kernel. This is in contrast to 'atomic context', where certain parts of the kernel expect to be executed without interruption or sleeping.

However, we are a bit more restricted if we arrived at an interrupt from supervisor mode. For example, we don't know if we interrupted an atomic context, so we can't safely do anything that might cause sleep. This is why in some interrupt handlers like do_program_check() we have to check user_mode(regs) before we can read a userspace instruction7.


  1. Read more on tasks in my previous post 

  2. One example of the init task being special is that the kernel will not allow its process to be killed. It must always have at least one thread. 

  3. Well, it actually begins at a small assembly shim, but close enough for now. 

  4. The wait mechanism itself is an interesting example of interacting with the scheduler. Starting with a common struct completion object, the waiting task registers itself as awaiting the object to complete. Specifically, it adds its task handle to a queue on the completion object. It then loops calling schedule(), yielding itself to other tasks, until the completion object is flagged as done. Somewhere else another task marks the completion object as completed. As part of this, the task marking the completion tries to wake up any task that has registered itself as waiting earlier. 

  5. The cur_cpu_spec->cpu_restore machine specific initialisation is based on the machine that got selected in arch/powerpc/kernel/cpu_specs_book3s_64.h. This is where the __restore_cpu_* family of functions might be called, which mostly initialise certain SPRs to sane values. 

  6. Don't forget that the entire concept of tasks is made up by the kernel: from the hardware's point of view we haven't done anything interesting, just changed some registers. 

  7. The issue with reading a userspace instruction is that the page access may require the page be faulted in, which can sleep. There is a mechanism to disable the page fault handler specifically, but then we might not be able to read the instruction. 

,

Russell CokerThe Shape of Computers

Introduction

There have been many experiments with the sizes of computers, some of which have stayed around and some have gone away. The trend has been to make computers smaller, the early computers had buildings for them. Recently for come classes computers have started becoming as small as could be reasonably desired. For example phones are thin enough that they can blow away in a strong breeze, smart watches are much the same size as the old fashioned watches they replace, and NUC type computers are as small as they need to be given the size of monitors etc that they connect to.

This means that further development in the size and shape of computers will largely be determined by human factors.

I think we need to consider how computers might be developed to better suit humans and how to write free software to make such computers usable without being constrained by corporate interests.

Those of us who are involved in developing OSs and applications need to consider how to adjust to the changes and ideally anticipate changes. While we can’t anticipate the details of future devices we can easily predict general trends such as being smaller, higher resolution, etc.

Desktop/Laptop PCs

When home computers first came out it was standard to have the keyboard in the main box, the Apple ][ being the most well known example. This has lost popularity due to the demand to have multiple options for a light keyboard that can be moved for convenience combined with multiple options for the box part. But it still pops up occasionally such as the Raspberry Pi 400 [1] which succeeds due to having the computer part being small and light. I think this type of computer will remain a niche product. It could be used in a “add a screen to make a laptop” as opposed to the “add a keyboard to a tablet to make a laptop” model – but a tablet without a keyboard is more useful than a non-server PC without a display.

The PC as “box with connections for keyboard, display, etc” has a long future ahead of it. But the sizes will probably decrease (they should have stopped making PC cases to fit CD/DVD drives at least 10 years ago). The NUC size is a useful option and I think that DVD drives will stop being used for software soon which will allow a range of smaller form factors.

The regular laptop is something that will remain useful, but the tablet with detachable keyboard devices could take a lot of that market. Full functionality for all tasks requires a keyboard because at the moment text editing with a touch screen is an unsolved problem in computer science [2].

The Lenovo Thinkpad X1 Fold [3] and related Lenovo products are very interesting. Advances in materials allow laptops to be thinner and lighter which leaves the screen size as a major limitation to portability. There is a conflict between desiring a large screen to see lots of content and wanting a small size to carry and making a device foldable is an obvious solution that has recently become possible. Making a foldable laptop drives a desire for not having a permanently attached keyboard which then makes a touch screen keyboard a requirement. So this means that user interfaces for PCs have to be adapted to work well on touch screens. The Think line seems to be continuing the history of innovation that it had when owned by IBM. There are also a range of other laptops that have two regular screens so they are essentially the same as the Thinkpad X1 Fold but with two separate screens instead of one folding one, prices are as low as $600US.

I think that the typical interfaces for desktop PCs (EG MS-Windows and KDE) don’t work well for small devices and touch devices and the Android interface generally isn’t a good match for desktop systems. We need to invent more options for this. This is not a criticism of KDE, I use it every day and it works well. But it’s designed for use cases that don’t match new hardware that is on sale. As an aside it would be nice if Lenovo gave samples of their newest gear to people who make significant contributions to GUIs. Give a few Thinkpad Fold devices to KDE people, a few to GNOME people, and a few others to people involved in Wayland development and see how that promotes software development and future sales.

We also need to adopt features from laptops and phones into desktop PCs. When voice recognition software was first released in the 90s it was for desktop PCs, it didn’t take off largely because it wasn’t very accurate (none of them recognised my voice). Now voice recognition in phones is very accurate and it’s very common for desktop PCs to have a webcam or headset with a microphone so it’s time for this to be re-visited. GPS support in laptops is obviously useful and can work via Wifi location, via a USB GPS device, or via wwan mobile phone hardware (even if not used for wwan networking). Another possibility is using the same software interfaces as used for GPS on laptops for a static definition of location for a desktop PC or server.

The Interesting New Things

Watch Like

The wrist-watch [4] has been a standard format for easy access to data when on the go since it’s military use at the end of the 19th century when the practical benefits beat the supposed femininity of the watch. So it seems most likely that they will continue to be in widespread use in computerised form for the forseeable future. For comparison smart phones have been in widespread use as “pocket watches” for about 10 years.

The question is how will watch computers end up? Will we have Dick Tracy style watch phones that you speak into? Will it be the current smart watch functionality of using the watch to answer a call which goes to a bluetooth headset? Will smart watches end up taking over the functionality of the calculator watch [5] which was popular in the 80’s? With today’s technology you could easily have a fully capable PC strapped to your forearm, would that be useful?

Phone Like

Folding phones (originally popularised as Star Trek Tricorders) seem likely to have a long future ahead of them. Engineering technology has only recently developed to the stage of allowing them to work the way people would hope them to work (a folding screen with no gaps). Phones and tablets with multiple folds are coming out now [6]. This will allow phones to take much of the market share that tablets used to have while tablets and laptops merge at the high end. I’ve previously written about Convergence between phones and desktop computers [7], the increased capabilities of phones adds to the case for Convergence.

Folding phones also provide new possibilities for the OS. The Oppo OnePlus Open and the Google Pixel Fold both have a UI based around using the two halves of the folding screen for separate data at some times. I think that the current user interfaces for desktop PCs don’t properly take advantage of multiple monitors and the possibilities raised by folding phones only adds to the lack. My pet peeve with multiple monitor setups is when they don’t make it obvious which monitor has keyboard focus so you send a CTRL-W or ALT-F4 to the wrong screen by mistake, it’s a problem that also happens on a single screen but is worse with multiple screens. There are rumours of phones described as “three fold” (where three means the number of segments – with two folds between them), it will be interesting to see how that goes.

Will phones go the same way as PCs in terms of having a separation between the compute bit and the input device? It’s quite possible to have a compute device in the phone form factor inside a secure pocket which talks via Bluetooth to another device with a display and speakers. Then you could change your phone between a phone-size display and a tablet sized display easily and when using your phone a thief would not be able to easily steal the compute bit (which has passwords etc). Could the “watch” part of the phone (strapped to your wrist and difficult to steal) be the active part and have a tablet size device as an external display? There are already announcements of smart watches with up to 1GB of RAM (same as the Samsung Galaxy S3), that’s enough for a lot of phone functionality.

The Rabbit R1 [8] and the Humane AI Pin [9] have some interesting possibilities for AI speech interfaces. Could that take over some of the current phone use? It seems that visually impaired people have been doing badly in the trend towards touch screen phones so an option of a voice interface phone would be a good option for them. As an aside I hope some people are working on AI stuff for FOSS devices.

Laptop Like

One interesting PC variant I just discovered is the Higole 2 Pro portable battery operated Windows PC with 5.5″ touch screen [10]. It looks too thick to fit in the same pockets as current phones but is still very portable. The version with built in battery is $AU423 which is in the usual price range for low end laptops and tablets. I don’t think this is the future of computing, but it is something that is usable today while we wait for foldable devices to take over.

The recent release of the Apple Vision Pro [11] has driven interest in 3D and head mounted computers. I think this could be a useful peripheral for a laptop or phone but it won’t be part of a primary computing environment. In 2011 I wrote about the possibility of using augmented reality technology for providing a desktop computing environment [12]. I wonder how a Vision Pro would work for that on a train or passenger jet.

Another interesting thing that’s on offer is a laptop with 7″ touch screen beside the keyboard [13]. It seems that someone just looked at what parts are available cheaply in China (due to being parts of more popular devices) and what could fit together. I think a keyboard should be central to the monitor for serious typing, but there may be useful corner cases where typing isn’t that common and a touch-screen display is of use. Developing a range of strange hardware and then seeing which ones get adopted is a good thing and an advantage of Ali Express and Temu.

Useful Hardware for Developing These Things

I recently bought a second hand Thinkpad X1 Yoga Gen3 for $359 which has stylus support [14], and it’s generally a great little laptop in every other way. There’s a common failure case of that model where touch support for fingers breaks but the stylus still works which allows it to be used for testing touch screen functionality while making it cheap.

The PineTime is a nice smart watch from Pine64 which is designed to be open [15]. I am quite happy with it but haven’t done much with it yet (apart from wearing it every day and getting alerts etc from Android). At $50 when delivered to Australia it’s significantly more expensive than most smart watches with similar features but still a lot cheaper than the high end ones. Also the Raspberry Pi Watch [16] is interesting too.

The PinePhonePro is an OK phone made to open standards but it’s hardware isn’t as good as Android phones released in the same year [17]. I’ve got some useful stuff done on mine, but the battery life is a major issue and the screen resolution is low. The Librem 5 phone from Purism has a better hardware design for security with switches to disable functionality [18], but it’s even slower than the PinePhonePro. These are good devices for test and development but not ones that many people would be excited to use every day.

Wwan hardware (for accessing the phone network) in M.2 form factor can be obtained for free if you have access to old/broken laptops. Such devices start at about $35 if you want to buy one. USB GPS devices also start at about $35 so probably not worth getting if you can get a wwan device that does GPS as well.

What We Must Do

Debian appears to have some voice input software in the pocketsphinx package but no documentation on how it’s to be used. This would be a good thing to document, I spent 15 mins looking at it and couldn’t get it going.

To take advantage of the hardware features in phones we need software support and we ideally don’t want free software to lag too far behind proprietary software – which IMHO means the typical Android setup for phones/tablets.

Support for changing screen resolution is already there as is support for touch screens. Support for adapting the GUI to changed screen size is something that needs to be done – even today’s hardware of connecting a small laptop to an external monitor doesn’t have the ideal functionality for changing the UI. There also seem to be some limitations in touch screen support with multiple screens, I haven’t investigated this properly yet, it definitely doesn’t work in an expected manner in Ubuntu 22.04 and I haven’t yet tested the combinations on Debian/Unstable.

ML is becoming a big thing and it has some interesting use cases for small devices where a smart device can compensate for limited input options. There’s a lot of work that needs to be done in this area and we are limited by the fact that we can’t just rip off the work of other people for use as training data in the way that corporations do.

Security is more important for devices that are at high risk of theft. The vast majority of free software installations are way behind Android in terms of security and we need to address that. I have some ideas for improvement but there is always a conflict between security and usability and while Android is usable for it’s own special apps it’s not usable in a “I want to run applications that use any files from any other applicationsin any way I want” sense. My post about Sandboxing Phone apps is relevant for people who are interested in this [19]. We also need to extend security models to cope with things like “ok google” type functionality which has the potential to be a bug and the emerging class of LLM based attacks.

I will write more posts about these thing.

Please write comments mentioning FOSS hardware and software projects that address these issues and also documentation for such things.

,

Russell CokerAndroid vs FOSS Phones

To achieve my aims regarding Convergence of mobile phone and PC [1] I need something a big bigger than the 4G of RAM that’s in the PinePhone Pro [2]. The PinePhonePro was released at the end of 2021 but has a SoC that was first released in 2016. That SoC seems to compare well to the ones used in the Pixel and Pixel 2 phones that were released in the same time period so it’s not a bad SoC, but it doesn’t compare well to more recent Android devices and it also isn’t a great fit for the non-Android things I want to do. Also the PinePhonePro and Librem5 have relatively short battery life so reusing Android functionality for power saving could provide a real benefit. So I want a phone designed for the mass market that I can use for running Debian.

PostmarketOS

One thing I’m definitely not going to do is attempt a full port of Linux to a different platform or support of kernel etc. So I need to choose a device that already has support from a somewhat free Linux system. The PostmarketOS system is the first I considered, the PostmarketOS Wiki page of supported devices [3] was the first place I looked. The “main” supported devices are the PinePhone (not Pro) and the Librem5, both of which are under-powered. For the “community” devices there seems to be nothing that supports calls, SMS, mobile data, and USB-OTG and which also has 4G of RAM or more. If I skip USB-OTG (which presumably means I’d have to get dock functionality via wifi – not impossible but not great) then I’m left with the SHIFT6mq which was never sold in Australia and the Xiomi POCO F1 which doesn’t appear to be available on ebay.

LineageOS

The libhybris libraries are a compatibility layer between Android and glibc programs [4]. Which includes running Wayland with Android display drivers. So running a somewhat standard Linux desktop on top of an Android kernel should be possible. Here is a table of the LineageOS supported devices that seem to have a useful feature set and are available in Australia and which could be used for running Debian with firmware and drivers copied from Android. I only checked LineageOS as it seems to be the main free Android build.

Phone RAM External Display Price
Edge 20 Pro [5] 6-12G HDMI $500 not many on sale
Edge S aka moto G100 [6] 6-8G HDMI $500 to $600+
Fairphone 4 6-8G USBC-DP $1000+
Nubia Red Magic 5G 8-16G USBC-DP $600+

The LineageOS device search page [9] allows searching by kernel version. There are no phones with a 6.6 (2023) or 6.1 (2022) Linux kernel and only the Pixel 8/8Pro and the OnePlus 11 5G run 5.15 (2021). There are 8 Google devices (Pixel 6/7 and a tablet) running 5.10 (2020), 18 devices running 5.4 (2019), and 32 devices running 4.19 (2018). There are 186 devices running kernels older than 4.19 – which aren’t in the kernel.org supported release list [10]. The Pixel 8 Pro with 12G of RAM and the OnePlus 11 5G with 16G of RAM are appealing as portable desktop computers, until recently my main laptop had 8G of RAM. But they cost over $1000 second hand compared to $359 for my latest laptop.

Fosdem had an interesting lecture from two Fairphone employees about what they are doing to make phone production fairer for workers and less harmful for the environment [11]. But they don’t have the market power that companies like Google have to tell SoC vendors what they want.

IP Laws and Practices

Bunnie wrote an insightful and informative blog post about the difference between intellectual property practices in China and US influenced countries and his efforts to reverse engineer a commonly used Chinese SoC [12]. This is a major factor in the lack of support for FOSS on phones and other devices.

Droidian and Buying a Note 9

The FOSDEM 2023 has a lecture about the Droidian project which runs Debian with firmware and drivers from Android to make a usable mostly-FOSS system [13]. It’s interesting how they use containers for the necessary Android apps. Here is the list of devices supported by Droidian [14].

Two notable entries in the list of supported devices are the Volla Phone and Volla Phone 22 from Volla – a company dedicated to making open Android based devices [15]. But they don’t seem to be available on ebay and the new price of the Volla Phone 22 is E452 ($AU750) which is more than I want to pay for a device that isn’t as open as the Pine64 and Purism products. The Volla Phone 22 only has 4G of RAM.

Phone RAM Price Issues
Note 9 128G/512G 6G/8G <$300 Not supporting external display
Galaxy S9+ 6G <$300 Not supporting external display
Xperia 5 6G >$300 Hotspot partly working
OnePlus 3T 6G $200 – $400+ photos not working

I just bought a Note 9 with 128G of storage and 6G of RAM for $109 to try out Droidian, it has some screen burn but that’s OK for a test system and if I end up using it seriously I’ll just buy another that’s in as-new condition. With no support for an external display I’ll need to setup a software dock to do Convergence, but that’s not a serious problem. If I end up making a Note 9 with Droidian my daily driver then I’ll use the 512G/8G model for that and use the cheap one for testing.

Mobian

I should have checked the Mobian list first as it’s the main Debian variant for phones.

From the Mobian Devices list [16] the OnePlus 6T has 8G of RAM or more but isn’t available in Australia and costs more than $400 when imported. The PocoPhone F1 doesn’t seem to be available on ebay. The Shift6mq is made by a German company with similar aims to the Fairphone [17], it looks nice but costs E577 which is more than I want to spend and isn’t on the officially supported list.

Smart Watches

The same issues apply to smart watches. AstereoidOS is a free smart phone OS designed for closed hardware [18]. I don’t have time to get involved in this sort of thing though, I can’t hack on every device I use.

,

Michael StillShift

This is the second book in the Silo series, following on the Wool, which I recently read. I think to a certain extend this book is better than the first one — I certainly found it compelling. An excellent read that explains how the universe described in Wool came to be, but yet also sets the scene for the third book in the trilogy.

Shift Book Cover Shift
Hugh Howey
Dystopias
Penguin Group
April 13, 2023

The much anticipated prequel to bestseller Wool that takes us back to the beginnings of the silo. In a future less than fifty years away, the world is still as we know it. Time continues to tick by. The truth is that it is ticking away. A powerful few know what lies ahead. They are preparing for it.

,

Simon LyallThe 10 Thickest Books I Own

Prompted by a comment from someone I present below the 10 thickest books in my personal library. I made no correction for hardcover vs softcover. Measured at center of book with mild compression

The Books in order. 10th thickest left, thickest on right

My top 10 books ended up being a bit of a mix

  • Three Fiction: 1 Science Fiction, 1 Fantasy, 1 annotated detective series
  • One giant book of Chess puzzles
  • Two books about lots of things. 500 Villages and 100 Museum Objects
  • Two biographic books about a National Leader during wartime
  • A book of social history
  • A book looking at big trends in all recorded history

The Countdown

10th – 58mm – The New Annotated Sherlock Holmes. Volume 2. 1878 pages. Softcover

The tallest, widest and book with the most pages. Has the original text in the centre with notes on the outside and lots of illustrations

9th – 60mm – Villages of Britain by Clive Aslet. 658 pages. Hardcover

1-2 pages on 500 English villages. Usually covers an interesting feature, event or person

8th – 61mm – Team of Rivals by Doris Kearns Goodwin. 916 pages. Softcover

A book on Abraham Lincoln’s Cabinet. Basis for the movie “Lincoln” and my book is a movie branded version

7th – 61mm – The Lord of the Rings by J R R Tolkien. 1192 pages. Softcover

My much battered single volume edition I’ve had since I was a kid.

6th – 63mm – Chess 5334 Problems, Combinations, and Games by Laszlo Polgar. 1104 pages. Softcover

Mostly pictures of chess positions (6 per page) and the solution. Almost no words

5th – 64mm – A History of the World in 100 Objects. 707 pages – Hardcover

Based on a Radio Series. Each object has a couple of very nice photos and then around 3 pages of text about it and where it came from. Very nice book.

4th – 65mm – Seveneves by Neal Stephenson. 867 pages. Hardcover

A Science Fiction book about what happens when the Moon blows up.

3rd – 66mm – Why the West Rules – For now. 750 pages. Hardcover

A big idea history book with speculation about the future.

2nd – 69mm – Road to Victory. Winston S. Churchill 1941-1945 by Martin Gilbert. 1416 pages. Hardcover

Part of the huge 8 volume official biography of Churchill. Covering Pearl Harbor to VE Day.

1st – 72mm – Family Britain 1951-57 by David Kynaston. 776 pages. Hardcover

Part of an ongoing series of books about the social history of Britain from 1945 to 1979. Covers a lot of ordinary lives and major events are often seen via individual’s reactions rather than being covered directly.

Share

,

Michael StillSolve for Happy

Mo Gawdat was kind of a big deal, at IBM, Microsoft, and then Google. But he was unhappy, so he decided to take an engineering approach and try to systematically “solve for happy” and work out why adding more money, shiny objects, and adoration of others didn’t actually make him happy.

When I was walking in Memphis
I was walking with my feet ten feet off of Beale
Walking in Memphis
But do I really feel the way I feel?

— Walking in Memphis, Marc Cohn

Gawdat argues that much of the narrative we all experience in our heads is a biological function from long ago, designed to help us identify and avoid threats. In fact, he argues that it is now often counter productive. As a solution, he proposes four techniques to tame your inner monologue:

  • Observe, but don’t participate in the dialog. That is, acknowledge thoughts that affect your happiness when they happen, but don’t participate in a dialog about the thought, just note it and move on.
  • Sometimes a thought will “stick” and you’ll end up in a loop. In those cases, consider the cause of the drama — what is the underlying issue that is stopping you from moving on? What train of thought brought you to it now? That creates an opportunity to reframe the thought, perhaps not as a positive one but at least one you can acknowledge and move on from.
  • Try to move on to thinking about something else. Gawdat describes a “priming technique” where you focus on something else for a few moments and it helps you break out of the cyclical thought. This is interesting to me, because I sometimes “get words stuck in my head” — the elogy for a friend; an important work email; and so forth. I find that writing those words down helps in those cases because I think they get stuck because I am worried I will forget them. The thought loop feels a bit like that too, the brain’s way of ensuring that we don’t forget something which seems very important.
  • Calm your thoughts in general, that is, meditate. He has an interesting suggestion here — just stop and observe the world around you. What is notable and interesting at the moment? What is unusual?

Gawdat also makes the point that there is more to you than just your thoughts. He works through a series of questions around what defines you as a person, applying two tests — if you can observe a thing, then that thing is not you; and if a thing can change and leave you unchanged, then that thing is not you. He reaches an interesting conclusion there but I wont ruin it for you.

This however is one of the areas where Gawdat makes readers uncomfortable according to many of the reviews of this book that I have read. This is because he deviates from a relatively concrete and well referenced discussion of how to self manage your thought processes, into a discussion of whether or not an afterlife exists. He returns to this theme later in the book as well.

Gawdat then moves on to asking you to assess what is actually important to you. What are you doing because of how other people see you (or how you want other people to see you) versus things that actually make you happy. He also reminds us that while we might feel like the main character in the movies of our lives, we are only supporting actors in everyone else’s life movie.

To be honest, this is about the point where Gawdat loses me a bit — he enters into a discussion of the meaning of time. The point he’s trying to make is that time is arbitrary and we should choose to be more patient and to live in the moment, but its a meandering path to get there. This point is then reinforced by discussing how we don’t control the things around us — the only things in our control are our own actions and attitude, so if we let things ruin our day then we’ve let go of the one thing we control. Carpe Diem I suppose?

Gawdat also asserts that our memories of prior events taint our interpretation of current and future events, which seems fair to me, but that these memories are also summaries not completely accurate records and that the brain is more likely to record negative memories than positive ones. He also asserts that we have a tendency to exaggerate risks when considering options. While this might be true for some people, I have definitely met people who seem incapable of estimating risks at all so its certainly not true for everyone. Either way, he encourages the reader to try to be rational about the actual risks presented by your inner monolog.

We are encouraged to focus on those less fortunate that you when you’re unhappy, which seems like reasonable if sometimes unsafe advice. If you’re being abused, don’t look for reasons to be happy compared to others! So only consider this advice if your basic life needs — food, shelter, safety, et cetera are being met.

This book is largely an exploration of the process Gawdat went through when processing the unexpected loss of his son during routine surgery, which is a recurrent theme in the book. While that is sometimes a helpful checkpoint, it also causes Gawdat to veer off into a discussion of the metaphysical for the last couple of chapers of the book. This is a frequent source of complaints in other online reviews of the book, but I’d say just skip those chapters if they bother you.

Overall I’d say this book was ok, but not great. It makes some interesting points about how much we should trust our inner monologue to be right, but I feel it could have made those points in a much more terse manner without losing anything. While helpful, the book lacks sufficient supporting research for the field it plays in I suspect.

Solve for Happy Book Cover Solve for Happy
Mo Gawdat
January 10, 2019
368

Solve for Happy is a startlingly original book about creating and maintaining happiness, written by a top Google executive with an engineer's training and fondness for thoroughly analyzing a problem.

,

Simon LyallAudiobooks – February 2024

Hollywood and the Movies of the Fifties: The Collapse of the Studio System, the Thrill of Cinerama, and the Invasion of the Ultimate Body Snatcher—Television by Foster Hirsch

Excellent. Highly recommend. 4/5

Flying Blind: The 737 MAX Tragedy and the Fall of Boeing by Peter Robinson

Covers the plane and the corporate culture at Boeing that lead to the failure. Well sourced and Interesting. 3/5

1177 B.C. The Year Civilization Collapsed by Eric H. Cline

An overview of the civilisations of Eastern Mediterranean before and possible causes of the Late Bronze Age collapse. 3/5

The Spice Must Flow: The Story of Dune, from Cult Novels to Visionary Sci-Fi Movies by Ryan Britt

Pretty good history of the books and TV/Movie adaptions including a little on the 1st Villeneuve movie. Fun with lots of quotes and seems well researched. 4/5

How to Fight a War by Mike Martin

Uses simple language it works though the complexity of modern warfare, addressed to an imaginary political leader. Recent enough to include lessons from Russian’s invasion of Ukraine. Highly recommend 5/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average. in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all

Share

,

Michael StillStarter Villain

Now, I might be biased because I like John Scalzi’s stuff, but this book was really good. It starts slower than a normal Scalzi book, and takes a couple of chapters to really get going, but I am glad I was patient with it. Apart from that its a quick easy read. Its a typical Scalzi book, light hearted and fun. I think this one requires you suspend disbelief a little harder than others (except perhaps for Redshirts) but that doesn’t make it less enjoyable.

Starter Villain Book Cover Starter Villain
John Scalzi
September 21, 2023
262

,

Russell CokerLinks February 2024

In 2018 Charles Stross wrote an insightful blog post Dude You Broke the Future [1]. It covers AI in both fiction and fact and corporations (the real AIs) and the horrifying things they can do right now.

LongNow has an interesting article about the concept of the Magnum Opus [2]. As an aside I’ve been working on SE Linux for 22 years.

Cory Doctorow wrote an insightful article about the incentives for enshittification of the Internet and how economic issues and regulations shape that [3].

CCC has a lot of great talks, and this talk from the latest CCC about the Triangulation talk on an attak on Kaspersky iPhones is particularly epic [4].

GoodCar is an online sales site for electric cars in Australia [5].

Ulrike wrote an insightful blog post about how the reliance on volunteer work in the FOSS community hurts diversity [6].

Cory Doctorow wrote an insightful article about The Internet’s Original Sin which is misuse of copyright law [7]. He advocates for using copyright strictly for it’s intended purpose and creating other laws for privacy, labor rights, etc.

David Brin wrote an interesting article on neoteny and sexual selection in humans [8].

37C3 has an interesting lecture about software licensing for a circular economy which includes environmental savings from better code [9]. Now they track efficiency in KDE bug reports!

Linux Australia2024-02-28 Council Meeting Minutes

1. Meeting overview and key information

Present

  • Sae Ra Germaine (Vice-President)
  • Neill Cox (Secretary)
  • Andrew Pam (Council)
  • Russell Stuart (Treasurer)
  • Jennifer Cox (Council)
  • Jonathan Woithe (Council)

Apologies 

  • Joel Addison (President)

Not Present

Meeting opened at 20:00 AEDT by Sae Ra  and quorum was achieved.

Minutes taken by Neill

2. Log of correspondence

  • Add Authoriser to ANZ Direct Online – 417289 Linux Australia (Incorporated)

Ideally all council members would be authorised. To be authroised they will need copies/scans of a passport and a utility bill with an address. Take them to a JP to certify. Then scan and email to ANZ NZ. Russell will email instructions to the council members.

  • Possible donation to Linux Australia from disincorporation of Melbourne Functional Programming Association Incorporated

The money has been deposited. Linux Australia will consider an appropriate recognition.

  • Inquiry: running an information security conference (Purplecon)

The organisers are invited to submit a proposal and budget. Russell will email them.

  • OSI at FOSDEM, Open Source AI draft 0.5
  • URGENT Re: pyconau 2024 sub-committee request

The urgent PyConAU matter has been dealt with (it was about signing the venue contract).

  • New MoU with WordCamp Central

Sae Ra will sign the new MoU. It is functionally identical to the existing contract and makes no new commitments.

  • Please authorise my request for ANZ mobile app
  • WordCamp Sydney 2024 – Request to form a LA subcommittee

Proposal is still being discussed. One surprise is that the expected number of attendees is higher than the previous pre-covid conference.

  • Trialling Wise
  • DrupalAsia / DrupalCon Asia
  • Issue with contact form on LA website – investigations and possible way forward
  • Seeking Kiwi PyCon and NZ PUG information for the Linux Aus website
  • The current process needed for ANZ
  • Payment requests for DrupalSouth

3. Items for discussion

  • OSI board nomination

Miles Goodhew has agreed to be nominated. A draft candidacy statement was provided to the council.

Motion: That Linux Australia accept and recommend Miles Goodhew as our nominee for the OSI board elections
Moved: Sae Ra Germaine
Seconded: Russell Stuart
Result: Passed unanimously

  • WordCamp Subcommittee

Motion: That Linux Australia approves the formation of a WordCamp Subcommittee to run WordCamp 2024 with the following members:

  • Wil Brown – Lead Organiser (Chair)
  • Jordan Gillman – Organiser/Treasurer ( Treasurer)
  • Dee Teal – Organiser/Program Manager
  • Jo Minney – Organiser/Speaker Lead
  • Eva Devos – Organiser/Swag & Merch/WPSyd Community Rep
  • Sam Toohey  – Organiser/After Party/WPSyd Community Rep

Moved: Sae Ra Germaine
Seconded: Andrew Pam
Result: Passed unanimously

4. Items for noting

5. Other business

  • Drupal sub committee update

Drupal South in three weeks! Good keynotes lined up. Second keynote is a Girls In Tech panel. Ticket sales are still coming in. Currently at 190 attendees. The expectation is  to end up somewhere over 200. Have hit revenue targets for tickets, but sponsorship is down. It’s particularly difficult for NZ companies this year, and there are headwinds for tech in general. Currently looks like a 25k loss for the conference. Twice as many entries for the awards show. Many first time attendees, more clients than developers which has been a goal.

One proposal for future conferences is to consider  ways to promote Drupal outside of the existing community by having a Drupal booth at other conferences e.g. FST Government Summits, Everything Open, Tech in Gov, EduTECH

Drupal Asia – Mike has been working with the Drupal Association to use the DrupalCon brand for the Asian Drupal conference. The Drupalm Association does not want to be the financial sponsor. Their preference would be to work with Linux Australia and licence the brand to Linux Australia. Licensing the brand would increase the chances of a successful conference. Mike would like to start a conversation with him, Linux Australia and the Drupal Association. From Linux Australia’s perspective this should basically fit into the template that we currently use for running conferences. We would need a budget that would account for the licensing fees, and some way of dealing with payments in whatever country/currency the conference is run in. Linux Australia will also need to make sure we can comply with any legal or taxation requirements. The first conference will most likely be in Singapore.

  • Admin team update

Working on the new list server. Seeing weird stuff with Fastmail, but it can’t be definitively blamed on Fastmail. Something seems to be happening before the mail gets to Fastmail. The new listserver will go live this weekend. The new server will have a much cleaner config and will be easier to troubleshoot the problems.

The admin team have also been working on the web server upgrade.

Next in the queue is looking at the VM servers.

Also need to make sure we have correct contact details for any subcommittee servers running on Linux Australia infrastructure.

  • Joomla sub committee update

THe conference is going well. Should make the minimum attendance budget, but may not get much more than that. Currently there are 14 confirmed attendees out of 20. There is an unexpected bonus sponsor. Several panel talks are planned as well as straight presentations.

The conference is now about two and a half weeks away.

  • PyCon AU sub committee update

Contract with MCEC has been signed by Joel, and payment has been made. The financial induction still needs to be done. Clinton will chase people up for that. There is some wriggle room in the contract for catering. A decision on that will be held until the number of attendees is known.

  • Flounder sub committee update

Has been meeting monthly. Low turnout, but different people and topics. People from both Australia and New Zealand, and even China. Mostly focussed on phones and mobile devices.

  • LUV  sub committee update

Going quite well.  Two online meetings and one physical meeting each month. One social meeting, the regular LUV meetings and an in person technical support meeting. Generally around six people at each meeting.

LUV still doesn’t have a definite plan for holding an election, but is aware that it should happen soon. Andrew will poke Alexar about this.

  • WordPress sub committee update

An invitation was not sent to the WordCamp organisers because of an oversight by the Secretary

6. In camera

  • One item was discussed in camera

7. Action items

  • Setup WordCamp Subcommittee Invite – Neill Cox

7.1 Completed Items

  • Calendar invites for subcommittee meetings – Neill Cox

7.2 Carried Forward

 

Meeting closed at 21:37

Next meeting is scheduled for 2024-03-13

The post 2024-02-28 Council Meeting Minutes appeared first on Linux Australia.

Linux Australia2024-02-14 Council Meeting Minutes

1. Meeting overview and key information

Present

  • Joel Addison (President)
  • Sae Ra Germaine (Vice-President)
  • Neill Cox (Secretary)
  • Andrew Pam (Council)
  • Russell Stuart (Treasurer)
  • Jennifer Cox (Council)
  • Jonathan Woithe (Council)

 

Apologies 

 

Not Present

 

Meeting opened at 20:06 AEDT by Joel  and quorum was achieved.

Minutes taken by Neill

 

2. Log of correspondence

  • MHW Best Practice Program- Letter of Consent  [Julie Daly]
  •  linux.org.au now has an Everything Open page [Kathy Reid]
  • Reconnect with Joomla Event
  • Open source Event [Carlos.Noschang Kuhn]
  • APNIC Membership Renewal – Invoice Attached [Steve Walsh]
  • other issue re: past member ROHAN McLEOD [Russell Coker]
  • Undelivered Mail Returned to Sender [Russell Coker] – Sae Ra has responded
  • Stripe account [Danny W. Adair] – Russell has responded
  • How to document budget adjustments [Danny W. Adair] – Russell has responded
  • May I assign LA as registrant of drupal.org.au? [Si Hobbs] – Neill sent the requested letter
  • joomla event / induction [Stuart Robertson] – Russell has responded
  • pyconau 2024 sub-committee request [Christopher Neugebauer]
  • Call for OSI board nomination [OSI Elections team]
  • bitwarden.com verification emails have gone AWOL [Steve Walsh]
  • Issue with contact form on LA website – investigations and possible way forward [Kathy Reid] – Joel has responded
  • Change to auDA domain name rules – request to make submission under LA banner [Kathy Reid] – Jonathan has responded
  • DrupalSouth coming up, and a few other proposals [David Sparks]
  • KPC 24 / Payment in excess of NZD5000 [Richard Shea] – Russell has responded
  • Possible donation to Linux Australia from disincorporation of Melbourne Functional Programming Association Incorporated [Les Kitchen] – Joel has responded
  • Approval for Xero invoice template [Nathan Morrow]

 

3. Items for discussion

  • Drupal South

May not make as much money as initially hoped. David would like to speak to the council about a few things. It would probably help if the Drupal South website had some text explaining why it would be good to attend the conference. An example from Everything Open: https://2024.everythingopen.au/attend/why-should-employees-attend/

  • PyConAU

Joel is a signatory for the contract. Chris is discussing one of the points in the contract, which may impact the budget.. We’re waiting for resolution of that before signing the contract.

The budget looks reasonable.

Russell has reviewed the budget. It seems conservative in regards to ticket sales and sponsorship, so the expected result will probably be better than what is outlined in the budget.

 

Motion: To establish PyConAU2024 as an event subcommittee in line with the budget provided

Moved: Joel

Seconded: Russell

Passed unanimously

 

  • OSI board nomination

We have been offered a chance to nominate for their board. Sadly, there is no one on the council who has enough time to take up the offer.

  • Open source Event (EO in Canberra)

Sae Ra will discuss this with Chris when possible. It’s from the … 2025 is not possible, but maybe they would like to bid for 2026.

  • Goals:
    • Neill: Improve communication with subcommittees. Support and encourage more in person activities.
    • Russell: Sorting out the New Zealand situation. Helping with EO.
    • Jonathan: Keep working on the grants program and related activities. Encourage community participation and the next generation of community members.
    • Andrew: Joined to provide some choice of council members and felt that I had something to offer at a national level. Goals: Capitalise on the new Everything Open name/brand. Use it to engage new people, expand the reach of the community and broaden our remit. Also, a focus on COVID safety at our events, including hybrid and remote attendance.
    • Jenny: Keen to put forward the point of a view of the less technical members. Explain what open source offers to people who are in the non-tech community. Encourage participation from women in our community. Engage with government agencies using open source as part of their programmes.
    • Sae Ra: Document Everything Open so she can just be an attendee one year. Begin a discussion of what needs to be done in the way of updating our constitution. Look at how we do awards to recognise effort done by people in support of our community.
    • Joel: Focus on portfolios for council members, picking up on the work done by Sae Ra two years ago (https://linux.org.au/wp-content/uploads/2021/12/Council-Portfolios-2021.pdf) . Do a “stocktake” of our subcommittees. Fix some problems with the website (especially the election system). Updating the constitution. Start working on a five year strategic plan in consultation with the other council members.
  • Sorting out NZ banking

We need some more people who can authorise payments in New Zealand. We also have some historical signatories that need to be rationalised. These are suspended signatories, who can be reactivated in the future if needed.

 

4. Items for noting

  • NSW Fair Trading return submitted
  • ASIC FORM 490 Change of Directors submitted
  • Stuart Robertson Financial (Joomla Reconnect 24) induction done.

 

5. Other business

6. In camera

7. Action items

  • Calendar invites for subcommittee meetings – Neill Cox

7.1 Completed Items

7.2 Carried Forward

 

Meeting closed at 20:51

Next meeting is scheduled for 2024-02-28

The post 2024-02-14 Council Meeting Minutes appeared first on Linux Australia.

,

Michael StillWool

Chet bought me this book and demanded I read it, and honestly that was a good call. The book reminds me a bit of  Oryx and Crake, but perhaps that’s unfair given I read that one eight years ago and have probably forgotten some important details. The book is well paced and engaging. Despite being as long as many of Neil Stephenson’s books, I felt it was a much more approachable read than that.

I found the second half of the book a bit harder to read that the first half, because it doesn’t pull many punches in terms of the consequences of people’s actions and is pretty good at building suspense. There were definitely points where I had to pause because I was pretty sure something bad was going to happen to someone I’d grown fond of. That said, it was still a great read. I’ve gone and bought the next two in the series because I’m confident I’m going to want to read them now too.

Wool Book Cover Wool
Hugh Howey
John Joseph Adams/Houghton Mifflin Harcourt
2020
597

The first book in the acclaimed, New York Times best-selling trilogy, Wool is the story of a community living in an underground silo completely unaware of the fate of the outside world. When the silo's sheriff asks to leave the silo, a series of events unravels the very fabric of their fragile lives. In a world where all commodities are precious and running out, truth and hope may be the most rare...and the most needed.

Paul WayperThe Experia, one year on.

On Friday, 1st March, it will be exactly one year since I walked into Zen Motorcycles, signed the paperwork, and got on my brand new Energica Experia electric motorbike. I then rode it back to Canberra, stopping at two places to charge along the way, but that was more in the nature of making sure - it could have done the trip on one better-chosen charging stop.

I got a call yesterday from a guy who had looked at the Experia Bruce has at Zen and was considering buying one. I talked with him for about three quarters of an hour, going through my experience, and to sum it up simply I can just say: this is a fantastic motorbike.

Firstly, it handles exactly like a standard motorbike - it handles almost exactly like my previous Triumph Tiger Sport 1050. But it is so much easier to ride. You twist the throttle and you go. You wind it back and you slow down. If you want to, the bike will happily do nought to 100km/hr in under four seconds. But it will also happily and smoothly glide along in traffic. It says "you name the speed, I'm happy to go". It's not temperamental or impatient; it has no weird points where the throttle suddenly gets an extra boost or where the engine braking suddenly drops off. It is simple to ride.

As an aside, this makes it perfect for lane filtering. On my previous bike this would always be tinged with a frisson of danger - I had to rev it and ease the clutch in with a fair bit of power so I didn't accidentally stall it, but that always took some time. Now, I simply twist the throttle and I am ahead of the traffic - no danger of stalling, no delay in the clutch gripping, just power. It is much safer in that scenario.

I haven't done a lot of touring yet, but I've ridden up to Gosford once and up to Sydney several times. This is where Energica really is ahead of pretty much every other electric motorbike on the market now - they do DC fast charging. And by 'fast charger' here I mean anything from 50KW up; the Energica can only take 25KW maximum anyway :-) But this basically means I have to structure any stops we do around where I can charge up - no more stopping in at the local pub or a cafe on a whim for morning tea. That has to either offer DC fast charging or I'm moving on - the 3KW onboard AC charger means a 22KW AC charger is useless to me. In the hour or two we might stop for lunch I'd only get another 60 - 80 kilometres more range on AC; on DC I would be done in less than an hour.

But OTOH my experience so far is that structuring those breaks around where I can charge up is relatively easy. Most riders will furiously nod when I say that I can't sit in the seat for more than two hours before I really need to stretch the legs and massage the bum :-) So if that break is at a DC charger, no problems. I can stop at Sutton Forest or Pheasant's Nest or even Campbelltown and, in the time it takes for me to go to the toilet and have a bit of a coffee and snack break, the bike is basically charged and ready to go again.

The lesson I've learned, though, is to always give it that bit longer and charge as much as I can up to 80%. It's tempting sometimes when I'm standing around in a car park watching the bike charge to move on and charge up a bit more at the next stop. The problem is that, with chargers still relatively rare and there often only being one or two at each site, a single charger not working can mean another fifty or even a hundred kilometres more riding. That's a quarter to half my range, so I cannot afford to risk that. Charge up and take a good book (and a spare set of headphones).

In the future, of course, when there's a bank of a dozen DC fast chargers in every town, this won't be a problem. Charger anxiety only exists because they are still relatively rare. When charging is easy to find and always available, and there are electric forecourts like the UK is starting to get, charging stops will be easy and will fit in with my riding.

Anyway.

Other advantages of the Experia:

You can get it with a complete set of Givi MonoKey top box and panniers. This means you can buy your own much nicer and more streamlined top box and it fits right on.

Charging at home takes about six hours, so it's easy to do overnight. The Experia comes with an EVSE so you don't need any special charger at home. And really, since the onboard AC charger can only accept 3KW, there's hardly any point in spending much money on a home charger for the Experia.

Minor niggles:

The seat is a bit hard. I'm considering getting the EONE Canyon saddle, although I also just need to try to work out how to get underneath the seat to see if I can fit my existing sheepskin seat cover.

There are a few occasional glitches in the display in certain rare situations. I've mentioned them to Energica, hopefully they'll be addressed.

Overall rating:

5 stars. Already recommending.

,

Lev LafayetteSupercomputing Asia 2024 Summary

Supercomputing Asia 2024 was held in Sydney from the 19th to 23rd of February with over 1,000 attendees, most of whom were from Australia, the United States, Singapore, Japan, Thailand, and Aotearoa New Zealand, with a notable exception from the conference was China given their importance to both supercomputing and Asia, and one speaker noted wryly that "Australia is now apparently part of Asia". The program consisted of plenary sessions in the morning and multiple streams in the afternoon of each day. My attendance was at the IBM Storage Scale User Group for the entirety of the first day, the HPC Leadership Forum on the second, Skills and Training on the third, and the Accelerated Data Analytics and Computing Institute (ADAC) symposium on the fourth. The Storage Scale User Group was useful for a roadmap of their systems (e.g., IBM Storage Scale System 6000, Fusion HCI) and case studies. The Leadership Forum and the ADAC symposium both gave an overview of some of the major systems in the region, which included the two largest systems, Frontier (no 1), Aurora (no 2), along with Fugaku (no 4).

Of note from Fugaku was a Hyperion study on their macroeconomic return on investment for their HPC which was between $63 to $91 per dollar invested, following the 2013 IDN study of HPC in general indicating $44 per dollar invested. The larger figure is explained because of the tighter integration with national objectives in the peak system. Also of note, a concurring with a report written in September 2022 ("Microprocessor Trend Usage in HPC Systems for 2022-2023") was the rise of systems using AMD CPUs and the ubiquity of CPU/GPU heterogeneity. Thailand's Supercomputing Centre of note, rising from a relatively small system to one with 31744 AMD CPUs, 704 A100s, and no 94 in the top500 with 50% of their operating revenue now coming from fee-for-service from "national interest" private industries. In Australia, there is the leadership from NCI in developing the Indo-Pacific Exascale Consortium, modelled after the EuroHPC Joint Undertaking effort.

About 50 people attended the talk I gave at SCAsia 2024 on "HPC Certification Forum & Skill Tree: An Update". There was quite an enthusiastic discussion that followed with several questions about micro-credentials, the potential use of OpenBadge as part of the certification process, and strong interest from several other HPC centres (UWA, CSIRO, NeSI) and Intersect in participating eco-system approach about using the skill tree approach for training content and contributing back. The potential of this sort of collaboration within Australia at the very least will be extremely valuable in improving the HPC on-boarding process for researchers. The talk also dovetailed with a poster presentation, "HPC Training Generates HPC Results", which pointed out longitudinal correlations between the two in terms of training sessions, computer hours, and job completion.

Running topics of note throughout the conference and especially in the plenary sessions (a nice quirk was that the voice of Siri, Karen Jacobsen, was the MC for these sessions), was a focus on AI/machine learning/LLMs and quantum computing. The former topic especially noted the advantages of GPUS which bodes well for our own large GPU partition. Differentiation must be considered between quantum computing and quantum computers; as a recent Spartan-citing paper pointed out quantum algorithms on "classical" computers (e.g., HPC) are preferable to quantum computers which are very much still in the experimental phase. To differentiate, quantum computing is any method to generate quantum effects whereby qubit states can exist in superposition (0,1, both) rather than binary states (0,1). The typical system to do quantum computing, or at least simulate it, is usually HPC. In contrast, a quantum computer uses a system that directly uses a quantum system. For example, GENCI in France uses a photonic computer, LRZ in Germany uses superconducting qubits, PSNC in Poland uses trapped ions, etc.

Opportunities to speak with vendors is always important and in particular longer discussions were held with Dell with their roadmap, DDN on their new filesystem, and Altair's HPCWorks application (which, at the moment, only operates with PBSPro). Notably, many vendors continue to make a pitch in favour of monopolisation under the guise of convenience ("we'll do everything for you") rather than interoperability. Special thanks are given to Xenon Systems for an evening hosted at L'Aqua on Cockle Bay Wharf.

Overall, attendance and participation at the conference were extremely valuable for direct knowledge improvements in storage, useful collaborations with other centres for HPC training, awareness of vendor products, system developments in Asia and US, and developing an understanding of the overall direction of AI/LLM and quantum computing in HPC environments.


Image by Picture by Robert Lageano

,

Simon LyallAudiobooks – January 2024

Valley of Genius: The Uncensored History of Silicon Valley (As Told by the Hackers, Founders, and Freaks Who Made It Boom) by Adam Fisher

Weaves around 500 interviews into stories of People and Companies. Well put together and a great read 4/5

Make It Stick: The Science of Successful Learning by Peter C. Brown

A review of scientifically prove learning techniques concentrating on what really works vs what feels good. Useful 3/5

The Downloaded by Robert J. Sawyer

Two very different groups of people are unfrozen into a post-apocalyptic Earth. A group of Astronauts and a group of Convicted Murderers. Good Sawyer story although a bit on the short side 3/5

How Infrastructure Works: Inside the Systems That Shape Our World by Deb Chachra

A tour of various pieces of Infrastructure that supports our everyday lives. Mostly an introduction but with some strong opinions on funding and sustainability. 3/5

Apollo 1: The Tragedy that put us on the Moon by Ryan S. Walters

Bios of the Astronauts and the US space programme leading up to the accident and various problems that made it inevitable. Good but not extremely detailed. 4/5

The Dancer at the Gai-Moulin by Georges Simenon

The plan by two wayward youths to rob a Belgian nightclub goes awry but how does it connect to a murdered man? Interesting story that avoids Maigret’s point of view. 3/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average. in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all

Share

,

Francois MarierUsing a GitHub Gist like a git repo

A GitHub gist is backed by a regular git repository, but it's not exposed explicitly via the user interface.

For example, this "secret" gist can be cloned using this command:

git clone https://gist.github.com/fmarier/b652bad2e759675e8650f3d3ee81ab08.git test

Within this test directory, the normal git commands can be used:

touch empty
git add empty
git commit -a -m "Nothing to see here"

A gist can contain multiple files just like normal repositories.

In order to push to this repo, add the following pushurl:

git remote set-url --push origin git@gist.github.com:b652bad2e759675e8650f3d3ee81ab08.git

before pushing using the regular command:

git push

Note that the GitHub history UI will not show you the normal commit details such as commit message and signatures.

If you want to access the latest version of a file contained within this gist, simply access https://gist.githubusercontent.com/fmarier/b652bad2e759675e8650f3d3ee81ab08/raw/readme.md.

Linux AustraliaCouncil Meeting January 31, 2024 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (President)
  • Sae Ra Germaine (Vice-President)
  • Neill Cox (Secretary)
  • Russell Stuart (Treasurer)
  • Andrew Pam (Council)
  • Jennifer Cox (Council)
  • Jonathan Woithe (Council)

 

Apologies 

 

Not Present

 

Meeting opened at 20:05 AEDT by Joel  and quorum was achieved.

Minutes taken by Neill and Jonathan

 

2. Log of correspondence

  • Lodge Linux Australia Activity Statement October..December 2023
  • Scrutineers report
  • Fwd: Charity audit (NZPUG)
  • Xero reconciliations
  • Linux Australia – Subcommittee Update [actually Reconnect with Joomla Budget]

Council members should have received an email with more details, but basically the previous council had some questions about the budget that had been submitted. A revised budget has now been submitted for the council’s consideration.

 

The new budget contains no sponsorship and is surprisingly frugal in their estimates of catering expenses. The budget is so small that it is unlikely to be a problem for LA, and Joomla has run previous events that have accumulated a buffer.

 

Once we accept a budget for an event we take responsibility for the results, including the possibility of a loss.

 

  • Fwd: Third Party Public Liability Insurance [for KiwiPyCon]
  • Issue with contact form on LA website – investigations and possible way forward
  • Change to auDA domain name rules – request to make submission under LA banner
  • May I assign LA as registrant of drupal.org.au?
  • pyconau 2024 conference admin
  • Fwd: Reconnect with Joomla Event
  • Fwd: MHW Best Practice Program- Letter of Consent

 

3. Items for discussion

  • Welcome to new council members

Recap: Fortnightly meetings for council. Ideally no more than an hour. Agenda available to everyone before meeting. Please add items to the agenda before the meeting starts.

Every second meeting is a subcommittee meeting where each of our subcommittees are invited to come along and report on how things are going.

Sometimes we discuss things that are not yet public, which will be minuted in the In Camera section. This may be something like a code of conduct issue or possibly information that a conference has shared with us but is not ready to make public.

Discussion and comments from everyone present are welcome, but please keep in mind that we are trying to be done in less than an hour.

We use Zoom for meetings as it is currently the best option available to us, but we would be happy to use a suitable open source option instead.

Similarly we use Google drive for document storage, but would like to move to an open source equivalent.

Jenny and Andrew will need to supply Joel with their preferred google account so that Joel can share the 2024 council directory.

We have a number of tasks to perform at the start of the new year. One requirement is Director IDs.

Details of committee members need to go into Linux Australia’s register, access to which can be requested by any member. This needs to include name, date of birth and address. There will be a document which shows the required details.

All council members will receive email from the council mailing address. It is sometimes best to allow discussion by the council before firing off a response.

We also have a (matrix) chat channel for quick discussions.

Sae Ra is happy to walk people through the various things that the council works on during the year.

Could everyone come with a few dot points for the next meeting addressing why we have all chosen to be on the council and what we would like to achieve this year?

 

4. Items for noting

  • Would it be possible to get the AGM minutes up soonish rather than just before the AGM?

5. Other business

  • Motion that the LA council accepts the proposed budget for the Reconnect with Joomla 2024 event and establishes an event subcommittee for this.

Moved by: Joel

Seconded: Sae Ra

Result: Passed unanimously

  • auDA Response

Jonathan will prepare a draft response in consultation with Kathy Reid. To be published on the LA website and posted to email and/or social media.

There was a general discussion between the council with Sae Ra abstaining due to a conflict of interest.

  • Contact Form on the Linux Australia website

Joel will discuss this with Wil to get advice on the best way to resolve the problem.

  • Drupal.org.au domain name

It seems appropriate to transfer ownership of the domain to Linux Australia. Joel will respond to the email.

  • PyconAU
    PyconAU  is now a steering committee. Late today they provided a draft budget for their next pycon conference and have notified us of their core committee membership.

PyconAU would like to lock in their venue for their proposed dates in November 2024.

  • Motion by Joel: that the LA Council approve the venue contract for PyCon AU 2024, and pay the associated deposit.
    Moved by: Joel
    Seconded: Sae Ra
    Result: Passed unanimously
  • Next steps: budget to be reviewed by LA Council over the next two weeks and be voted on at the next Council meeting.
  • MHW Best Practice Program – Letter of Consent requested by the auditor. He would like to pass Linux Australia’s details on to CPA Australia as part of the Best Practice Program.
    • Motion: that LA approve Russell to sign the informed written consent for the MHW participation in the CPA Australia Best Practice Program assessment..
      Moved by Russell

Seconded: Neill
Result: Passed unanimously

  • Meeting Times – are we happy to continue with Wednesday evenings at 2000 AEDT

After a brief discussion there was no significant disagreement about running the meetings at the current time. When daylight savings finishes we will move to 1930 AEST.

Meeting closed at 21:08

Next meeting is scheduled for 2024-02-14

The post Council Meeting January 31, 2024 – Minutes appeared first on Linux Australia.

,

Simon LyallOzmoot 2024 – Day 3 – Afternoon

Arda Measured with Jackson Mitchell-Bolton

  • The Geocosmology of the Tolkien Legendarium
  • General – Cosmology of Arda
  • Overview
  • 1st and Much of 2nd age
    • Flat and Enclosed
    • Late 2nd Age
    • Changes to being a sphere
    • Valanor separated from the rest of the world
    • What the earth is now
  • What did Tolkien show us about the World?
  • 5 maps in Ambarkanta
  • Map 5 – Need to find out the scale
    • Start with main Lord of Rings map which has scale
    • Astronomy has a distance ladder
  • The First Map of the Lord of the rings
    • Extends much further especially to the North
    • Has some differences with Final Map though
    • Fitted with Published map over the First Map
    • Created Smallest, Largest and Best matches
    • 3 possible lengths of 300 miles. Length of “red line” is 400 +-7 miles.
  • The 2nd Silmirilion Map
    • Looking at features around the Blue Mountains
    • Able to get a size of Balariand
  • But the Gulf shows up on map 5 so can go direct
    • So Using map 5 the equator is 13, 15 or 17 times the reference distance.
    • Girdle of arda is 13.3 +2.5/-2.2 times the reference length
    • Length of Girdle is 6120 -+1000 miles
    • about 0.77x the size of the earth ( +0.14/-0.11 )
  • Compare to the Atlas of middle Earth.
    • 6400 falls in his estimates
  • Ambarkana Diagram III – The worth made round
    • The straight path is a little bit wobbly
    • But actual distance is only about 1% different
    • Estimated size of Ardar is about half of present day earth
  • This would imply greater there have been significant changes to the world some time between the books and the present day
Diagram III
Map 5


Oaths and Promises: A Path Through Darkness and Uncertainty with Stephen Vrettos

  • Oaths are used to obtain certainty
  • The Oath of Feanor
    • He cannot offer them safety in middle earth
    • Rallys the greater path to follow him
  • The Oath of Eorl, The Oath of Ciriom
    • Came to rescue of Gondor
    • Swore friendship with each other
    • Came to each other’s aid over the years
  • Smeagol’s promise
    • Serve the Master of the Precious
  • Why are Oaths so effective?
    • Certainty on how people will act
    • Certainty how other kingdoms will act
  • Oaths have the ability to propel their own fulfillment
    • Language in the Silmirilion – “The Oath drives them”
  • Iluvatar will enforce the Oaths are fulfilled
    • None can see how to get out of Oath of Feanor
    • Isildur’s Curse. How did a mortal man have the ability to bind the Oathbreakers beyond their deaths?
    • Seems unlikely he could do this himself even if he had magic. Even Valar could not make men immortal
    • Power of curse probably came from Iluvatar
    • Smeagol’s Oath – Frodo has no power, enforcement must have come from Iluvatar
    • Lots of discussion on this, hard to summarise

Share

,

Francois MarierUpgrading from Debian 11 bullseye to 12 bookworm

Over the last few months, I upgraded my Debian machines from bullseye to bookworm. The process was uneventful (besides the asterisk issue described below), but I ended up reconfiguring several things afterwards in order to modernize my upgraded machines.

Logcheck

I noticed in this release that the transition to journald is essentially complete. This means that rsyslog is no longer needed on most of my systems:

apt purge rsyslog

Once that was done, I was able to comment out the following lines in /etc/logcheck/logcheck.logfiles.d/syslog.logfiles:

#/var/log/syslog
#/var/log/auth.log

I did have to adjust some of my custom logcheck rules, particularly the ones that deal with kernel messages:

--- a/logcheck/ignore.d.server/local-kernel
+++ b/logcheck/ignore.d.server/local-kernel
@@ -1,1 +1,1 @@
-^\w{3} [ :[:digit:]]{11} [._[:alnum:]-]+ kernel: \[[0-9. ]+]\ IN=eno1 OUT= MAC=[0-9a-f:]+ SRC=[0-9a-f.:]+
+^\w{3} [ :[:digit:]]{11} [._[:alnum:]-]+ kernel: (\[[0-9. ]+]\ )?IN=eno1 OUT= MAC=[0-9a-f:]+ SRC=[0-9a-f.:]+

Then I moved local entries from /etc/logcheck/logcheck.logfiles to /etc/logcheck/logcheck.logfiles.d/local.logfiles (/var/log/syslog and /var/log/auth.log are enabled by default when needed) and removed some files that are no longer used:

rm /var/log/mail.err*
rm /var/log/mail.warn*
rm /var/log/mail.info*

Finally, I had to fix any unescaped | characters in my local rules. For example error == NULL || \*error == NULL must now be written as error == NULL \|\| \*error == NULL.

Networking

After the upgrade, I got a notice that the isc-dhcp-client is now deprecated and so I removed if from my system:

apt purge isc-dhcp-client

This however meant that I need to ensure that my network configuration software does not depend on the now-deprecated DHCP client.

On my laptop, I was already using NetworkManager for my main network interfaces and that has built-in DHCP support.

Migration to systemd-networkd

On my backup server, I took this opportunity to switch from ifupdown to systemd-networkd by removing ifupdown:

apt purge ifupdown
rm /etc/network/interfaces

putting the following in /etc/systemd/network/20-wired.network:

[Match]
Name=eno1

[Network]
DHCP=yes
MulticastDNS=yes

and then enabling/starting systemd-networkd:

systemctl enable systemd-networkd
systemctl start systemd-networkd

I also needed to install polkit:

apt install --no-install-recommends policykit-1

in order to allow systemd-networkd to set the hostname.

In order to start my firewall automatically as interfaces are brought up, I wrote a dispatcher script to apply my existing iptables rules.

Migration to predictacle network interface names

On my Linode server, I did the same as on the backup server, but I put the following in /etc/systemd/network/20-wired.network since it has a static IPv6 allocation:

[Match]
Name=enp0s4

[Network]
DHCP=yes
Address=2600:3c01::xxxx:xxxx:xxxx:939f/64
Gateway=fe80::1

and switched to predictable network interface names by deleting these two files:

  • /etc/systemd/network/50-virtio-kernel-names.link
  • /etc/systemd/network/99-default.link

and then changing eth0 to enp0s4 in:

  • /etc/network/iptables.up.rules
  • /etc/network/ip6tables.up.rules
  • /etc/rc.local (for OpenVPN)
  • /etc/logcheck/ignored.d.*/*

Then I regenerated all initramfs:

update-initramfs -u -k all

and rebooted the virtual machine.

Giving systemd-resolved control of /etc/resolv.conf

After reading this history of DNS resolution on Linux, I decided to modernize my resolv.conf setup and let systemd-resolved handle /etc/resolv.conf.

I installed the package:

apt install systemd-resolved

and then removed no-longer-needed packages:

apt purge openresolv resolvconf avahi-daemon

I also disabled support for Link-Local Multicast Name Resolution (LLMNR) after reading this person's reasoning by putting the following in /etc/systemd/resolved.conf.d/llmnr.conf:

[Resolve]
LLMNR=no

I verified that mDNS is enabled and LLMNR is disabled:

$ resolvectl mdns
Global: yes
Link 2 (enp0s25): yes
Link 3 (wlp3s0): yes
$ resolvectl llmnr
Global: no
Link 2 (enp0s25): no
Link 3 (wlp3s0): no

Note that if you want auto-discovery of local printers using CUPS, you need to keep avahi-daemon and ensure that systemd-resolved does not conflict with it.

DNS resolution problems with ifupdown

Also, if you haven't migrated to systemd-networkd yet and are still using ifupdown with a static IP address, you will likely run into DNS problems which can be fixed using the following patch to /etc/network/if-up.d/resolved:

@@ -43,11 +43,11 @@ if systemctl is-enabled systemd-resolved > /dev/null 2>&1; then
     fi
     if  [ -n "$NEW_DNS" ]; then
         cat <<EOF >"$mystatedir/ifupdown-${ADDRFAM}-$interface"
-"$DNS"="$NEW_DNS"
+$DNS="$NEW_DNS"
 EOF
         if  [ -n "$NEW_DOMAINS" ]; then
             cat <<EOF >>"$mystatedir/ifupdown-${ADDRFAM}-$interface"
-"$DOMAINS"="$NEW_DOMAINS"
+$DOMAINS="$NEW_DOMAINS"
 EOF
         fi
     fi
@@ -66,7 +66,7 @@ EOF
     # ignore errors due to nonexistent file
     md5sum "$mystatedir/isc-dhcp-v4-$interface" "$mystatedir/isc-dhcp-v6-$interface" "$mystatedir/ifupdown-inet-$interface" "$mystatedir/ifupdown-inet6-$interface" > "$newstate" 2> /dev/null || true
     if ! cmp --silent "$oldstate" "$newstate" 2>/dev/null; then
-        DNS DNS6 DOMAINS DOMAINS6 DEFAULT_ROUTE
+        unset DNS DNS6 DOMAINS DOMAINS6 DEFAULT_ROUTE
         # v4 first
         if [ -e "$mystatedir/isc-dhcp-v4-$interface" ]; then
             . "$mystatedir/isc-dhcp-v4-$interface"

and make sure you have nameservers setup in your static config, for example one of my servers' /etc/network/interfaces looks like this:

iface enp4s0 inet static
     address 192.168.1.2
     netmask 255.255.255.0
     gateway 192.168.1.1
     dns-nameservers 149.112.121.20
     dns-nameservers 149.112.122.20
     pre-up iptables-restore /etc/network/iptables.up.rules

Dynamic DNS

I replaced ddclient with inadyn since it doesn't work with no-ip.com anymore, using the configuration I described in an old blog post.

chkrootkit

I moved my customizations in /etc/chkrootkit.conf to /etc/chkrootkit/chkrootkit.conf after seeing this message in my logs:

WARNING: /etc/chkrootkit.conf is deprecated. Please put your settings in /etc/chkrootkit/chkrootkit.conf instead: /etc/chkrootkit.conf will be ignored in a future release and should be deleted.

ssh

As mentioned in Debian bug#1018106, to silence the following warnings:

sshd[6283]: pam_env(sshd:session): deprecated reading of user environment enabled

I changed the following in /etc/pam.d/sshd:

--- a/pam.d/sshd
+++ b/pam.d/sshd
@@ -44,7 +44,7 @@ session    required     pam_limits.so
 session    required     pam_env.so # [1]
 # In Debian 4.0 (etch), locale-related environment variables were moved to
 # /etc/default/locale, so read that as well.
-session    required     pam_env.so user_readenv=1 envfile=/etc/default/locale
+session    required     pam_env.so envfile=/etc/default/locale

 # SELinux needs to intervene at login time to ensure that the process starts
 # in the proper default security context.  Only sessions which are intended

I also made the following changes to /etc/ssh/sshd_config.d/local.conf based on the advice of ssh-audit 2.9.0:

-KexAlgorithms curve25519-sha256@libssh.org,curve25519-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha256
+KexAlgorithms curve25519-sha256@libssh.org,curve25519-sha256,sntrup761x25519-sha512@openssh.com,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512

Unwanted power management

I ran into a problem with one of my servers where it would suspend itself after a certain amount of time. This was due to default GDM behaviour it turns out and while I could tell gdm not to sleep on inactivity, I instead put the following in /etc/systemd/sleep.conf.d/nosuspend.conf to fully disable systemd-based suspend or hibernate:

[Sleep]
AllowSuspend=no
AllowHibernation=no
AllowSuspendThenHibernate=no
AllowHybridSleep=no

Asterisk has been removed from Debian

The only major problem I ran into while upgrading to bookworm is that I discovered that Asterisk has been removed from stable and testing. For some reason, this was not mentioned in the release notes and I have not yet found a good solution.

If you upgrade to bookworm, be warned that the bullseye packages will remain installed (and will work fine in my experience) unless you "clean them up" with apt purge '~o' accidentally and then you'll have to fetch these old debs manually.

Francois MarierProper Multicast DNS Handling with NetworkManager and systemd-resolved

Using NetworkManager and systemd-resolved together in Debian bookworm does not work out of the box. The first sign of trouble was these constant messages in my logs:

avahi-daemon[pid]: Host name conflict, retrying with hostname-2

Then I realized that CUPS printer discovery didn't work: my network printer could not be found. Since this discovery now relies on Multicast DNS, it would make sense that both problems are related to an incompatibility between NetworkManager and Avahi.

What didn't work

The first attempt I made at fixing this was to look for known bugs in Avahi. Neither of the work-arounds I found worked:

What worked

The real problem turned out to be the fact that NetworkManager turns on full mDNS support in systemd-resolved which conflicts with the mDNS support in avahi-daemon.

You can see this in the output of resolvectl status:

Global
       Protocols: -LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub

Link 2 (enp6s0)
    Current Scopes: DNS mDNS/IPv4 mDNS/IPv6
         Protocols: +DefaultRoute -LLMNR +mDNS -DNSOverTLS
                    DNSSEC=no/unsupported
Current DNS Server: 192.168.1.1
       DNS Servers: 192.168.1.1
        DNS Domain: lan

which includes +mDNS for the main network adapter.

I initially thought that I could just uninstall avahi-daemon and rely on the systemd-resolved mDNS stack, but it's not actually compatible with CUPS.

The solution was to tell NetworkManager to set mDNS to resolve-only mode in systemd-resolved by adding the following to /etc/NetworkManager/conf.d/mdns.conf:

[connection]
connection.mdns=1

leaving /etc/avahi/avahi-daemon.conf to the default Debian configuration.

Verifying the configuration

After rebooting, resolvectl status now shows the following:

Global
       Protocols: -LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub

Link 2 (enp6s0)
    Current Scopes: DNS mDNS/IPv4 mDNS/IPv6
         Protocols: +DefaultRoute -LLMNR mDNS=resolve -DNSOverTLS
                    DNSSEC=no/unsupported
Current DNS Server: 192.168.1.1
       DNS Servers: 192.168.1.1
        DNS Domain: lan

Avahi finally sees my printer (called hp in the output below):

$ avahi-browse -at | grep Printer
+ enp6s0 IPv6 hp @ myprintserver   Secure Internet Printer local
+ enp6s0 IPv4 hp @ myprintserver   Secure Internet Printer local
+ enp6s0 IPv6 hp @ myprintserver   Internet Printer        local
+ enp6s0 IPv4 hp @ myprintserver   Internet Printer        local
+ enp6s0 IPv6 hp @ myprintserver   UNIX Printer            local
+ enp6s0 IPv4 hp @ myprintserver   UNIX Printer            local

and so does CUPS:

$ sudo lpinfo --include-schemes dnssd -v
network dnssd://myprintserver%20%40%20hp._ipp._tcp.local/cups?uuid=d46942a2-b730-11ee-b05c-a75251a34287

Firewall rules

Since printer discovery in CUPS relies on mDNS, another thing to double-check is that the correct ports are open on the firewall.

This is what I have in /etc/network/iptables.up.rules:

# Allow mDNS for local service discovery
-A INPUT -d 100.64.0.0/10 -p udp --dport 5353 -j ACCEPT
-A INPUT -d 192.168.1.0/24 -p udp --dport 5353 -j ACCEPT

and in etc/network/ip6tables.up.rules:

# Allow mDNS for local service discovery
-A INPUT -d ff02::/16 -p udp --dport 5353 -j ACCEPT

,

Linux AustraliaCouncil Meeting January 17, 2024 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (President)
  • Wil Brown (Vice-President)
  • Neill Cox (Secretary)
  • Russell Stuart (Treasurer)
  • Sae Ra Germaine  (Council)
  • Jonathan Woithe (Council)
  • Marcus Herstik (Council)

 

Apologies 

 

Not Present

 

Meeting opened at 20:07 AEDT by Joel  and quorum was achieved.

Minutes taken by Neill

 

2. Log of correspondence

  • Application to form a Kiwi PyCon 2024 subcommittee
  • RISC-V hardware for software development – grant application decision
  • Annual Report now out on socials
  • Linux Australia Sponsorship of NZ Python User Group
  • Linux Australia – Subcommittee Update  (Drupal)
  • Lodge Linux Australia Activity Statement October..December 2023
  • Email issues

Forwarded to admin team for investigation

  • WRT Richard Shea/NZPUG LA Membership application

Application was received after the election opened. Put on hold until after voting closed.

  • Sponsorship agreement template, Code of Conduct, anything else?
  • Fwd: Charity audit
  • Details following induction (KiwiPyCon)
  • Many AGM registrations 
  • Several redbubble sales

 

3. Items for discussion

  • Joomla Event – Reconnect with Joomla 2024
    • Friday 15th to Sunday 17th March

Relatively small event, not many attendees (20 – 60), low budget. Good to see them active again.

The catering costs in the budget seem very optimistic (perhaps because someone is sponsoring in kind?), we will ask for clarification.

 

  • AGM
  • Scheduled for this Saturday (20 Jan 2024). We had to make a correction to the time in Zoom because Zoom reported the wrong time.

4. Items for noting

  • Council election is now complete. The OCM position declaration will be finalised by the returning officer.

5. Other business

  • NZPUG Audit

The cost of providing an audit of NZPUG is estimated at being at least $3,000 and possibly more than $10,000. These costs seem very high in proportion to the benefit that the audit would provide.

Neither organisation has any realistic prospect of recovering funds as a result of the audit. In particular there is no benefit to LA. NZPUG might be able to recover some money.

 

MOTION: That we approve a grant of a maximum $3,000 to NZPUG to get an audit done.

MOVED: Russell Stuart

SECONDED: Wil Brown

RESULT: Motion Failed

 

  • Guarantor for KiwiPycon

The venue for KiwiPyCon is asking for a guarantor for the event. Hopefully the subcommittee will find a way around this. Russell will ask them to explore other options (references, insurance)

 

  • Everything Open Gladstone

Second keynote announced. Ticket sales are slow, but given they only opened the week before Christmas this is not surprising.

The post Council Meeting January 17, 2024 – Minutes appeared first on Linux Australia.

,

Francois MarierFiltering your own spam using SpamAssassin

I know that people rave about GMail's spam filtering, but it didn't work for me: I was seeing too many false positives. I personally prefer to see some false negatives (i.e. letting some spam through), but to reduce false positives as much as possible (and ideally have a way to tune this).

Here's the local SpamAssassin setup I have put together over many years. In addition to the parts I describe here, I also turn off greylisting on my email provider (KolabNow) because I don't want to have to wait for up to 10 minutes for a "2FA" email to go through.

This setup assumes that you download all of your emails to your local machine. I use fetchmail for this, though similar tools should work too.

Three tiers of emails

The main reason my setup works for me, despite my receiving hundreds of spam messages every day, is that I split incoming emails into three tiers via procmail:

  1. not spam: delivered to inbox
  2. likely spam: quarantined in a soft_spam/ folder
  3. definitely spam: silently deleted

I only ever have to review the likely spam tier for false positives, which is on the order of 10-30 spam emails a day. I never even see the the hundreds that are silently deleted due to a very high score.

This is implemented based on a threshold in my .procmailrc:

# Use spamassassin to check for spam
:0fw: .spamassassin.lock
| /usr/bin/spamassassin

# Throw away messages with a score of > 12.0
:0
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*
/dev/null

:0:
* ^X-Spam-Status: Yes
$HOME/Mail/soft_spam/

# Deliver all other messages
:0:
${DEFAULT}

I also use the following ~/.muttrc configuration to easily report false negatives/positives and examine my likely spam folder via a shortcut in mutt:

unignore X-Spam-Level
unignore X-Spam-Status

macro index S "c=soft_spam/\n" "Switch to soft_spam"

# Tell mutt about SpamAssassin headers so that I can sort by spam score
spam "X-Spam-Status: (Yes|No), (hits|score)=(-?[0-9]+\.[0-9])" "%3"
folder-hook =soft_spam 'push ol'
folder-hook =spam 'push ou'

# <Esc>d = de-register as non-spam, register as spam, move to spam folder.
macro index \ed "<enter-command>unset wait_key\n<pipe-entry>spamassassin -r\n<enter-command>set wait_key\n<save-message>=spam\n" "report the message as spam"

# <Esc>u = unregister as spam, register as non-spam, move to inbox folder.
macro index \eu "<enter-command>unset wait_key\n<pipe-entry>spamassassin -k\n<enter-command>set wait_key\n<save-message>=inbox\n" "correct the false positive (this is not spam)"

Custom SpamAssassin rules

In addition to the default ruleset that comes with SpamAssassin, I've also accrued a number of custom rules over the years.

The first set comes from the (now defunct) SpamAssassin Rules Emporium. The second set is the one that backs bugs.debian.org and lists.debian.org. Note this second one includes archived copies of some of the SARE rules and so I only use some of the rules in the common/ directory.

Finally, I wrote a few custom rules of my own based on specific kinds of emails I have seen slip through the cracks. I haven't written any of those in a long time and I suspect some of my rules are now obsolete. You may want to do your own testing before you copy these outright.

In addition to rules to match more spam, I've also written a ruleset to remove false positives in French emails coming from many of the above custom rules. I also wrote a rule to get a bonus to any email that comes with a patch:

describe FM_PATCH   Includes a patch
body FM_PATCH   /\bdiff -pruN\b/
score FM_PATCH  -1.0

since it's not very common in spam emails :)

SpamAssassin settings

When it comes to my system-wide SpamAssassin configuration in /etc/spamassassin/, I enable the following plugins:

loadplugin Mail::SpamAssassin::Plugin::AntiVirus
loadplugin Mail::SpamAssassin::Plugin::AskDNS
loadplugin Mail::SpamAssassin::Plugin::ASN
loadplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold
loadplugin Mail::SpamAssassin::Plugin::Bayes
loadplugin Mail::SpamAssassin::Plugin::BodyEval
loadplugin Mail::SpamAssassin::Plugin::Check
loadplugin Mail::SpamAssassin::Plugin::DKIM
loadplugin Mail::SpamAssassin::Plugin::DNSEval
loadplugin Mail::SpamAssassin::Plugin::FreeMail
loadplugin Mail::SpamAssassin::Plugin::FromNameSpoof
loadplugin Mail::SpamAssassin::Plugin::HashBL
loadplugin Mail::SpamAssassin::Plugin::HeaderEval
loadplugin Mail::SpamAssassin::Plugin::HTMLEval
loadplugin Mail::SpamAssassin::Plugin::HTTPSMismatch
loadplugin Mail::SpamAssassin::Plugin::ImageInfo
loadplugin Mail::SpamAssassin::Plugin::MIMEEval
loadplugin Mail::SpamAssassin::Plugin::MIMEHeader
loadplugin Mail::SpamAssassin::Plugin::OLEVBMacro
loadplugin Mail::SpamAssassin::Plugin::PDFInfo
loadplugin Mail::SpamAssassin::Plugin::Phishing
loadplugin Mail::SpamAssassin::Plugin::Pyzor
loadplugin Mail::SpamAssassin::Plugin::Razor2
loadplugin Mail::SpamAssassin::Plugin::RelayEval
loadplugin Mail::SpamAssassin::Plugin::ReplaceTags
loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody
loadplugin Mail::SpamAssassin::Plugin::SpamCop
loadplugin Mail::SpamAssassin::Plugin::TextCat
loadplugin Mail::SpamAssassin::Plugin::TxRep
loadplugin Mail::SpamAssassin::Plugin::URIDetail
loadplugin Mail::SpamAssassin::Plugin::URIEval
loadplugin Mail::SpamAssassin::Plugin::VBounce
loadplugin Mail::SpamAssassin::Plugin::WelcomeListSubject
loadplugin Mail::SpamAssassin::Plugin::WLBLEval

Some of these require extra helper packages or Perl libraries to be installed. See the comments in the relevant *.pre files or use this command to install everything:

apt install spamassassin  pyzor razor libencode-detect-perl liblog-log4perl-perl libgeoip-dev libmail-dkim-perl libarchive-zip-perl libio-string-perl libmail-dmarc-perl fuzzyocr

My ~/.spamassassin/user_prefs file contains the following configuration:

required_hits   5
ok_locales en fr

# Bayes options
score BAYES_00 -4.0
score BAYES_40 -0.5
score BAYES_60 1.0
score BAYES_80 2.7
score BAYES_95 4.0
score BAYES_99 6.0
bayes_auto_learn 1
bayes_ignore_header X-Miltered
bayes_ignore_header X-MIME-Autoconverted
bayes_ignore_header X-Evolution
bayes_ignore_header X-Virus-Scanned
bayes_ignore_header X-Forwarded-For
bayes_ignore_header X-Forwarded-By
bayes_ignore_header X-Scanned-By
bayes_ignore_header X-Spam-Level
bayes_ignore_header X-Spam-Status

as well as manual score reductions due to false positives, and manual score increases to help push certain types of spam emails over the 12.0 definitely spam threshold.

Finally, I have the FuzzyOCR package installed since it has occasionally flagged some spam that other tools had missed. It is a little resource intensive though and so you may want to avoid this one if you are filtering spam for other people.

As always, feel free to leave a comment if you do something else that works well and that's not included in my setup. This is a work-in-progress.

,

Stewart SmithUsing llvm-mca for predicting CPU cycle impact of code changes

Way back in the distant past, when the Apple ][ and the Commodore 64 were king, you could read the manual for a microprocessor and see how many CPU cycles each instruction took, and then do the math as to how long a sequence of instructions would take to execute. This cycle counting was used pretty effectively to do really neat things such as how you’d get anything on the screen from an Atari 2600. Modern CPUs are… complex. They can do several things at once, in a different order than what you wrote them in, and have an interesting arrangement of shared resources to allocate.

So, unlike with simpler hardware, if you have a sequence of instructions for a modern processor, it’s going to be pretty hard to work out how many cycles that could take by hand, and it’s going to differ for each micro-architecture available for the instruction set.

When designing a microprocessor, simulating what a series of existing instructions will take to execute compared to the previous generation of microprocessor is pretty important. The aim should be for it to take less time or energy or some other metric that means your new processor is better than the old one. It can be okay if processor generation to generation some sequence of instructions take more cycles, if your cycles are more frequent, or power efficient, or other positive metric you’re designing for.

Programmers may want this simulation too, as some code paths get rather performance critical for certain applications. Open Source tools for this aren’t as prolific as I’d like, but there is llvm-mca which I (relatively) recently learned about.

llvm-mca is a performance analysis tool that uses information available in LLVM (e.g. scheduling models) to statically measure the performance of machine code in a specific CPU.

the llvm-mca docs

So, when looking at an issue in the IPv6 address and connection hashing code in Linux last year, and being quite conscious of modern systems dealing with a LOT of network packets, and thus this can be quite CPU usage sensitive, I wanted to make sure that my suggested changes weren’t going to have a large impact on performance – across the variety of CPU generations in use.

There’s two ways to do this: run everything, throw a lot of packets at something, and measure it. That can be a long dev cycle, and sometimes just annoying to get going. It can be a lot quicker to simulate the small section of code in question and do some analysis of it before going through the trouble of spinning up multiple test environments to prove it in the real world.

So, enter llvm-mca and the ability to try and quickly evaluate possible changes before testing them. Seeing as the code in question was nicely self contained, I could easily get this to a point where I could easily get gcc (or llvm) to spit out assembler for it separately from the kernel tree. My preference was for gcc as that’s what most distros end up compiling Linux with, including the Linux distribution that’s my day job (Amazon Linux).

In order to share the results of the experiments as part of the discussion on where the code changes should end up, I published the code and results in a github project as things got way too large to throw on a mailing list post and retain sanity.

I used a container so that I could easily run it in a repeatable isolated environment, as well as have others reproduce my results if needed. Different compiler versions and optimization levels will very much produce different sequences of instructions, and thus possibly quite different results. This delta in compiler optimization levels is partially why the numbers don’t quite match on some of the mailing list messages, although the delta of the various options was all the same. The other reason is learning how to better use llvm-mca to isolate down the exact sequence of instructions I was caring about (and not including things like the guesswork that llvm-mca has to do for branches).

One thing I learned along the way is how to better use llvm-mca to get the results that I was looking for. One trick is to very much avoid branches, as that’s going to be near complete guesswork as there’s not a simulation of the branch predictor (at least in the version I was using.

The big thing I wanted to prove: is doing the extra work having a small or large impact on number of elapsed cycles. The answer was that doing a bunch of extra “work” was essentially near free. The CPU core could execute enough things in parallel that the incremental cost of doing extra work just… wasn’t relevant.

This helped getting a patch deployed without impact to performance, as well as get a patch upstream, fixing an issue that was partially fixed 10 years prior, and had existed since day 1 of the Linux IPv6 code.

Naturally, this wasn’t a solo effort, and that’s one of the joys of working with a bunch of smart people – both at the same company I work for, and in the broader open source community. It’s always humbling when you’re looking at code outside your usual area of expertise that was written (and then modified) by Really Smart People, and you’re then trying to fix a problem in it, while trying to learn all the implications of changing that bit of code.

Anyway, check out llvm-mca for your next adventure into premature optimization, as if you’re going to get started with evil, you may as well start with what’s at the root of all of it.

,

Tim Riley2023 in review

2023: back in action, in more ways than one.

Five conferences

After a three year break, I spoke at five conferences in the last twelve months:

  • RubyConf Thailand — a wonderful second edition of this event, the chance to share Hanami 2 on the stage for the first time, and first-time in person hangs with another Buildkiter!
  • RubyConf AU — a joyous reunion for my home community. Good to be back, and positive signs for the future. Met many Buildkiters in person here!
  • Brighton Ruby — I’ve admired Andy and this event for the longest time. To attend and contribute a talk was a dream come true.
  • RubyConf — my first time at the US-based RubyConf, and a brilliant time all around: made new friends, had many great conversations, and got to work with some first-time Hanami contributors during the hack day!
  • RubyConf Taiwan — What a revelation! A whole thriving Ruby community I had no idea about, and a conference was chock-a-block with interesting talks. Being able to keynote opposite Matz was a real honour.

I’d commend any of these conferences to you, and I hope to get back to each of them in the future.

Being able to talk at so many events was definitely not my plan at the outset of the year! But one thing led to another, and I just rolled with it. It was a pleasure to share Hanami with people in so many places.

One thing I appreciated over this period was the chance to refine my method of introducing Hanami, and moreover, how I deliver conference talks in general. I’m very happy with how this ended up. I ended the year giving two very different presentations, each in their own way imbuing the audience (I hope!) with a sense of both whimsy and possibility: one involved song and dance, and the other, a surprise costume reveal!

Delivering “Livin’ la Vida Hanami” at RubyConf 2023

After “Quest of the Rubyist” at RubyConf Taiwan 2023

A near-release of Hanami

Back at the end of 2021, we made a big push to get Hanami 2.0 released before I made it to RubyConf Thailand, and we succeeded!

I then spent all of 2022 working towards Hanami 2.1, which would introduce our view layer and a completely new approach to handling assets. As RubyConf approached in November, we attempted to do the same thing, to use the conference as motivation and make a big push to get the release out. I spent 2+ months working every night and weekend towards this, and we got so close, but didn’t quite make it.

Just two days before my talk, while I was already at the conference, we discovered what turned out to a release-blocking issue with our front end assets compilation. After a scramble at various stopgap fixes, we decided nothing would quite cut the mustard, and deferred the release. This was disappointing, but was the best choice for the project. After a break over Christmas, I’m now ready to take a final pass at this and make the right choices for the future.

Given this, I hope for 2024 to be a big two-release year for Hanami, with 2.1 happening next month, followed by 2.2 whenever it’s ready.

Family trips!

Once the Brighton Ruby opportunity came up, we decided to make the most of it and turn it into a family trip to Europe, our first overseas trip in the post-2020 era. It was an excellent time. We got to spend a bonus summer across Brighton, London, Paris, Amsterdam, Rotterdam and Brussels. The kids travelled very well, and we all can’t wait to do it again.

We also took a few nice road trips over the year, visiting Orange, Wagga Wagga, and ending the year with a relaxing week over Christmas in a house at Bawley Point.

Work at Buildkite

I had a good year at Buildkite. I spent the year in the same team I joined in 2022. Early in the year we hired some folks and brought the team to its full complement. Everyone is lovely, and we put in some great work towards four different quarterly releases.

I reached my first “bikkiversary” milestone in July, I joined an active on-call roster for the first time, I helped establish some clear direction for our future in front end development, and ended the year acting as an engineering manager, just to help keep things steady while my existing manager took some leave. Every one of these has been a pleasure. I’m continually humbled by the talent that surrounds me at Buildkite, and I’m looking forward to another year of learning.

Best of all, we gathered for our BIPOP (Buildkite In-Person On-site Party; yes, we love acronyms), our whole-company event in Cairns. This was a blast.

Reading

I read 22 books. In order:

  • Distress, Greg Egan
  • Sea of Tranquility, Emily St. John Mandel
  • The Water Knife, Paolo Bacigalupi
  • Shards of Earth, Adrian Tchaikovsky
  • Eyes of the Void, Adrian Tchaikovsky
  • The Thousand Earths, Stephen Baxter
  • Eversion, Alastair Reynolds
  • Lords of Uncreation, Adrian Tchaikovsky
  • Fractal Noise, Christopher Paolini
  • Me, Ricky Martin
  • Ancillary Justice, Ann Leckie
  • Ancillary Sword, Ann Leckie
  • Ancillary Mercy, Ann Leckie
  • Provenance, Ann Leckie
  • Translation State, Ann Leckie
  • Creation Node, Stephen Baxter
  • The Spare Man, Mary Robinette Kowal
  • Some Desperate Glory, Emily Tesh
  • Fugitive Telemetry, Martha Wells
  • System Collapse, Martha Wells
  • Fourth Wing, Rebecca Yarros
  • Iron Flame, Rebecca Yarros

I enjoyed them all! Adrian Tchaikovsky’s Final Architecture trilogy was thrilling, and discovering Ann Leckie’s Imperial Radch world was a true wonder: I’m so glad I got to binge all five stories together. It was a joy to revisit Murderbot with the most recent two books, and I don’t think I’ve read anything faster than I did Fourth Wing while hanging around a coast house at Christmas.

This year I also figured out my ideal e-reading situation. It’s the iPad mini, and Apple’s Books app in particular. Nothing beats its responsiveness and ease of use. And the iPad mini helps with the occasional tech book I read too. And thanks to Calibre and the Obok plugin, I can also buy books from Kobo and get them into Apple Books in short order.

Assorted things

  • We ended the year by giving the kids (9 & 7) each their own bedrooms. My desk/office is once again located next to my bed. Given I did this for three years already in our previous apartment, I expect this to work out well enough.
  • I started using the Retro app to share photos with a small group of friends. It’s brilliant! It has me sharing photos much more regularly, and helped me end the year with a nice collection of memories.
  • Thanks to several long plane rides, I did a decent amount of movie watching. Notables: No Time to Die (sob), Shotgun Wedding (fun), Jules (surprising), and Indy 5 (a triumph). Best soundtracks? Tetris, The Thomas Crown Affair.
  • I continue to pay attention to the little computer on my wrist telling me to exercise, though this unfortunately wavered towards the end of the year during my crunch on Hanami and conference talks.

,

Linux AustraliaCouncil Meeting January 03, 2024 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (President)
  • Wil Brown (Vice-President)
  • Neill Cox (Secretary)
  • Russell Stuart (Treasurer)
  • Sae Ra Germaine  (Council)
  • Jonathan Woithe (Council)

 

Apologies 

 

Not Present

  • Marcus Herstik (Council)

 

Meeting opened at 20:02 AEDT by Joel  and quorum was achieved.

Minutes taken by Neill

 

2. Log of correspondence

  • RISC-V hardware for software development – grant application decision
  • Payment Approval Request – DrupalSouth Community Day
  • A couple of website and social updates in preparation for AGM / elections
  • Open Source AI Definition: Version 0.0.3 released, planning a global outreach
  • Hack the Triangle Workshops – grant application decision
  • Application to form a Kiwi PyCon 2024 subcommittee
  • BAS Time
  • [Linux-aus] contest proposal

 

3. Items for discussion

  • MOTION: Linux Australia accepts the provisional budget for 2024.
    • MOVED BY: Russell Stuart
    • SECONDED: Neill Cox
    • OUTCOME: Motion passed unanimously
  • Kiwi PyCon subcommittee for 2024
    • MOTION: Linux Australia accepts Kiwi PyCon XIII (2024) as an event subcommittee, per the proposal provided.
    • MOVED BY: Joel
    • SECONDED: Jonathan
    • OUTCOME: Motion passed unanimously 
  • One motion passed in camera, to be released later.
  • Programming Contest Proposal – Linux Aus mailing list
    • The council is generally supportive of this, but it is something for the next council to decide the details of, given a proposal is still forthcoming. It has been good to see a discussion of this sort on the mailing list.

4. Items for noting

  • Auditors report has been signed by the auditor
  • P&L by conference for 2022/2023
  • Annual Report

5. Other business

None.

The post Council Meeting January 03, 2024 – Minutes appeared first on Linux Australia.

Linux AustraliaMinutes of Linux Australia – Annual General Meeting 2023

Saturday 21 January 2023, 2:35pm AEDT (UTC+11)

Video conference via Zoom.

Attendance record is available upon request.

0. Apologies

Josh Bartlett

Craige McWhirter

Hugh Blemmings

Russell Stuart – arrived 2:50pm, just in time to give his report.

 

Proxies:

Miles Goodhew for Hugh Blemmings

 

1. President’s welcome

MR JOEL ADDISON, President

2. Approval of the minutes from the previous Annual General Meeting – 2022

MOTION by MR JOEL ADDISON that the minutes of the Annual General Meeting 2022 of Linux Australia be accepted as complete and accurate.

The minutes are available at:

https://docs.google.com/document/d/1aK91S5O_G4eucM-3RRUThED5CYH1kx2PhnyNY8uT01A/edit?usp=sharing

Clinton Roy seconds the motion.

PROVISIONAL, NO PROXY:

24/24 voted

Yea 20 83%

Nay 0 0%

Abstain 4 17%

 

Motion has passed.

3. To receive the REPORTS of activities of the preceding year from OFFICE BEARERS

MR JOEL ADDISON – President

MR CLINTON ROY – Secretary

MR RUSSELL STUART – Treasurer

Includes presentation of the Auditor’s Report

Question from Miles to Clinton about notifications/emails. Clinton responds that it’s only one or two a day generally, but during times of great upset, it’s a lot of time to deal with complicated questions.

Question from Sae Ra to Russell, is LA going to be able to stick with the 6% LA tax, or are we going to have to raise it. Russell says it’s very hard, this year it was near 7%, and that is without the fairly expensive face-to-face; which is an important event for new council members. Russell would be in favour of raising it to keep such important activities happening.

Question from Katie about the election, the window was only nine days, was quorum achieved. Joel responds that the scrutineers report is at the end of the agenda. We don’t have a high quorum required, so it was easily achieved. End of voting date was earlier this year than when we ran AGM in person, but similar to last year. This was done to give Council time to prepare everything, especially for the returning officer. Joel apologises for not opening and advertising it earlier.

Question from Hamilton: Does Linux Australia still support OpenInfra Foundation? No one on council can remember any interactions this year, Joel will look into it.

Question from Hamilton: Will Linux.org.au Elections extension have [an] option to allow nominees to be emailed to accept nomination? That will be added to a list of things the council wants to update, including mail outs, and making the interface easier.

MOTION by RUSSELL STUART that the Auditor’s Report is a true statement of financial accounts. Seconded by Joel Addison.

26/26 voted

Yea 26 92%

Nay 0 0%

Abstain 2 8%

Motion is passed.

 

MOTION by JOEL ADDISON that the President’s report is correct. Seconded by Russell Stuart.

26/26 voted

Yea 23 88%

Nay 0 0%

Abstain 3 12%

Motion is passed.

 

MOTION by CLINTON ROY that the Secretary’s report is correct. Seconded by Jonathan Woithe.

26/26 voted

Yea 23 88%

Nay 0 0%

Abstain 3 12%

Motion is passed.

 

MOTION by RUSSELL STUART that the Treasurer’s report is correct. Seconded by Wil Brown.

26/26 voted

Yea 23 88%

Nay 0 0%

Abstain 3 12%

Motion is passed.

 

MOTION by JOEL ADDISON that the actions of Council during 2022 are endorsed by the membership. Seconded by Neill Cox.

24/26 voted

Yea 24 92%

Nay 0 0%

Abstain 2 8%

Motion is passed.

 

4. To CONSIDER items tabled in the call for agenda items

MOTION by Kathy Reid that the Linux Australia community endorse the actions of Council 2022 and thank them for their valued volunteer service. Sae Ra Germaine seconds the motion.

26/26 voted

Yea 22 85%

Nay 0 0%

Abstain 4 15%

Motion is passed.

 

MOTION by Craige McWhirter that after discussion of the merits of changing

titles from President / Vice President to Convenor / Co-Convenor that the

meeting chair tests the meeting for consensus that the suggested change is

worth including in the constitutional review, falling back to a vote if

consensus is blocked. Paul Wayper seconds the motion.

 

As mentioned in the motion, discussion on the merits of changing the name was opened.

Sae Ra comments that having certain role names in the organisation makes us look more professional. Wil comments that these role names are in the constitution, and would require a referendum to change. Jonathan comments that everyone knows what a president of an organisation is, but not a lot of people know what a convenor is, we should stick with the standard terms used elsewhere. Hamilton, used to be president of a musical society, easy to communicate with other entities with a well known role name. Kathy advocates against the motion, LA gives career experience to those that sit on council, and it looks better on the CV as the current role names that people understand, not as useful with the proposed role names. Steve thinks there are defined titles required by the corporations act, such as president, chair etc. and that even if this motion gets through, we might not be able to make such a change to the constitution. The original email with the motion and some motivations gets mentioned, wanting a flatter organisation, which is more in line with FLOSS organisations. Consensus was that a change in the titles is not desired.

A vote on the motion was then held, with people voting on whether they agree with changing the titles.

26/26 voted

Yea 1 4%

Nay 22 85%

Abstain 3 12%

Motion has failed.

 

5. DECLARATION of Election and WELCOME of incoming Council by the Returning Officer

Returning officer is Julien Goodwin.

See Appendix 1 for the Scrutineers Report. Julien delivers the report verbally to those in attendance.

Hamilton notes that the system is rather complicated to attempt to install, compared to standard wordpress. Why was this system picked? Kathy responds, part one, a number of factors went into the decision, what would the member system sit on top of, at the time there were more skills with wordpress. The second part, the elections module was a commissioned piece of work, a plugin to the system that we had. We decided to use open source solutions to align with LA’s values. Steve responds, this is a module within wordpress, it’s tightly entwined with a particular version of php, and Steve spent a chunk of time porting the module from Agileware hosted website to our own hosting. Steve has a bunch of notes on how this would be done, and is happy to share those with those willing to update the plugin.

Joshua asks what data was left behind, Julien responds it’s metadata of previous votes, with enough data linking, it could be used to determine voters from previous elections. The admin team have been tasked with cleaning it up.

6. NOMINATION and ELECTION of Members to Vacant Council Positions

All positions were filled during the original election, so no nomination or election was held during the meeting.

7. To HEAR and RESPOND to questions from the floor

Jonathan noted that the role name motion was about allowing a discussion on the issue at the meeting. However, when the vote was taken it was treated as a vote on including the role name issue in the constitutional review (the consensus test).  A further discussion makes it clear that the vote was a good enough consensus guage: that the consensus of the meeting was to not include the role names in the constitutional review.

Appendix 1 – Scrutineers report

Julien Goodwin, January 20, 2023

Dear Council,

As Returning Officer for the 2023 Council Election I am taking on a duty to ensure that;

  • The Election has been run in line with the Linux Australia Constitution
  • That the results of the Election have been scrutinised, and appropriate checks made to satisfy a reasonable person that the integrity of the Election has been upheld
  • And to report my findings, adverse or otherwise, to Council, who are responsible to act on those findings

This email serves as a record of the actions taken to ensure the above, and serves as a formal process for Returning Officers to follow in years to come.

In summary my findings are that:

  • The election has been run in line with the Linux Australia Inc., Constitution as per the Linux Australia website
  • The results of the Election have been scrutinised, and I am satisfied that the integrity of the Election has been upheld
    The results are summarised below
    The recommendation from previous years that the CiviCRM election module eliminate candidates who are elected to lower-ranked positions if they are already elected to a higher ranked position is still required.

Detailed notes follow. These notes include PII references that will need to be redacted if this report is presented to members.

Julien Goodwin

January 20th 2023

Compliance with the Constitution

S(15) of the Constitution stipulates the following requirements, which I have validated:

  • Nominations of candidates and their consent to be nominated is handled via the CiviCRM election module; no manual check done
  • The number of nominations exceeds the number of positions, and a ballot is being held, via the CiviCRM election module
  • The ballot cannot be conducted more than 60 days before an AGM; the AGM is 21 January 2023, the ballot occurred after nominations closed on 8 January 2023. This is in line with the Constitution.
  • A person nominated to Council must be a current member of Linux Australia, Inc. This is enforced by the CiviCRM election module; no manual check done

I am satisfied that the Constitution has been complied with.

 

The post Minutes of Linux Australia – Annual General Meeting 2023 appeared first on Linux Australia.

,

Colin CharlesHello 2024

At this rate, there is no real blogging here, regardless of the lofty plans to starting writing more. Stats update from Hello 2023:

219 days on the road (less than 2022! -37, over a month, shocking), 376,961km travelled, 44 cities, 17 countries.

Can’t say why it was less, because it felt like I spent a long time away…

In Kuala Lumpur, I purchased a flat (just in time to see Malaysia go down), and I swapped cars (had a good 15 year run). I co-founded a company, and I think there is a lot more to come.

2024 is shaping up to be exciting, busy, and a year, where one must just do.

good read: 27 Years Ago, Steve Jobs Said the Best Employees Focus on Content, Not Process. Research Shows He Was Right. in simple terms, just do.

,

Lev LafayetteAnother Year in Supercomputing

Since late in 2007 I have been involved in the field of high performance computing. Initially, this was at the Victorian Partnership for Advanced Computing, but just before that organisation closed its doors in December 2015 I accepted a similar role at the University of Melbourne. The end of the year provides a reason for reflection, an annual report if one likes, and whilst activities not related to my vocation and profession will be dealt with in a subsequent entry, the opportunity is taken here to review workplace activities and in particular, changes in the environment for the University's general HPC system, Spartan. Spartan now has 6159 accounts across 2109 projects in diverse disciplines in the life sciences, engineering, economics, mathematics, and more and has been cited in 62 papers in the past year.

Some of those papers led to presentations to the Research Computing Services (RCS) team through the Cultural Working Group (CWG), which I have chaired for the past two years and held responsibility for organising these talks. In total six presentations were held this year, with a personal favourite on the use of AI algorithms, a supercomputer (Spartan), and robotics to sort plastic waste from two researchers at the Department of Infrastructure Engineering. The CWG was formed in 2020 following recognition from a staff survey that not all was well in RCS in terms of staff awareness of the group's objective, work between the different groups within the RCS, transparency in decision-making, involvement, and influence in decisions, career-progression opportunities, and job security. The staff-led CWG (with one management representative) made a concerted effort across those targetted areas and, following a survey in the middle of this year, substantial improvements were found in every criterion. At the end of this year, just after the last tech/researcher presentation, it brought great pleasure to say that whilst operations would continue, the group had succeeded in achieving its objectives and could close down as a formal body and as a successful project.

A very large part of my role at the University consists of training various postgraduate and postdoctoral researchers on how to use the system. This year included some 24 days of workshops involving close to 500 participants, roughly on par with other years and deliberately pulling back a bit from the first year of COVID, when over 40 of such workshops were conducted. Of particular note was early in the year a review was conducted of usage from those who had received training the previous year, resulting in the very surprising metric that at least 54.14% of cluster utilisation in 2022 was conducted by users after they had received training. I have always emphasized how important HPC training is but it was astounding to see such a metric as proof. As another form of training this year I continued with my regular activity as a guest lecturer and tutor for the master's level course Cluster and Cloud Computing. My role in this, previously just a single lecture, has now been extended to six lectures and workshops and is likely to expand in 2024. I must also mention here a presentation on RCS services to the Quantitative and Applied Ecology Research Group, with a future paper in development from that body on software citations.

Another major part of my role is scientific software optimisation and installation. Apart from the usual work in this field this year had the bonus of Spartan receiving its first major operating system upgrade since it was first turned on in 2015. Changing the underlying major release of the operating system (and indeed, jumping from RHEL v7 to v9) required existing software to be recompiled. In a one-month period, working with demonic fury, I was primarily responsible for around 500 software builds and an expansion in job submission examples. At the same time, Spartan also finally had the opportunity to run the LINPACK tests to be recognised as one of the world's supercomputers. It was an award that was long overdue (we've had sufficient performance to be on that list for years) and even then the certificate was for only part of the entire system.

Other activities included establishing the Spartan HPC Champions group among power-users of the system who can provide training advice to other members of their research teams, and continued involvement as a Board member of the international HPC Certification Forum and as an irregular contributor to the EasyBuild code repository. I have no doubt that these and other activities will all continue in 2024, however, there will be an additional role as well, following a necessary and considered restructure of RCS, I have found myself as the recipient of a small promotion in role and responsibility. It will be a position I will take with the appropriate seriousness; after all, supercomputing is one of those activities that has made a massive change to improving the world and will continue to do so. For the technical staff, it can be challenging and rewarding as they provide the researchers the tools to make great discoveries and inventions. But those staff also need to be in an environment where they feel secure and can flourish - and that means listening to their technical advice, as they actually do know best for such matters. This will be certainly the most significant challenge in the coming year.

,

Linux AustraliaCouncil Meeting December 20, 2023 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (President)
  • Wil Brown (Vice-President)
  • Neill Cox (Secretary)
  • Russell Stuart (Treasurer)
  • Sae Ra Germaine  (Council)
  • Jonathan Woithe (Council)

 

Apologies 

 

Not Present

  • Marcus Herstik (Council)

 

Meeting opened at 20:05 AEDT by Joel  and quorum was achieved.

Minutes taken by Neill

 

2. Log of correspondence

  • Steve Walsh: admin team reimbursements
  • Russell Stuart: Draft Proposals (Kiwi PyCon XIII)
  • Russell Stuart: Supporting documents for Kiwi Pycon 2024
  • Meetup: Meetup Invoice AU2023-16828
  • Michael Richardson: DrupalSouth Community Day Budget
  • Russell Stuart: Linux Australia’s 2022/2023 audit
  • Russell Stuart: Tax invoice attached (for audit)

3. Items for discussion

  • Motion: The treasurer and president sign the following statement from the auditor:
    • Principal activities

      Linux Australia is the peak body for Linux User Groups (LUGs) around Australia, and as such represents approximately 5000 Australian linux users and developers.

      There have been no significant changes in the state of affairs of the association during the year.


Operating Result

The profit of the association for the financial year after providing for income tax amounted to $123,426.

Significant changes in state of affairs

There have been no significant changes in the state of affairs of the association during the year.

Events after the reporting date

No matters or circumstances have arisen since the end of the financial year which significantly affected or may significantly affect the operations of the association, the results of those operations or the state of affairs of the association in future

financial years.

Signed in accordance with a resolution of the members of the committee: (president & treasurer).

  • MOVED BY: Russell Stuart
  • SECONDED: Sae Ra
  • Outcome: Passed unanimously
  • MOTION: The treasurer signs the engagement letter from our auditor
    • Their cover letter said: It is a best practice in our industry that all work is undertaken is covered by an up-to-date and relevant engagement letter – we always endeavour to operate at the highest standard of doing business. This letter ensures we both have a common understanding of what is required to be delivered, for what charge and also lists our general terms of doing business and also anything specific to the services you have requested.   But perhaps the main reason is they want us to acknowledge the price rise from $5500 to $6050.  The previous price rise was in 2016.
    • MOTION: The treasurer signs the auditors engagement letter.
    • MOVED BY: Russell Stuart
    • SECONDED: Joel
    • Outcome: Passed unanimously

 

  • Grant applications to be discussed
    • RISC-V, smart-watch and tablet computing development, from Russell Coker and Yifei Zhan. Funding applied for: $1600

 

MOTION: Council approves the grant application from Russell and Yifei to purchase hardware as described in the grant application.

MOVED: Joel

SECONDED: Wil

OUTCOME: Motion failed.

 

Russell and Yifei are encouraged to provide a more detailed proposal for the consideration of next year’s council.

 

  • RISC-V hardware for FOSS database development, from David Williams. Funding applied for: $2117.40

MOTION: Council approves the grant application from David Williams to purchase hardware as described in the grant application.

MOVED:  Joel

SECONDED: Neill

OUTCOME: Failed with one abstention.

 

David has provided a detailed proposal. Only one response from the membership has been received and it advocated for the $2,117.40 option. 

 

Mitigating against this proposal is that there already seems to be work in this area – see https://riscv.org/blog/2023/08/risc-v-public-beta-platform-release-%C2%B7-database-adaptation-evaluation-on-risc-v-server/

 

David is encouraged to provide a clearer explanation of which databases and what target distribution is intended.

 

  • Hack the Triangle workshops, from Lyndsey Jackson. Funding applied for: $2000

MOTION: Council approves the grant application from Lyndsey Jackson to run the workshops as per the grant application, subject to there being clear application and promotion of open source values and resources.

MOVED:  Joel

SECONDED: Wil

OUTCOME: Motion passed

 

4. Items for noting

5. Other business

  • Call for Nominations

Opening the election will happen on 26 December. Slightly delayed due to technical issues. The announcement email can be sent now to give the members as much notice as possible. 

  • AGM

Will be announced in the election email

  • EO2024
    • Code of Conduct – Sae Ra will convene the code of conduct team to review and respond to any items raised.
    • The organising team is now operating better, and items are progressing positively.

The post Council Meeting December 20, 2023 – Minutes appeared first on Linux Australia.

,

Linux AustraliaCouncil Meeting December 6, 2023 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (President)
  • Wil Brown (Vice-President)
  • Sae Ra Germaine  (Council)
  • Russell Stuart (Treasurer)
  • Jonathan Woithe (Council)
  • Neill Cox (Secretary)

 

Apologies

  • None received

 

Not present

  • Marcus Herstik (Council)

 

Meeting opened at 20:07 AEST by Joel and quorum was achieved.

Minutes taken by Neill.

 

2. Log of correspondence

  • 2023-11-23 Xero reconciliations
  • 2023-11-27 Action needed for your Wise Business account – Russell has responded
    Note: This also required supplying birth dates for various council members.
  • 2023-11-24 Linux Australia’s 2022/2023 audit
  • 2023-11-24 Activity Statement from ATO Portal for the auditor – From Russell. The required activity statement has now been provided.
  • 2023-11-29 Grant Application from Russell Coker for RISC-V, smart-watch, and tablet computing development
  • 2023-11-29 Report for Linux Australia grant: Pinephone development
  • 2023-11-30 Report for Linux Australia grant: Girls Canberra #1  
  • 2023-11-30 Grant Application from Lyndsey Jackson for Hack The Triangle Digital Skills Holiday Workshops
  • 2023-11-30 Audit documents
  • 2023-12-01 Grant Application from David M Williams for RISC-V hardware
  • 2023-12-01  Grant program update

 

3. Items for discussion

  • Kiwi PyCon 2024
    • Application will be forthcoming for Kiwi PyCon 2024 to run under the auspices of Linux Australia

 

4. Items for noting

  • Jonathan now has all the grant reports and will forward them to the appropriate person for inclusion in the LA Annual Report.

5. Other business

 

  • Drupal subcommittee update
    • Canberra Community day
      • good feedback, went well. Had a waiting list. 112 registrations when the target was 80. 86 people actually attended on the day. Post event survey has had limited responses but they were positive.
      • Budget not finalised yet, but looks like break even or perhaps a small profit.
      • Helped that it was held the day after the GovCMS event. Approx 60% government employees and 40% Drupal agencies. Those government attendees indicated that they would attend if it was held in Canberra again, but less likely to travel.
    • Sydney: planning going well about half of the sponsorships are sold. Seems people may be struggling
      • 85/88 talk  submissions which is typical for the larger Australian conferences. Track chairs are happy with quality.
      • Will be unable to bring someone over from the Drupal Association for a keynote. But may be able to do a panel. Still looking for a few more keynote speakers.
    • The Drupal subcommittee is also looking for other conferences to attend to promote Drupal South. 
    • The Drupal South Subcommittee will try and speak at Everything Open. Possibly as a cross promotional effort.
    • Budgets for past events (Wellington and Canberra) will probably not be finalised until January.

 

  • Admin team update
    • Mainly working with Sae Ra on the test upgrade of the website. Have hit a few issues. Will need to reach out to AgileWare. Will probably require building a new VM once the actual software stack can be determined.
  • The admin team are also talking to FastMail to iron out the final details. In particular what LA offers to FastMail for their support. Possibly a pair of tickets to LA events per year. Subject to final approval by LA council.
  • Slack billing needs to be looked into, they appear to have a slightly odd billing cycle. Will probably be looking for a replacement for Slack or applying for a not-for-profit licence.

 

  • Joomla subcommittee update

Successfully running hybrid meetings

About time to have a face to face so the Joomla subcommittee is planning a mini Joomla day for 24 February. It will be a single day event. 

  • Welcome on Friday night. 
  • Program on Saturday. 
  • Pizza/Code sprint on Sunday

It’s been three years since the last face to face event!

The goal is to keep the ticket price low. 

  • Self fund drinks. 
  • Not going to look for sponsors.
  • Bring your own T-shirt from previous conferences
  • Networking focussed event.
  • Rough budgets on 20/40/60 attendees. 
  • Will also aim to hold Joomla committee elections.
  • Venue will be OPN 365’s office space.

 

  • PyCon AU subcommittee update

The PyCon AU team are unable to attend in person. Richard Jones has supplied this update:

 

Little has happened since the last update. Our focus has been on trying to secure a chair for the next conference, which is ongoing (but far from hopeless, we’re currently talking to a number of people).

 

  • Flounder subcommittee update

Recent meeting about SEL4, expect to have some more meetings on the topic in the future

Also have recently had meetings about Linux phones 

Meetings tend to have about four attendees. Attendees vary with the topic.

Promoted via the Flounder website, matrix and email

Perhaps LA can help promote on other social media platforms (but it would be help if the council was reminded)

Ideal meeting size would be around 10 people. They tend to be focussed on technical work. If more attendees were present the focus could be shifted to more of a lecture style.

 

  • LUV  subcommittee update

 

  • WordPress subcommittee update
    • Organising team formed: 12 members
    • WordCamp Central orientation meeting
    • WordCamp Sydney 2024 approved – in planning stage
    • Website holding page https://sydney.wordcamp.org/2024/ 
    • Next steps: Organising team onboarding, Venue lockin and dates, Apply to LA to form subcommittee

 

  • Everything Open 2024

Call for sessions has closed 101 quality submissions

No free keynotes, some paid keynotes but they are a bit pricey

Several sponsors have signed up . Happy at this point, but need a few more by January.

Not ready to open ticket sales yet, but will hopefully be ready to in January.

The post Council Meeting December 6, 2023 – Minutes appeared first on Linux Australia.

Lev LafayetteThe2023 International Conference on Green and Innovation-Driven Urban Development

Following the successful first conference last year, this year’s 2023 International Conference on Green and Innovation-driven Development in Cities and Towns was hosted in Suzhou with proceedings held at the recently opened Suzhou International Conference Hotel. The lead host of the conference was the Foreign Affairs Office of the Jiangsu Provincial People’s Government, and the main organiser was the Xiangcheng District People’s Government of Suzhou City. Suzhou is a major city in east China, founded in 514 BCE and is now the most populous city in Jiangsu Province, with a population of roughly 7.5 million in the city and 14 million in the administrative area.

International guests at Higer Bus Company

The conference theme was “Jointly creating a new platform for a new ecology for shared new benefits” which develops on President Xi Jinping’s statements on eco-civilization as embodied in the Chinese Constitution: “The construction of ecological civilization is a millennial plan for the sustainable development of the Chinese nation” of which the “two mountains theory”, i.e., “Green waters and green mountains are [as valuable as] mountains of gold and silver” (Lǜ shuǐ qīngshān
jiùshì jīnshān yín shān) also applies. With the world facing ever-increasing human-caused global warming that causes more extreme weather such an approach is necessary for both raising the income and wealth of people, as well as protecting and enhancing the welfare of all life on our shared planet.

Over 350 attendees were at the conference including Local and international dignitaries, academics, engineers, and scientists. International guests were from Japan, Malaysia, Vietnam, Portugal, Italy, New Zealand, and Australia. Following a welcoming banquet by Xiangcheng’s District People’s Government,
the formal proceedings included the opening ceremony followed by parallel sessions on towns with distinctive features, integrated development on urban and rural areas, and heritage protection with sustainable development. My own attendance was to the latter, which included a number of international and local speakers and case studies with examples from Italy, Portugal, the Netherlands, and Suzhou itself. There was, of course, an extensive comparison between Suzhou and Venice as two cities with extensive historical canals and other waterways.

The majority of the conference proceeding however was spent on visits to Suzhou, combining the two vectors of heritage conservation and environmental technologies. From the former, this included visits to the Imperial Kiln Bricks Museum, the Silk Museum, Embroidery Art Museum, the especially beautiful
Taihu Lake Wetland Park, and the UNESCO World Heritage site, the Humble Administrator’s Garden. For the latter, this included the Suzhou City Industrial Park Exhibition Centre, the Higer Bus Company, the High-Tech Zone Exhibition Hall, the Urban Planning Exhibition Hall (with its strong connections to Singapore), the High-Tech Rail Tram Limited, and the new local campus of Nanjing University, China’s most famous university.

The two vectors give credence to both the “two mountains theory” but also illustrate the genuine and successful attempt of Suzhou city and the region to integrate both heritage conservation, environmental protection, and technological development. It is difficult enough to achieve the former two, but when coupled with the latter great care and intelligence is required. The
development shown by Suzhou city illustrates the superficial contradiction between the “two mountains” can be resolved. One cannot help but be impressed by the dedication shown in the development of ICVs (Intelligent Connected Vehicles) and electric vehicles. The sheer quantity of electric cars
on the road is already apparent, and the numbers indicate that soon electric vehicles will make up the majority of new car sales in China, and the rest of the world will follow.

For international guests it must be mentioned with great appreciation that the hospitality shown by Foreign Affairs Office of the Jiangsu Provincial People’s Government was absolutely second to none. We were given a vision of a city that is prosperous, populous, harmonious, green, high-tech, planned, and showed a great sensitivity and appreciation of its history and culture. There is sensitivity to the fact that this was a relatively well-off city within China, and the city was very keen to promote itself on a local level in preference to discussing more regional, national, or even global environmental challenges. A desire was expressed by many to have greater detail and even an extra day in the proceedings that would discuss at a lower level various environmental and technological challenges. This is, of course, less of a criticism and more of a desire that attendees wanted the conference to be even longer and deeper.

If I may be so bold to finish on a personal note, often in the afternoons and evenings after the day’s proceedings I would take a walk around the lake and its environs that the hotel faced. Whilst there was the vista of the city with its gleaming towers of steel and glass, I would find myself among the foliage and fauna of the lakeside in a peaceful and contemplative setting. It was in this place that I could not help but be absolutely charmed by a solar-powered intelligent robot boat that would make its way over the lake, cleaning whatever rubbish it could find of which, of course, there was very little. What a different and beautiful world it would be, I thought to myself, it would be if such robots were everywhere. This is the vision of the future that is offered to us: “green and innovation-driven development in cities and towns”. It would do us all well, to look at this example.

Robot lake cleaner

Published in the Australia-China Friendship Society Victorian Branch Newsletter, November 2023

AttachmentSize
Image icon IMG20231021090412.jpg124.15 KB
Image icon mmexport1698013829631.jpg127.24 KB

,

Lev LafayetteSpartan Finally Receives Its Laurels

Spartan HPC certificateWay back in 2015 the University of Melbourne had a general-purpose high performance computer system called "Edward", which itself replaced an even smaller system called "Alfred", both named after the Kings of Wessex. Edward was a fairly typical machine for its vintage and, as is normal, when a system is being retired the main researchers were asked what should be different in the new system. What was also normal was their answers; more cores, faster CPUs, etc. Consideration was given to not having an HPC system at all, potentially offloading the demand to a national facility. But cooler heads that possibly understood network throughput and the advantages of fine-tuning a local system to the needs of local researchers prevailed.

One of the interesting things about the review of Edward's utilisation was how it differed from what many researchers thought they needed. Rather than a system with more cores etc, what was really needed was faster throughput. Researchers simply didn't like their jobs sitting in the queue. Coupled with the fact that finances to fund the system weren't great (the naming of Spartan was a laconic reference to its lean cost-efficiency), necessity became the mother of invention. The Nectar research cloud had plenty of cores and, according to the metrics, the overwhelming majority of Edward's jobs were being run for capacity, rather than capability; over 75% were single-core jobs and over 90% were single-node jobs. Rather than spend a lot of money on high-speed interconnect, which is typical in HPC systems, a decision was made to have a smaller traditional HPC partition ("physical") and use a partition virtual machines ("cloud") with a slow interconnect for those singe-node jobs.

It was an innovative design and received a well-deserved initial launch, followed by a world-tour explaining the architecture to various conferences and HPC centres, including Multicore World, Wellington, 2016, and 2017; eResearchAustralasia 2016, Center for Scientific Computing (CSC) Goethe University Frankfurt, 2016, High Performance Computing Center (HLRS) University of Stuttgart, 2016, High Performance Computing Centre Albert-Ludwigs-University Freiburg, 2016; European Organization for Nuclear Research (CERN), 2016, Centre Informatique National de l’Enseignement Supérieur, Montpellier, 2016; Centro Nacional de Supercomputación, Barcelona, 2016, and the OpenStack Summit, at Barcelona 2016, and featured in OpenStack and HPC Workload Management in Stig Telfer (ed), "The Crossroads of Cloud and HPC: OpenStack for Scientific Research" (Open Stack, 2016).

The success of Spartan's architecture soon became apparent. Whilst Edward had completed just over 375,000 jobs in 2015, Spartan completed more than a million in its first year from launch. The system expanded with additional compute nodes from specialist projects, departments, and research agencies that had purchased their own hardware. But the most significant expansion was the addition of a substantial GPGPU partition, of 68 nodes and 272 nVidia P100 GPGPU cards, funded by a Linkage Infrastructure, Equipment and Facilities (LIEF) grant. Later, Spartan also introduced FastX for interactive remote desktops, and interactive sessions through Open OnDemand for Jupyer notebooks, RStudio, and Cryosparc.

The introduction of the GPGPU partition really transformed Spartan. It was what changed Spartan from being a small, experimental, but extremely successful system, to a world-class computing system. At the time we estimated that it would have entered at c200 on the top500.org list. However, running the tests to enter into that celebrated list requires both a lot of fine-tuning and, of course, it means that users, which have priority on our system, won't be able to use the nodes. On Spartan, it is typical that 100% of workers nodes are fully allocated, so for literally years there was little opportunity for the tests to be conducted.

Recently however, Spartan finally took the leap to change from running RedHat 7.x, which we had been doing since 2015, gradually working our way up the point-released, to RedHat 9.x. This provided a well-advertised two-week window of opportunity and whilst many other changes occurred to the operating system, the hardware, and the recompilation of hundreds of applications, a work colleague, Naren Chinnam (with necessary coordination with the rest of the HPC, Network and DC teams in getting the cluster stable enough for the benchmarks to finish), completed the LINPACK test for part of the system. As a result, Spartan now has a nice certificate, rated at 454 in the world (and third in Australia, after NCI/Gadi and Pawsey), with a benchmark score of 2.14 PetaFlops, representing the performance of the GPU partitions alone. It has already been noted that we actually have 88 A100 GPU nodes, not the 72 that were tested, which would have brought us up to 337 in the world, plus another 1/3rd of our performance could have come from the CPU-only partitions.

At the time of writing, Spartan has run 53881908 HPC jobs. There are 6134 users from the University of Melbourne and around the world, across 2097 projects. The original architecture (with our friends at the University of Freiburg with their alternative cluster-cloud combination) was also featured at the IEEE 13th International Conference on e-Science in 2017, and in the Science, Technology and Engineering Systems Journal in 2019, with other presentations on Spartan including use of the GPGPU partition at eResearch 2018, its development path at eResearchAU 2020, interactive HPC at eResearchNZ 2021, and over 250 papers citing Spartan as a contributing factor their research. Spartan continues to grow in users, usage, performance and, most importantly, research outcomes. Spartan may have finally received its laurels, but we are not resting on them.

AttachmentSize
Image icon 2023spartan.png367.83 KB

,

Francois MarierAutomatically rebooting for kernel updates

I use reboot-notifier on most of my servers to let me know when I need to reboot them for kernel updates since I want to decide exactly when those machines go down. On the other hand, my home backup server has very predictable usage patterns and so I decided to go one step further there and automate these necessary reboots.

To do that, I first installed reboot-notifier which puts the following script in /etc/kernel/postinst.d/reboot-notifier to detect when a new kernel was installed:

#!/bin/sh

if [ "$0" = "/etc/kernel/postinst.d/reboot-notifier" ]; then
    DPKG_MAINTSCRIPT_PACKAGE=linux-base
fi

echo "*** System restart required ***" > /var/run/reboot-required
echo "$DPKG_MAINTSCRIPT_PACKAGE" >> /var/run/reboot-required.pkgs

Note that unattended-upgrades puts a similar script in /etc/kernel/postinst.d/unattended-upgrades:

#!/bin/sh

case "$DPKG_MAINTSCRIPT_PACKAGE::$DPKG_MAINTSCRIPT_NAME" in
   linux-image-extra*::postrm)
      exit 0;;
esac

if [ -d /var/run ]; then
    touch /var/run/reboot-required
    if ! grep -q "^$DPKG_MAINTSCRIPT_PACKAGE$" /var/run/reboot-required.pkgs 2> /dev/null ; then
        echo "$DPKG_MAINTSCRIPT_PACKAGE" >> /var/run/reboot-required.pkgs
    fi
fi

and so you only need one of them to be installed since they both write to /var/run/reboot-required. It doesn't hurt to have both of them though.

Then I created the following cron job (/etc/cron.daily/reboot-local) to actually reboot the server:

#!/bin/bash

REBOOT_REQUIRED=/var/run/reboot-required

if [ -s $REBOOT_REQUIRED ] ; then
    cat "$REBOOT_REQUIRED" | /usr/bin/mail -s "Rebooting $HOSTNAME" root
    /bin/systemctl reboot
fi

With that in place, my server will send me an email and then automatically reboot itself.

This is a work in progress because I'd like to add some checks later on to make sure that no backup is in progress during that time (maybe by looking for active ssh connections?), but it works well enough for now. Feel free to leave a comment if you've got a smarter script you'd like to share.

,

Tim SerongStill Going With The Flow

It’s time for a review of the second year of operation of our Redflow ZCell battery and Victron Energy inverter/charger system. To understand what follows it will help to read the earlier posts in this series:

In case ~12,000 words of background reading seem daunting, I’ll try to summarise the most important details here:

  • We have a 5.94kW solar array hooked up to a Victron MPPT RS solar charge controller, two Victron 5kW Multi-Plus II inverter/chargers, a Victron Cerbo GX console, and a single 10kWh Redflow ZCell battery. It works really well. We’re using most of our generated power locally, and it’s enabled us to blissfully coast through several grid power outages and various other minor glitches. The Victron gear and the ZCell were installed by Lifestyle Electrical Services.
  • Redflow batteries are excellent because you can 100% cycle them every day, and they aren’t a giant lump of lithium strapped to your house that’s impossible to put out if it bursts into flames. The catch is that they need to undergo periodic maintenance where they are completely discharged for a few hours at least every three days. If you have more than one, that’s fine because the maintenance cycles interleave (it’s all automatic). If you only have one, you can’t survive grid outages if you’re in a maintenance period, and you can’t ordinarily use the Cerbo’s Minimum State of Charge (MinSoC) setting to perpetually keep a small charge in the battery in case of emergencies. As we still only have one battery, I’ve spent a fair bit of time experimenting to mitigate this as much as I can.
  • The system itself requires a certain amount of power to run. Think of the pumps and fans in the battery, and the power used directly by the inverters and the console. On top of that a certain amount of power is simply lost to AC/DC conversion and charge/discharge inefficiencies. That’s power that comes into your house from the grid and from the sun that your loads, i.e. the things you care about running, don’t get to use. This is true of all solar PV and battery storage systems to a greater or lesser degree, but it’s not something that people always think about.

With the background out of the way we can get on to the fun stuff, including a roof replacement, an unexpected fault after a power outage followed by some mains switchboard rewiring, a small electrolyte leak, further hackery to keep a bit of charge in the battery most of the time, and finally some numbers.

The big job we did this year was replacing our concrete tile roof with colorbond steel. When we bought the house – which is in a rural area and thus a bushfire risk – we thought: “concrete brick exterior, concrete tile roof – sweet, that’s not flammable”. Unfortunately it turns out that while a tile roof works just fine to keep water out, it won’t keep embers out. There’s a gadzillion little gaps where the tiles overlap each other, and in an ember attack, embers will get up in there and ignite the fantastic amount of dust and other stuff that’s accumulated inside the ceiling over several decades, and then your house will burn down. This could be avoided by installing roof blanket insulation under the tiles, but in order to do that you have to first remove all the tiles and put them down somewhere without breaking them, then later put them all back on again. It’s a lot of work. Alternately, you can just rip them all off and replace the whole lot with nice new steel, with roof blanket insulation underneath.

The colour is called Bluegum.

Of course, you need good weather to replace a roof, and you need to take your solar panels down while it’s happening. This meant we had twenty-two solar panels stacked on our back porch for three weeks of prime PV time from February 17 – March 9, 2023, which I suspect lost us a good 500kW of power generation. Also, the roof job meant we didn’t have the budget to get a second ZCell this year – for the cost of the roof replacement, we could have had three new ZCells installed – but as my wife rightly pointed out, all the battery storage in the world won’t do you any good if your house burns down.

We had at least five grid power outages during the year. A few were brief, the grid being down for only a couple of minutes, but there were two longer ones in September (one for 30 minutes, one for about an hour and half). We got through the long ones just fine with either the sun high in the sky, or charge in the battery, or both. One of the earlier short outages though uncovered a problem. On the morning of May 30, my wife woke up to discover there was no power, and thus no running water. Not a good thing to wake up to. This happened while I was away, because of course something like this would happen while I was away. It turns out there had been a grid outage at about 02:10, then the grid power had come back, but our system had not. The Multis ended up in some sort of fault state and were refusing to power our loads. On the console was an alarm message: “#8 – Ground relay test failed”.

That doesn’t look good.

Note the times in the console messages are about 08:00. I confirmed via the logs from the VRM portal that the grid really did go out some time between 02:10 and 02:15, but after that there was nothing in the logs until 07:59, which is when my wife used the manual changeover switch to shift all our loads back to direct grid power, bypassing the Victron kit. That brought our internet connection back, along with the running water. I contacted Murray Roberts from Lifestyle Electrical and Simon Hackett for assistance, Murray logged in remotely and reset the Multis, my wife flicked the changeover switch back and everything was fine. But the question remained, what had gone wrong?

The ground relay in the Multis is there to connect neutral to ground when the grid fails. Neutral and ground are already physically connected on the grid (AC input) side of the Multis in the main switchboard, but when the grid power goes out, the Multis disconnect their inputs, which means the loads on the AC output side no longer have that fixed connection from neutral to ground. The ground relay activates in this case to provide that connection, which is necessary for correct operation of the safety switches on the power circuits in the house.

The ground relay is tested automatically by the Multis. Looking up Error 8 – Ground relay test failed on Victron’s web site indicated that either the ground relay really was faulty, or possibly there was a wiring fault or an issue with one of the loads in our house. So I did some testing. First, with the battery at 50% State of Charge (SoC), I did the following:

  1. Disconnected all loads (i.e. flipped the breaker on the output side of the Multis)
  2. Killed the mains (i.e. flipped the breaker on the input side of the Multis)
  3. Verified the system switched to inverting mode (i.e. running off the battery)
  4. Restored mains power
  5. Verified there was no error

This demonstrated that the ground relay and the Multis in general were fine. Had there been a problem at that level we would have seen an error when I restored mains power. I then reconnected the loads and repeated steps 2-5 above. Again, there was no error which indicated the problem wasn’t due to a wiring defect or short in any of the power or lighting circuits. I also re-tested with the heater on and the water pump running just in case there may have been an issue specifically with either of those devices. Again, there was no error.

The only difference between my test above and the power outage in the middle of the night was that in the middle of the night there was no charge in the battery (it was right after a maintenance cycle) and no power from the sun. So in the evening I turned off the DC isolators for the PV and deactivated my overnight scheduled grid charge so there’d be no backup power of any form in the morning. Then I repeated the test:

  1. Disconnected all loads
  2. Killed the mains.
  3. Checked the console which showed the system as “off”, as opposed to “inverting”, as there was no battery power or solar generation
  4. Restored mains power
  5. Shortly thereafter, I got the ground relay test failed error

The underlying detailed error message was “PE2 Closed”, which meant that it was seeing the relay as closed when it’s meant to be open. Our best guess is that we’d somehow hit an edge case in the Multi’s ground relay test, where they maybe tried to switch to inverting mode and activated the ground relay, then just died in that state because there was no backup power, and got confused when mains power returned. I got things running again by simply power cycling the Multis.

So it kinda wasn’t a big deal, except that if the grid went out briefly with no backup power, our loads would remain without power until one of us manually reset the system. This was arguably worse than not having the system at all, especially if it happened in the middle of the night, or when we were away from home. The fact that we didn’t hit this problem in the first year of operation is a testament to how unlikely this event is, but the fact that it could happen at all remained a problem.

One fix would have been to get a second battery, because then we’d be able to keep at least a tiny bit of backup power at all times regardless of maintenance cycles, but we’re not there yet. Happily, Simon found another fix, which was to physically connect the neutral together between the AC input and AC output sides of the Multis, then reconfigure them to use the grid code “AS4777.2:2015 AC Neutral Path externally joined”. That physical link means the load (output) side picks up the ground connection from the grid (input) side in the swichboard, and changing the grid code setting in the Multis disables the ground relay and thus the test which isn’t necessary anymore.

Murray needed to come out anyway to replace the carbon sock in the ZCell (a small item of annual maintenance) and was able to do that little bit of rewriting and configuration at the same time. I repeated my tests both with and without backup power and everything worked perfectly, i.e. the system came back immediately by itself after a grid outage with no backup power, and of course switched over to inverting just fine when there was backup power available.

This leads to the next little bit of fun. The carbon sock is a thing that sits inside the zinc electrolyte tank and helps to keep the electrolyte pH in the correct operating range. Unfortunately I didn’t manage to get a photo of one, but they look a bit like door snakes. Replacing the carbon sock means opening the case, popping one side of the Gas Handling Unit (GHU) off the tank, pulling out the old sock and putting in a new one. Here’s a picture of the ZCell with the back of the case off, indicating where the carbon sock goes:

The tank on the left (with the cooling fan) is for zinc electrolyte. The tank on the right is for bromine electrolyte. The blocky assembly of pipes going into both tanks is the GHU. The rectangular box behind that contains the electrode stacks.

When Murray popped the GHU off, he noticed that one of the larger pipes on one side had perished slightly. Thankfully he happened to have a spare GHU with him so was able to replace the assembly immediately. All was well until later that afternoon, when the battery indicated hardware failure due to “Leak 1 Trip” and shut itself down out of an abundance of caution. Upon further investigation the next day, Murry and I discovered there was a tiny split in one of the little hoses going into the GHU which was letting the electrolyte drip out.

Drip… Drip… Drip…

This small electrolyte leak was caught lower down in the battery, where the leak sensor is. Murray sucked the leaked electrolyte out of there, re-terminated that little hose and we were back in business. I was happy to learn that Redflow had obviously thought about the possibility of this type of failure and handled it. As I said to Murray at the time, we’d rather have a battery that leaks then turns itself off than a battery that catches fire!

Aside from those two interesting events, the rest of the year of operation was largely quite boring, which is exactly what one wants from a power system. As before I kept a small overnight scheduled charge and a larger late afternoon scheduled charge active on weekdays to ensure there was some power in the battery to use at peak (i.e. expensive) grid times. In spring and summer the afternoon charge is largely superfluous because the battery has usually been well filled up from the solar by then anyway, but there’s no harm in leaving it turned on. The one hack I did do during the year was to figure out a way to keep a small (I went with 15%) MinSoC in the battery at all times except for maintenance cycle evenings, and the morning after. This is more than enough to smooth out minor grid outages of a few minutes, and given our general load levels should be enough to run the house for more than an hour overnight if necessary, provided the hot water system and heating don’t decide to come on at the same time.

My earlier experiment along these lines involved a script that ran on the Cerbo twice a day to adjust scheduled charge settings in order to keep the battery at 100% SoC at all times except for peak electricity hours and maintenance cycle evenings. As mentioned in TANSTAAFL I ran that for all of July, August and most of September 2022. It worked fine, but ultimately I decided it was largely a waste of energy and money, especially when run during the winter months when there’s not much sun and you end up doing a lot of grid charging. This is a horribly inefficient way of getting power into the battery (AC to DC) versus charging the battery direct from solar PV. We did still use those scripts in the second year, but rather more judiciously, i.e. we kept an eye on the BOM forecasts as we always do, then occasionally activated the 100% charge when we knew severe weather and/or thunderstorms were on the way, those being the things most likely to cause extended grid outages. I also manually triggered maintenance on the battery earlier than strictly necessary several times when we expected severe weather in the coming days, to avoid having a maintenance cycle (and thus empty battery) coincide with potential outages. On most of those occasions this effort proved to be unnecessary. Bearing all that in mind, my general advice to anyone else with a single ZCell system (aside from maybe adding scheduled charges to time-shift expensive peak electricity) is to just leave it alone and let it do its thing. You’ll use most of your locally generated electricity onsite, you’ll save some money on your power bills, and you’ll avoid some, but not all, grid outages. This is a pretty good position to be in.

That said, I couldn’t resist messing around some more, hence my MinSoC experiment. Simon’s installation guide points out that “for correct system operation, the Settings->ESS menu ‘Min SoC’ value must be set to 0% in single-ZCell systems”. The issue here is that if MinSoC is greater than 0%, the Victron gear will try to charge the battery while the battery is simultaneously trying to empty itself during maintenance, which of course just isn’t going to work. My solution to this is the following script, which I run from a cron job on the Cerbo twice a day, once at midnight UTC and again at 06:00 UTC with the --check-maintenance flag set:

Midnight UTC corresponds to the end of our morning peak electricity time, and 06:00 UTC corresponds to the start of our afternoon peak. What this means is that after the morning peak finishes, the MinSoC setting will cause the system to automatically charge the battery to the value specified if it’s not up there already. Given it’s after the morning peak (10:00 AEST / 11:00 AEDT) this charge will likely come from solar PV, not the grid. When the script runs again just before the afternoon peak (16:00 AEST / 17:00 AEDT), MinSoC is set to either the value specified (effectively a no-op), or zero if it’s a maintenance day. This allows the battery to be discharged correctly in the evening on maintenance days, while keeping some charge every other day in case of emergencies. Unlike the script that tries for 100% SoC, this arrangement results in far less grid charging, while still giving protection from minor outages most of the time.

In case Simon is reading this now and is thinking “FFS, I wrote ‘MinSoC must be set to 0% in single-ZCell systems’ for a reason!” I should also add a note of caution. The script above detects ZCell maintenance cycles based solely on the configured maintenance time limit and the duration since last maintenance. It does not – and cannot – take into account occasions when the user manually forces maintenance, or situations in which a ZCell for whatever reason hypothetically decides to go into maintenance of its own accord. The latter shouldn’t generally happen, but it can. The point is, if you’re running this MinSoC script from a cron job, you really do still want to keep an eye on what the battery is doing each day, in case you need to turn that setting off and disable the cron job. If you’re not up for that I will reiterate my general advice from earlier: just leave the system alone – let it do its thing and you’ll (almost always) be perfectly fine. Or, get a second ZCell and you can ignore the last several paragraphs entirely.

Now, finally, let’s look at some numbers. The year periods here are a little sloppy for irritating historical reasons. 2018-2019, 2019-2020 and 2020-2021 are all August-based due to Aurora Energy’s previous quarterly billing cycle. The 2021-2022 year starts in late September partly because I had to wait until our new electricity meter was installed in September 2021, and partly because it let me include some nice screenshots when I started writing TANSTAAFL on September 25, 2022. I’ve chosen to make this year (2022-2023) mostly sane, in that it runs from October 1, 2022 through September 30, 2023 inclusive. This is only six days offset from the previous year, but notably makes it much easier to accurately correlate data from the VRM portal with our bills from Aurora. Overall we have five consecutive non-overlapping 12 month periods that are pretty close together. It’s not perfect, but I think it’s good enough to work with for our purposes here.

YeaRGrid InSolar InTotal InLoadsExport
2018-20199,0316,68215,71311,8273,886
2019-20209,3246,46815,79212,2553,537
2020-20217,5826,34713,92910,3583,571
2021-20228,5315,64014,17110,849754
2022-20238,9365,74414,68011,534799

Overall, 2022-2023 had a similar shape to 2021-2022, including the fact that in both these years we missed three weeks of solar generation in late summer. In 2022 this was due to replacing the MPPT, and in 2023 it was because we replaced the roof. In both cases our PV generation was lower than it should have been by an estimated 500-600kW. Hopefully nothing like this happens again in future years.

All of our numbers in 2022-2023 were a bit higher than in 2021-2022. We pulled 4.75% more power from the grid, generated 1.84% more solar, the total power going into the system (grid + solar) was 3.59% higher, our loads used 6.31% more power, and we exported 5.97% more power than the previous year.

I honestly don’t know why our loads used more power this year. Here’s a table showing our consumption for both years, and the differences each month (note that September 2022 is only approximate because of how the years don’t quite line up):

Month20222023Diff
October988873-115
November866805-61
December767965198
January822775-47
February63872183
March81391198
April7751,115340
May9531,098145
June1,0731,14976
July1,1181,103-15
August9661,06599
September1,070964-116

Here’s a graph:

WTF happened in December and April?!?

Did we use more cooling this December? Did we use more heating this April and May? I dug the nearest weather station’s monthly mean minimum and maximum temperatures out of the BOM Climate Data Online tool and found that there’s maybe a degree or so variance one way or the other each month year to year, so I don’t know what I can infer from that. All I can say is that something happened in December and April, but I don’t know what.

Another interesting thing is that what I referred to as “the energy cost of the system” in TANSTAAFL has gone down. That’s the kW figure below in the “what?” column, which is the difference between grid in + solar in – loads – export, i.e. the power consumed by the system itself. In 2021-2022, that was 2,568 kW, or about 18% of the total power than went into the system. In 2022-2023 it was only 1,838kWh, or 12.5%:

YearGrid InSolar InTotal InLoadsExportTotal Outwhat?
2021-20228,5315,64014,17110,84975411,6032,568
2022-20238,9635,74414,68011,53479912,3331,838

The cause of this reduction is almost certainly that we didn’t spend two and a half months doing lots of grid charging of the battery in 2022-2023. This again points to the advisability of just letting the system do its thing and not messing with it too much unless you really know you need to.

The last set of numbers I have involve actual money. Here’s what our electricity bills looked like over the past five years:

YearFrom GridTotal BillCost/kWh
2018-20199,031$2,278.33$0.25
2019-20209,324$2,384.79$0.26
2020-20217,582$1,921.77$0.25
2021-20228,531$1,731.40$0.20
2022-20238,936$1,989.12$0.22

Note that cost/kWh as I have it here is simply the total dollar amount of our bills divided by the total power drawn from the grid (I’m deliberately ignoring the additional power we use that comes from the sun in this calculation). The bills themselves say “peak power costs $X, off-peak costs $Y, you get $Z back for power exported and there’s a daily supply charge of $SUCKS_TO_BE_YOU”, but that’s all noise. What ultimately matters in my opinion is what I call the effective cost per kilowatt hour, which is why those things are all smooshed together here. The important point is that with our existing solar array we were previously effectively paying about $0.25 per kWh for grid power. After getting the battery and switching to Peak & Off-Peak billing, that went down to $0.20/kWh – a reduction of 20%. Now we’ve inched back up to $0.22/kWh, but it turns out that’s just because power prices have increased. As far as I can tell Aurora Energy don’t publish historical pricing data, so as a public service, I’ll include what I’ve been able to glean from our prior bills here:

  • July 2023 onwards:
    • Daily supply charge: $1.26389
    • Peak: $0.36198/kWh
    • Off-Peak: $0.16855/kWh
    • Feed-In Tariff: $0.10869/kWh
  • July 2022 – July 2023
    • Daily supply charge: $1.09903
    • Peak: $0.33399/kWh
    • Off-Peak: $0.15551/kWh
    • Feed-In Tariff: $0.08883/kWh
  • Before July 2022:
    • Daily supply charge: $0.98
    • Peak: $0.29852
    • Off-Peak: $0.139
    • Feed-In Tariff: $0.06501

It’s nice that the feed-in tariff (i.e. what you get credited when you export power) has gone up quite a bit, but unless you’re somehow able to export 2-3x more power than you import, you’ll never get ahead of the ~20% increase in power prices over the last two years.

Having calculated the effective cost/kWh for grid power, I’m now going to do one more thing which I didn’t think to do during last year’s analysis, and that’s calculate the effective cost/kWh of running our loads, bearing in mind that they’re partially powered from the grid, and partially from the sun. I’ve managed to dig up some old Aurora bills from 2016-2017, back before we put the solar panels on. This should make for an interesting comparison.

YearFrom GridTotal BillGrid $/kWhLoadsLoads $/kWh
2016-201717,026$4,485.45$0.2617,026$0.26
2018-20199,031$2,278.33$0.2511,827$0.19
2019-20209,324$2,384.79$0.2612,255$0.19
2020-20217,582$1,921.77$0.2510,358$0.19
2021-20228,531$1,731.40$0.2010,849$0.16
2022-20238,936$1,989.12$0.2211,534$0.17

The first thing to note is the horrifying 17 megawatts we pulled in 2016-2017. Given the hot water and lounge room heat pump were on a separate tariff, I was able to determine that four of those megawatts (i.e. about 24% of our power usage) went on heating that year. Replacing the crusty old conventional electric hot water system with a Sanden heat pump hot water service cut that in half – subsequent years showed the heating/hot water tariff using about 2MW/year. We obviously also somehow reduced our loads by another ~3MW/year on top of that, but I can’t find the Aurora bills for 2017-2018 so I’m not sure exactly when that drop happened. My best guess is that I probably got rid of some old, always-on computer equipment.

The second thing to note is how the cost of running the loads drops. In 2016-2017 the grid cost/kWh is the same as the loads cost/kWh, because grid power is all we had. From 2018-2021 though, the load cost/kWh drops to $0.19, a saving of about 26%. It remains there until 2021-2022 when we got the battery and it dropped again to $0.16 (another 15% or so). So the big win was certainly putting the solar panels on and swapping the hot water system, with the battery being a decent improvement on top of that.

Further wins are going to come from decreasing our power consumption. In previous posts I had mentioned the need to replace panel heaters with heat pumps, and also that some of our aging computer equipment needed upgrading. We did finally get a heat pump installed in the master bedroom this year, and we replaced the old undersized lounge room heat pump with a new correctly sized unit. This happened on June 30 though, so will have had minimal impact on this years’ figures. Likewise an always-on computer that previously pulled ~100W is now better, stronger and faster in all respects, while only pulling ~50W. That will save us ~438kW of power per year, but given the upgrade happened in mid August, again we won’t see the full effects until later.

I’m looking forward to doing another one of these posts in a year’s time. Hopefully I will have nothing at all interesting to report.

,

yifeiMonitor Upstream Updates for OpenBSD Packages

As an OpenBSD package maintainer, I often need to watch for updates on packages I maintain. I used to do this using repology.org, which has the benefit of tracking package updates in many distros, but it can be unreliable for OpenBSD packages due to network delay and parsing problems.

One better way to watch for upstream update is using OpenBSD’s portroach service, it monitors new upstream release and provides a JSON API that can be combined with jq(1) to produce clear information.

Querying portroach #

To find all packages that can be updated for a given maintainer, first find the maintainer page on portroach, you can search by maintainer name and the page’s URL should be similar to the following:

https://portroach.openbsd.org/yifei%20zhan%20%3Copenbsd@zhan.science%3E.html

Now to get JSON output, add /json/ to the URL and change the suffix from .html to .json:

https://portroach.openbsd.org/json/yifei%20zhan%20%3Copenbsd@zhan.science%3E.json 

This endpoint will return all the packages maintained by a given maintainer, regardless of having an update or not. To only show packaged that can be updated, jq(1) can be used as a powerful filter and formatter:

$ ftp -Vo - https://portroach.openbsd.org/json/yifei%20zhan%20%3Copenbsd@zhan.science%3E.json\
| jq -r '.[] | select(.newver!=null) | (.fullpkgpath)+": "+(.ver)+" -> "+(.newver)'

Which prints a nice list of package I need to work on:

converters/opencc: 1.1.6 -> er.1.1.7
inputmethods/fcitx: 5.0.23 -> 5.1.1
inputmethods/fcitx-chinese-addons: 5.0.17 -> 5.1.1
inputmethods/fcitx-config-qt: 5.0.17 -> 5.1.1
inputmethods/fcitx-gtk: 5.0.23 -> 5.1.0
inputmethods/fcitx-lua: 5.0.10 -> 5.0.11
inputmethods/fcitx-qt: 5.0.17 -> 5.1.1
inputmethods/fcitx-table-extra: 5.0.13 -> 5.1.0
inputmethods/libime: 1.0.17 -> 1.1.2

Closing note #

Please be mindful that portroach is not infaillible, it may produce inaccurate result for some upstreams. The hosted version is a community resource, so please don’t abuse it, If you want, you can selfhost it with source code from its GitHub repository.

,

Francois MarierUpgrading from Ubuntu 20.04 focal to 22.04 jammy

A few weeks ago, I upgraded a few machines from Ubuntu 20.04 (focal) to 22.04 (jammy). Here are the things that needed fixing after the upgrade.

Network problems

Firstly, I had to fix the resolution of .local domains the same way as I did when I upgraded a different machine from 18.04 (bionic) to 20.04 (focal).

ssh agent problems

Then, I found that ssh-add no longer worked and instead returned this error:

Could not open connection to your authentication agent

While this appears to be a known issue, the work-around suggested in the i3 forum didn't work for me. What did work was the solution described in this blog post:

  1. Add this to my ~/.bash_profile:

    eval $(systemctl --user show-environment | grep SSH_AUTH_SOCK)
    export SSH_AUTH_SOCK
    
  2. Add this to my startup script:

    /usr/bin/systemctl --user start ssh-agent.service
    

I'm not sure why ED25519 keys don't work in gnome-keyring since that bug was supposedly fixed a while back, but starting gnome-keyring-ssh.service instead of ssh-agent.service didn't work for me.

Packages

When it comes to specific packages, I removed this obsolete package:

  • popularity-contest

I also installed these two new packages:

As always, I put any packages I backport from Debian unstable into my PPA. So far with jammy, I only had to update tiger to silence some bogus warnings.

Francois MarierMonitoring browser network traffic on Android using mitmproxy

Using mitmproxy to intercept your packets is a convenient way to inspect a browser's network traffic.

It's pretty straightforward to setup on a desktop computer:

  1. Install mitmproxy (apt install mitmproxy on Debian) and start it:

     mitmproxy --mode socks5 --listen-port 9000
    
  2. Start your browser specifying the proxy to use:

     chrome --proxy-server="socks5://localhost:9000"
     brave-browser --proxy-server="socks5://localhost:9000"
    
  3. Add its certificate authority to your browser.

At this point, all of the traffic from that browser should be flowing through your mitmproxy instance.

Android setup

On Android, it's a little less straightforward:

  1. Start mitmproxy on your desktop:

     mitmproxy --mode regular --listen-port 9000
    
  2. Open that port on your desktop firewall if needed.

  3. On your Android device, change your WiFi settings for the current access point:
  4. Proxy: Manual
  5. Proxy hostname: 192.168.1.100 (IP address of your desktop)
  6. Proxy port: 9000
  7. Turn off any VPN.
  8. Turn off WiFi.
  9. Turn WiFi back on.
  10. Open http://mitm.it in a browser to download the certificate authority file.
  11. Open the system Settings, Security and privacy, More security and privacy, Encryption & credentials, Install a certificate and finally choose CA certificate.
  12. Tap Install anyway to dismiss the warning and select the file you just downloaded.

Once you have gone through all of these steps, you should be able to monitor (on your desktop) the HTTP and HTTPS requests made inside of your Android browsers.

Note that many applications will start failing due to certificate pinning.

,

yifeiEncrypted and Version Controlled File Sync with git-annex(1)

git-annex(1) is a versatile and cross-platform tool build on top of git, it can sync, backup, archive files and provides many useful primitives for building customized workflow and storage system, for example, by combining git-annex with gcrypt, it’s possible to fully encrypt data stored on a remote.

Partially due to its versatility, it has a steeper learning curve than some other tools in this field and it took me some time to figure out how to make it work for me, here is a quick guide that documents my journey.

Prerequisite and Installation #

git-annex and git-remote-gcrypt is available from many package manager, to install them on Debian:

# apt-get install git-annex git-remote-gcrypt

git-annex supports multiple encryption mode, I will be going with the default hybrid mode since it allows more keys to be added in future. In this mode, data is encrypted with gpg using a symmetric key generated during remote initialization, the key then is encrypted by a gpg public key specified during initremote. After that, the symmetric key is checked into the git repository. This is useful when multiple users wish to access the same encrypted repository, but doing so is outside the scope of this post, for doing that and other advanced operations, read git-annex’s gcrypt guide for more details.

I opt to create a new key for this use case, but any gpg key will do.

Setup Local Repository #

The first step is to create a local repository as base, which will then be synced to remotes:

laptop$ git init myrepo
laptop$ cd myrepo
laptop$ git annex init

To checkin and commit some file into it:

laptop$ touch example
laptop$ git annex .
laptop$ git commit -a -m 'test'

Setup Encrypted Remote #

First, create a bare repository on the server, it will hold encrypted data later:

server$ git init --bare myrepo-remote

Then, on the local machine, add the newly created repository on the server as an encrypted remote, it’s a good practice to give it a descriptive name:

(To find the KEYID, run gpg --list-key)

laptop$ git annex initremote homeserver type=gcrypt gitrepo=rsync://server_hostname/path/to/myrepo-remote keyid=$KEYID
gcrypt: Repository not found: rsync://server_hostname/path/to/myrepo-remote
gcrypt: Setting up new repository
gcrypt: Remote ID is :id:XXX
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Compressing objects: 100% (3/3), done.
Total 5 (delta 0), reused 0 (delta 0), pack-reused 0
gcrypt: Encrypting to:  -r XXX
gcrypt: Requesting manifest signature
To gcrypt::rsync://server_hostname/path/to/myrepo-remote
 * [new branch]      git-annex -> git-annex
ok
(recording state in git...)

With this done, it should now be possible to sync local repository to the remote:

laptop$ git annex sync --content

Work with Multiple Local Machines #

To accecss this encrypted repository from another machine (e.g. a desktop PC), first setup the gpg key on such machine, then clone and decrypt the repository:

desktop$ git clone gcrypt::rsync://server_hostname/path/to/myrepo-remote myrepo
Cloning into 'myrepo'...
gcrypt: Decrypting manifest
gpg: Good signature from "omnirepo (annex)" [unknown]
gcrypt: Remote ID is :id:XXX
Receiving objects: 100% (5/5), done.

Sync command will also work on the new machine for sending modified files to the remote:

desktop$ git annex sync --content
commit 
[master cec51a4] git-annex in XXX
 1 file changed, 1 insertion(+)
ok
pull origin 
gcrypt: Decrypting manifest
ok
push origin 
gcrypt: Decrypting manifest
Enumerating objects: 6, done.
Counting objects: 100% (6/6), done.
Compressing objects: 100% (4/4), done.
Total 6 (delta 0), reused 0 (delta 0), pack-reused 0
gcrypt: Encrypting to: --throw-keyids --default-recipient-self
gcrypt: Requesting manifest signature
To gcrypt::rsync://server_hostname/path/to/myrepo-remote
   bbed528..cec51a4  master -> synced/master
   c387409..575869a  git-annex -> synced/git-annex
ok

Troubleshooting #

Cannot write to annex file #

Annexed file is set to readonly (locked) to prevent accidental modification, run git annex unlock locked_file to unlock the file first.

Remove Unwanted Remote #

git-annex manages its remotes via git, to delete a remote, run git remote remove oldremote

,

Tim RileyOpen source status update, September 2023

With the two big PRs introducing our next generation of asset support merged (here and here), September was a month for rapid iteration and working towards getting assets out in a 2.1 beta release.

The pace was lively! Towards the end of the month, Luca and I were trading PRs and code reviews on almost a daily basis. Thanks our opposing timezones, Hanami was being written nearly 24h a day!

Assorted small things

Most of the work was fairly minor: an error logging fix, some test updates for the new assets, error handling around asset manifests, and a bit of zeitwerkin’.

Making our better errors better

There was one interesting piece though. Earlier in this release cycle (back in June!), I overhauled our user-facing error handling. I added a middleware to catch errors and render static error pages intended display in production. As part of this change, I adjusted our router to raise exceptions for not found routes: doing this would allow the error to be caught and a proper 404 page displayed. So that was production sorted. For development, we integrated the venerable better_errors, wrapped by our own hanami-webconsole gem.

It was only some months later that we realised 404s in development were being returned as 500s. This turned out to be because better_errors defaults to a 500 response code at all times. In its middleware:

status_code = 500
# ...
response = Rack::Response.new(content, status_code, headers)

Well, maybe not quite at all times. The lines right beneath status_code = 500:

status_code = 500
if defined?(ActionDispatch::ExceptionWrapper) && exception
  status_code = ActionDispatch::ExceptionWrapper.new(env, exception).status_code
end

Looks like Ruby on Rails gets its own little exception carved out here, via some hard-coded constant checks that reach deep inside Rails internals. This will allow better_errors to return a 404 for a not found error in Rails, but not in any other Ruby framework.

This is not a new change. It arrived over ten years ago, and I can hardly blame the authors for wanting a way to make this work nicely with the predominant Ruby application framework of the day.

Today, however, is a different day! We’re here to change the Ruby framework balance. � So we needed a way to make this work for Hanami. What didn’t feel feasible at this point was a significant change to better_errors: our time was limited and at best we had the appetite only for a minor tactical fix.

Our resulting fix in webconsole (along with this counterpart in hanami) does monkey patch better_errors, but I was very pleased with how gently we could do it. The patch is tiny:

module BetterErrorsExtension
  # The BetterErrors middleware always returns a 500 status when rescuing an exception
  # (outside of Rails). This is not not always appropriate, such as for a
  # `Hanami::Router::NotFoundError`, which should be a 404.
  #
  # To account for this, gently patch `BetterErrors::Middleware#show_error_page` (which is
  # called only when an exception has been rescued) to pass that rescued exception to a proc
  # we inject into the rack env here in our own middleware. This allows our middleware to know
  # the about exception class and provide the correct status code after BetterErrors is done
  # with its job.
  #
  # @see Webconsole::Middleware#call
  def show_error_page(env, exception = nil)
    if (capture_proc = env[CAPTURE_EXCEPTION_PROC_KEY])
      capture_proc.call(exception)
    end

    super
  end
end
BetterErrors::Middleware.prepend(BetterErrorsExtension)

In order to know which response code to use for the page, we need access to the exception that better_error is catching. Right now it provides no hooks to expose that. So instead we prepend some behaviour in front of their #show_error_page, which is only called by the time an error is to be rendered. We look for a proc on the rack env, and if one is there, we pass the exception to it, and then let better_errors get on with the rest of its normal work.

Then, in our own webconsole middleware, we set that proc to capture the exception, using Ruby closure semantics to assign that exception directly to a local variable:

def call(env)
  rescued_exception = nil
  env[CAPTURE_EXCEPTION_PROC_KEY] = -> ex { rescued_exception = ex }

  # ...
end

After that, we call the better_errors middleware, letting it do its own thing:

def call(env)
  rescued_exception = nil
  env[CAPTURE_EXCEPTION_PROC_KEY] = -> ex { rescued_exception = ex }

  status, headers, body = @better_errors.call(env)
end

And then once that is done, we can use the exception (if we have one) to fetch an appropriate response code from the Hanami app config, and then override better_errors’ response code with our own:

def call(env)
  rescued_exception = nil
  env[CAPTURE_EXCEPTION_PROC_KEY] = -> ex { rescued_exception = ex }

  status, headers, body = @better_errors.call(env)

  # Replace the BetterErrors status with a properly configured one for the Hanami app
  if rescued_exception
    status = Rack::Utils.status_code(
      @config.render_error_responses[rescued_exception.class.name]
    )
  end

  [status, headers, body]
end

That’s it! Given how light touch this is, and how stable better_errors is, I’m confident this will serve our purposes quite well for now.

We don’t want to live with this forever, however. In our future I see a fit for purpose developer errors reporter that is fully integrated with Hanami’s developer experience. Given current timelines, this will probably won’t come for at least 12 months, so if this is something you’re interested in helping with, please reach out!

Kickstarting dry-operation!

While the work on Hanami continued, I also helped kickstart work on a new dry-rb gem: dry-operation! Serving as the successor to dry-transaction, with dry-operation we’ll introduce significant new flexibility to modelling composable business operations, while still keeping a high-level API that presents their key flows in an easy to follow way.

Much of the month was spent ideating on various approaches with Marc Busqué and Brooke Kuhlmann, and then by the end of the month, Marc was already underway with the development work. Go check out Marc’s September update for a little more of the background on this.

I’m excited we’re finally providing a bridge to the future for dry-transaction, and at the same time building one of the final pieces of the puzzle for full stack Hanami apps. This is an interesting one for me personally, too, since I’m acting more as a “product manager� for this effort, with Marc doing most of the direct development work. Marc’s been in the dry-rb/Hanami orbit for a while now, and I’m excited for this opportunity for him to step up his contributions. More on this in the future!

Releasing Hanami 2.1.0.beta2!

After all of this, we capped the month off with the release of Hanami 2.1.0.beta2! This was a big step: our first beta to include both views and assets together. In the time since this release we’ve already learnt a ton and found way to take things to another level… but more on that next month. 😉 See you then!

,

Francois MarierEnabling AppArmor on a Linode VPS in enforcement mode

Enabling AppArmor on a Debian Linode VPS is not entirely straightforward. Here's what I had to do in order to make it work.

Packages to install

The easy bit was to install a few packages:

apt install grub2 apparmor-profiles-extra apparmor-profiles apparmor

and then adding apparmor=1 security=apparmor to the kernel command line (GRUB_CMDLINE_LINUX) in /etc/default/grub.

Move away from using Linode's kernels

As mentioned in this blog post, I found out that these parameters are ignored by the Linode kernels.

I had to:

  1. login to the Linode Manager (i.e. https://cloud.linode.com/linodes/<linode ID>/configurations),
  2. click the node relevant node,
  3. click "Edit" next to the configuration profile, and
  4. change the kernel to "GRUB 2".

Fix grub

Next I found out that grub doesn't actually install itself properly because it can't be installed directly on the virtual drives provided by Linode (KVM). Manually running this hack worked for me:

grub-install --grub-setup=/bin/true /dev/null

Unbound + Let's Encrypt fix

Finally, my local Unbound installation stopped working because it couldn't access the Let's Encrypt certificates anymore.

The solution to this was pretty straightforward. All I needed to do was to add the following to /etc/apparmor.d/local/usr.sbin.unbound:

/etc/letsencrypt/archive/** r,
/etc/letsencrypt/live/** r,

,

Lev LafayetteThe Voice and Set Theory

The expression "Not all members of set N have characteristic r, but all elements with characteristic r are in set N" can be represented with standard set notation as follows.

1. There exist some elements in N that do not have characteristic r. Using the "∃" symbol, to denote "there exists", and the "∉" symbol, which denotes "not an element of."

∃x ∈ N : x ∉ R

This reads as "There exists an element x in N such that x is not an element of R."

2. For the second part, all elements with characteristic r are in set N:

This means that every element in set R is also in set N. This can be represented using the subset symbol "⊆."
R ⊆ N

This reads as "R is a subset of N," meaning every element in R is also an element in N.

3. So, combining both statements:

∃x ∈ N : x ∉ R and R ⊆ N

This expresses that not all members of set N ("No voters") have characteristic r ("racism"), but all elements with characteristic r ("racism") are in set N ("No voters").

Not all "no" voters are racist, but all racists are "no" voters.

,

Tim RileyOpen source status update, August 2023

After last month’s omnibus update, I’m back again, so soon!

August turned out to bring a lot of forward motion for our work on Hanami’s front end assets support. While Luca was taking his summer break, I carried on his work preparing hanami-assets 2.1 and its integration into the Hanami framework. Last week we caught up for a quick chat about these, and now both are merged!

Personally, I think this was an exciting evolution of how Luca and I work together. While previously we each took care of fairly distinct lines of work (there was enough to do, after all!), here we literally worked in tandem on one specific area, and it came out great!

Luca and I also hopped on another video call during August, this time with Seb Wilgosz of Hanami Mastery to record a special core team interview for the site’s 50th episode! I really enjoyed the chance to answer community questions about Hanami, and personally, it was a moment of reassurance that we’re still on the right track and are delivering useful things to people.

The episode isn’t published yet, but one thing that did arise from the episode is a new Hanami 2.1 GitHub project that I put together for tracking our remaining work for the release. Previously, this was in Trello, and with the move to GitHub I hope it will make not our remaining work move visible, but also create clearer opportunities for potential contributors.

Now, with those big two PRs merged and our remaining work more clearly listed, the pace is picking up! We’re now at the point where we can focus on the direct user experience of working with assets within a full Hanami app. I expect a lot will shake out from this in quick order. But more on that next month!

,

Stewart SmithPersonal Finance Apps

I (relatively) recently went down the rabbit hole of trying out personal finance apps to help get a better grip on, well, the things you’d expect (personal finances and planning around them).

In the past, I’ve had an off-again-on-again relationship with GNUCash. I did give it a solid go for a few months in 2004/2005 it seems (I found my old files) and I even had the OFX exports of transactions for a limited amount of time for a limited number of bank accounts! Amazingly, there’s a GNUCash port to macOS, and it’ll happily open up this file from what is alarmingly close to 20 years ago.

Back in those times, running Linux on the desktop was even more of an adventure than it has been since then, and I always found GNUCash to be strange (possibly a theme with me and personal finance software), but generally fine. It doesn’t seem to have changed a great deal in the years since. You still have to manually import data from your bank unless you happen to be lucky enough to live in the very limited number of places where there’s some kind of automation for it.

So, going back to GNUCash was an option. But I wanted to survey the land of what was available, and if it was possible to exchange money for convenience. I am not big on the motivation to go and spend a lot of time on this kind of thing anyway, so it had to be easy for me to do so.

For my requirements, I basically had:

  • Support multiple currencies
  • Be able to import data from my banks, even if manually
  • Some kind of reporting and planning tools
  • Be easy enough to use for me, and not leave me struggling with unknown concepts
  • The ability to export data. No vendor lock-in

I viewed a mobile app (iOS) as a Nice to Have rather than essential. Given that, my shortlist was:

GNUCash

I’ve used it before, its web site at https://www.gnucash.org/ looks much the same as it always has. It’s Free and Open Source Software, and is thus well aligned with my values, and that’s a big step towards not having vendor lock-in.

I honestly could probably make it work. I wish it had the ability to import transactions from banks for anywhere I have ever lived or banked with. I also wish the UI got to be a bit more consistent and modern, and even remotely Mac like on the Mac version.

Honestly, if the deal was that a web service would pull bank transactions in exchange for ~$10/month and also fund GNUCash development… I’d struggle to say no.

Quicken

Here’s an option that has been around forever – https://www.quicken.com/ – and one that I figured I should solidly look at. It’s actually one I even spent money on…. before requesting a refund. It’s Import/Export is so broken it’s an insult to broken software everywhere.

Did you know that Quicken doesn’t import the Quicken Interchange Format (QIF), and hasn’t since 2005?

Me, incredulously, when trying out quicken

I don’t understand why you wouldn’t support as many as possible formats that banks export your transaction data as. It cannot possibly be that hard to parse these things, nor can it possibly be code that requires a lot of maintenance.

This basically meant that I couldn’t import data from my Australian Banks. Urgh. This alone ruled it out.

It really didn’t build confidence in ever getting my data out. At every turn it seemed to be really keen on locking you into Quicken rather than having a good experience all-up.

Moneywiz

This one was new to me – https://www.wiz.money/ – and had a fancy URL and everything. I spent a bunch of time trying MoneyWiz, and I concluded that it is pretty, but buggy. I had managed to create a report where it said I’d earned $0, but you click into it, and then it gives actual numbers. Not being self consistent and getting the numbers wrong, when this is literally the only function of said app (to get the numbers right), took this out of the running.

It did sync from my US and Australian banks though, so points there.

Intuit Mint

Intuit used to own Quicken until it sold it to H.I.G. Capital in 2016 (according to Wikipedia). I have no idea if that has had an impact as to the feature set / usability of Quicken, but they now have this Cloud-only product called Mint.

The big issue I had with Mint was that there didn’t seem to be any way to get your data out of it. It seemed to exemplify vendor lock-in. This seems to have changed a bit since I was originally looking, which is good (maybe I just couldn’t find it?). But with the cloud-only approach I wasn’t hugely comfortable with having everything there. It also seemed to be lacking a few features that I was begging to find useful in other places.

It is the only product that links with the Apple Card though. No idea why that is the case.

The price tag of $0 was pretty unbeatable, which does make me wonder where the money is made from to fund its development and maintenance. My guess is that it’s through commission on the various financial products advertised through it, and I dearly hope it is not through selling data on its users (I have no reason to believe it is, there’s just the popular habit of companies doing this).

Banktivity

This is what I’ve settled on. It seemed to be easy enough for me to figure out how to use, sync with an iPhone App, be a reasonable price, and be able to import and sync things from accounts that I have. Oddly enough, nothing can connect and pull things from the Apple Card – which is really weird. That isn’t a Banktivity thing though, that’s just universal (except for Intuit’s Mint).

I’ve been using it for a bit more than a year now, and am still pretty happy. I wish there was the ability to attach a PDF of a statement to the Statement that you reconcile. I wish I could better tune the auto match/classification rules, and a few other relatively minor things.

,

Stewart SmithFitness watches and my descent into madness

Periodically in life I’ve had the desire to be somewhat fit, or at least have the benefits that come with that such as not dying early and being able to navigate a mountain (or just the city of Seattle) on foot without collapsing. I have also found that holding myself accountable via data is pretty vital to me actually going and repeatedly doing something.

So, at some point I got myself a Garmin watch. The year was 2012 and it was a Garmin Forerunner 410. It had a standard black/grey LCD screen, GPS (where getting a GPS lock could be utterly infuriatingly slow), a sensor you attached to your foot, a sensor you strap to your chest for Heart Rate monitoring, and an ANT+ dongle for connecting to a PC to download your activities. There was even some open source software that someone wrote so I could actually get data off my watch on my Linux laptops. This wasn’t a smart watch – it was exclusively for wearing while exercising and tracking an activity, otherwise it was just a watch.

However, as I was ramping up to marathon distance running, one huge flaw emerged: I was not fast enough to run a marathon in the time that the battery in my Garmin lasted. IIRC it would end up dying around 3hr30min into something, which at the time was increasingly something I’d describe as “not going for too long of a run”. So, the search for a replacement began!

The year was 2017, and the Garmin fenix 5x attracted me for two big reasons: a battery life to be respected, and turn-by-turn navigation. At the time, I seldom went running with a phone, preferring a tiny SanDisk media play (RIP, they made a new version that completely sucked) and a watch. The attraction of being able to get better maps back to where I started (e.g. a hotel in some strange city where I didn’t speak the language) was very appealing. It also had (what I would now describe as) rudimentary smart-watch features. It didn’t have even remotely everything the Pebble had, but it was enough.

So, a (non-trivial) pile of money later (even with discounts), I had myself a shiny and virtually indestructible new Garmin. I didn’t even need a dongle to sync it anywhere – it could just upload via its own WiFi connection, or through Bluetooth to the Garmin Connect app to my phone. I could also (if I ever remembered to), plug in the USB cable to it and download the activities to my computer.

One problem: my skin rebelled against the Garmin fenix 5x after a while. Like, properly rebelled. If it wasn’t coming off, I wanted to rip it off. I tried all of the tricks that are posted anywhere online. Didn’t help. I even got tested for what was the most likely culprit (a Nickel allergy), and didn’t have one of them, so I (still) have no idea what I’m actually allergic to in it. It’s just that I cannot wear it constantly. Urgh. I was enjoying the daily smart watch uses too!

So, that’s one rather expensive watch that is special purpose only, and even then started to get to be a bit of an issue around longer activities. Urgh.

So the hunt began for a smart watch that I could wear constantly. This usually ends in frustration as anything I wanted was hundreds of $ and pretty much nobody listed what materials were in it apart from “stainless steel”, “may contain”, and some disclaimer about “other materials”, which wasn’t a particularly useful starting point for “it is one of these things that my skin doesn’t like”. As at least if the next one also turned out to cause me problems, I could at least have a list of things that I could then narrow down to what I needed to avoid.

So that was all annoying, with the end result being that I went a long time without really wearing a watch. Why? The search resumed periodically and ended up either with nothing, or totally nothing. That was except if I wanted to get further into some vendor lock-in.

Honestly, the only manufacturer of anything smartwatch like which actually listed everything and had some options was Apple. Bizarre. Well, since I already got on the iPhone bandwagon, this was possible. Rather annoyingly, they are very tied together and thus it makes it a bit of a vendor-lock-in if you alternate phone and watch replacement and at any point wish to switch platforms.

That being said though, it does work well and not irritate my skin. So that’s a bonus! If I get back into marathon level distance running, we’ll see how well it goes. But for more common distances that I’ve run or cycled with it… the accuracy seems decent, HR monitor never just sometimes decides I’m not exerting myself, and the GPS actually gets a lock in reasonable time. Plus it can pair with headphones and be the only thing I take out with me.

,

Stewart SmithRandom useful macOS things for Linux developers

A few random notes about things that can make life on macOS (the modern one, as in, circa 2023) better for those coming from Linux.

For various reasons you may end up with Mac hardware with macOS on the metal rather than Linux. This could be anything from battery life of the Apple Silicon machines (and not quite being ready to jump on the Asahi Linux bandwagon), to being able to run the corporate suite of Enterprise Software (arguably a bug more than a feature), to some other reason that is also fine.

My approach to most of my development is to have a remote more powerful Linux machine to do the heavy lifting, or do Linux development on Linux, and not bank on messing around with a bunch of software on macOS that would approximate something on Linux. This also means I can move my GUI environment (the Mac) easily forward without worrying about whatever weird workarounds I needed to do in order to get things going for whatever development work I’m doing, and vice-versa.

Terminal emulator? iTerm2. The built in Terminal.app is fine, but there’s more than a few nice things in iTerm2, including tmux integration which can end up making it feel a lot more like a regular Linux machine. I should probably go read the tmux integration best practices before I complain about some random bugs I think I’ve hit, so let’s pretend I did that and everything is perfect.

I tend to use the Mac for SSHing to bigger Linux machines for most of my work. At work, that’s mostly to a Graviton 2 EC2 Instance running Amazon Linux with all my development environments on it. At home, it’s mostly a Raptor Blackbird POWER9 system running Fedora.

Running Linux locally? For all the use cases of containers, Podman Desktop or finch. There’s a GUI part of Podman which is nice, and finch I know about because of the relatively nearby team that works on it, and its relationship to lima. Lima positions itself as WSL2-like but for Mac. There’s UTM for a full virtual machine / qemu environment, although I rarely end up using this and am more commonly using a container or just SSHing to a bigger Linux box.

There’s XCode for any macOS development that may be needed (e.g. when you want that extra feature in UTM or something) I do use Homebrew to install a few things locally.

Have a read of Andrew‘s blog post on OpenBMC Development on an Apple M1 MacBook Pro too.

,

yifeiMake you own 3.5mm serial cable

Doing anything close to the kernel/bootloader on PinePhone almost always requires a serial cable, Pine64 store has premade serial cable available for 7$ USD, but making your own serial cable can be both cheaper and more flexible as a DIY cable can support multiple logic level and pinout configuration.

Parts Overview #

You will need:

  • A 3.5mm audio cable, I got mine from a pair of broken headphone
  • A multimeter for continuity test
  • A USB-Serial adapter, you can get one online for around 3$ USD, make sure it supports 3.3v logic level if you want to use it with PinePhone
  • 3 jump wires, for TX/RX/GND. Make sure those cables have female endings for connecting to serial adapter
  • (Optional) Soldering iron, some flux core solder and heat shrink tubing for making proper connection. You can skip this and instead use twisted wires and electrical tape to make connection

Make Connection #

The serial pinout of PinePhone is available from this Pine64 Wiki wiki, to put it simply:

If your 3.5mm plug has 3 rings:

|=|=|=|)   <-Plug Tip 
 | | |_RX
 | |_Tx
 GND

Tip Ring (rightmost): Rx
Middle Ring: Tx
Last (Leftmost): GND

If your 3.5mm plug has 4 rings:

|=|=|=|=|)  <-Plug Tip
 | | | |_RX
 | | |_Tx
 | -GND
 ^---- Not used

Tip Ring (rightmost): Rx
Middle Ring: Tx
Second Middle Ring: GND
Last (leftmost): Unused

With the pinout in mind, cut the headphone cable open and split the wires inside, for a cable with 3 rings there should be 3 seperate wires, and 4 if that’s a 4 ring plug.

Next, remove about 1cm of the isolation layer for each wire, and then use multimeter’s continuity test mode to find out which wire corresponds to which serial pin, it’s likely a good idea to label each wire with pin name at this stage.

Then, cut a jump wire open, strip about 1cm of the isolation layer like with the headphone cable, and twist it together with a wire from the headphone cable, repeat this process 3 times for Tx/Rx/GND (There are many videos on YouTube on this topic). You can also use soldering iron to make stronger.

After finish, test continuity again with multimeter to ensure every wire is properly connected, then protect the joint with electrical tape or heat shrink tubing (which needs to be put on before making connection).

Now the only step left is connecting the jump wire to the serial adapter. Since the PinePhone and the serial adapter are both considered host device, a cross-over connection is required, so what is transmitted can be received on the other side:

---------------      ----------------
serial    Tx  |------| Rx   headphone
adapter   Rx  |------| Tx   cable
side      GND |------| GND  side          
---------------      ----------------

Connect to serial console #

Flip the DIP switch 6 (the rightmost, labeled Headphone) on the PinePhone to enable serial access, connect the newly made cable to the PinePhone and a computer, then use any serial console tool to open a session. The following example uses cu(1) on OpenBSD but screen(1) and minicom(1) should also work.

$ doas cu -s 115200 -l /dev/cuaU0

,

yifeiOpenBSD on PinePhone Pro: First Impression

Disclaimer #

OpenBSD does not support PinePhone Pro yet and there are real risks involved in running it on your PinePhone Pro now, as such, I do not recommand anyone to do that. You might fry your device due to unsupported power management IC and in a worse case the battery might catch fire due to unconfigured/untested charging safety features.

The purpose of this post is to document how to install OpenBSD on arm64 platforms not fully supported by OpenBSD, and much of this post is not PinePhone-specific, if you intend to follow what documented here, please be mindful about the risks and apply common sense.

Overview #

  • OpenBSD installer cannot be used on bare metal if you want to install OpenBSD to an sdcard, because of insufficiant hardware support. However, it’s possible to install OpenBSD to a virtual machine and then transfer the installed system to a SD card to boot from

  • This post assumes you have a PinePhone Pro running Mobian with KVM properly configured, and an sdcard to transfer the installed system to

  • As for now the only way to interact with the running system is via a serial console cable, wired and wireless network are not supported, same for screen, keyboard, and USB host mode

  • Jump to Support Status to see what work (not much)

Prepare Disk Image #

To make full use of the sdcard, we will create a disk image with size equal to our sdcard. We can find precise size of the sdcard with fdisk on Mobian:

mobian$ echo p | sudo fdisk /dev/mmcblk1

A line similar to above should appear, showing the size of sdcard in bytes:

Disk /dev/mmcblk1: 29.72 GiB, 31914983424 bytes, 62333952 sectors

We can now create our disk image:

mobian$ qemu-img create -f qcow2 openbsd.vm.qcow2 31914983424

Bootstrap via virtual machine #

Installing OpenBSD on VM is relatively strightforward, get the minirootXX.img from OpenBSD mirror (at the moment I’m using miniroot73.img), and follow instruction from my other post

Add support files #

A freshly installed OpenBSD/arm64 VM is not bootable on bare metal, to make it bootable, we will need:

  • Device Tree Blob (DTB) for PinePhone Pro, which describes the hardware environment
    • OpenBSD’s dtb package is compiled from Linux source tree, you can see how it is compiled here
  • Support files for uboot, extracted from installer image

(I’m not sure if all the uboot files are needed, but it’s easy to extract them all)

This can be done from the VM we prepared:

Create mount point for operating on disk image #

vm# mkdir /mnt/{img,disk}

Prepare dtb and installer image #

vm# pkg_add dtb
vm# ftp https://cdn.openbsd.org/pub/OpenBSD/snapshots/arm64/miniroot73.img

Prepare and mount boot partition of installer image #

vm# vnconfig vnd0 miniroot73.img
vm# mount /dev/vnd0i /mnt/img/

Mount VM boot partition #

vm# mount /dev/sd0i /mnt/disk/

Copy files from installer boot partition to VM boot partition #

vm# cp -r /mnt/img/* /mnt/disk/

Copy DTB #

vm# cp /usr/local/share/dtb/arm64/rockchip/rk3399-pinephone-pro.dtb /mnt/disk/

Clean up #

vm# umount /mnt/disk/
vm# umount /mnt/img/
vm# vnconfig -u vnd0

Disable ohci #

ohci controller is not yet supported by OpenBSD on this device, and the existing driver can prevent the kernel from booting, before the root problem is addressed, we can disable ohci driver in kernel to workaround this.

vm# config -ef /bsd                                                                                                         
ukc> find ohci                                                                                                              
167 ohci* at pci* dev -1 function -1 flags 0x0                                                                              
236 ohci* at apldc*|agintc*|ampintc*|qcdwusb*|imxsrc*|imxdwusb*|mvmdio*|rktcphy*|rkpinctrl*|rkgrf*|rkdwusb*|hidwusb*|amldwus
b*|syscon*|sxisyscon*|simplebus*|mainbus0 early 0 flags 0x0                                                                 
413 ohci* at acpi0 addr -1 flags 0x0                                                                                        
ukc> disable 236                                                                                                            
236 ohci* disabled                                                                                                          
ukc> quit                                                     
Saving modified kernel.               

vm# shutdown -hp now

Write image to SD card #

Make sure you VM is properly shutdown, and your sdcard is at /dev/mmcblk1, then write the VM image to the sdcard.

mobian$ sudo qemu-img dd -f qcow2 -O raw if=openbsd.vm.qcow2 of=/dev/mmcblk1 bs=20M

Boot OpenBSD from Tow-boot #

To boot OpenBSD from the sdcard with Tow-boot:

  • Insert sdcard into PinePhone Pro
  • Then flip the DIP switch 6 (the rightmost, labeled Headphone) to enable serial access
  • Connect a serial cable and open a console session, the example uses cu(1) since I’m using OpenBSD, but minicom can also work, towboot uses 115200 as baudrate but other u-boot build might differ
$ doas cu -s 115200 -l /dev/cuaU0

Something similar to the following output can help you confirm your serial connection is working:

U-Boot TPL 2021.10 (Oct 04 2021 - 15:09:26)                                                         
Channel 0: LPDDR4, 50MHz                                                                            
BW=32 Col=10 Bk=8 CS0 Row=15 CS1 Row=15 CS=2 Die BW=16 Size=2048MB                                  
Channel 1: LPDDR4, 50MHz                                                                            
BW=32 Col=10 Bk=8 CS0 Row=15 CS1 Row=15 CS=2 Die BW=16 Size=2048MB                                  
256B stride                                                                                         
lpddr4_set_rate: change freq to 400000000 mhz 0, 1                                                  
lpddr4_set_rate: change freq to 800000000 mhz 1, 0                                                  
Trying to boot from BOOTROM                                                                         
Returning to boot ROM...   
  • Repeatedly press ESC to trigger tow-boot’s boot menu, select Boot from SD
                          Boot from eMMC                                        
                          Boot from SD                                          
                          Boot from USB                                         
                          Boot from PXE                                         
                          Boot from DHCP                                        
                          Boot from (sf0)                                       
                                                                                
                          Rescan USB                                            
                          Firmware Console                                      
                                                                                
                          Reboot                                                
                          Shutdown                                              
                         _          
  • Something silmilar to the following should indicate OpenBSD is booting, and a login prompt will appear soon
boot>                                                                                               
booting sd0a:/bsd: 10625552+2504232+292520+843464 [792195+91+1216848+729496]=0x13b2240
[ using 2739408 bytes of bsd ELF symbol table ]                                                     
Copyright (c) 1982, 1986, 1989, 1991, 1993                                                          
        The Regents of the University of California.  All rights reserved.                          
Copyright (c) 1995-2023 OpenBSD. All rights reserved.  https://www.OpenBSD.org                      
                                                  
OpenBSD 7.3-current (GENERIC.MP) #2182: Thu Jul  6 15:02:37 MDT 2023                                
    deraadt@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP                            
real mem  = 4088885248 (3899MB)                                                                     
avail mem = 3883520000 (3703MB)

Support status #

Feature State Note
Screen No Screen lights up but no signal
USB Host No USB port is not powered
Built-in EMMC Yes sd1 at scsibus1
SD Card Yes sd0 at scsibus0
WIFI No bwfm0 at sdmmc0 needs brcmfmac43455-sdio.pine64,pinephone-pro.bin, loading this can lead to kernel crash
Sensors Partial GPU/CPU temperature is reported by rktemp(4), no other sensor detected
CPU Yes All 6 CPU cores are detected and run fine with MP kernel
Power off No Cannot power down system
Reboot Yes Reboot from OpenBSD works
Modem/other usb devices No Internal USB bus doesn’t seem to work

dmesg #

Full dmesg and other hardware info is available from PinePhone Pro installation report

,

Tim SerongThe wrong way to debug CrashLoopBackOff

Last week I had occasion to test deploying ceph-csi on a k3s cluster, so that Kubernetes workloads could access block storage provided by an external Ceph cluster. I went with the upstream Ceph documentation, because assuming everything worked it’d then be really easy for me to say to others “just go do this”.

Everything did not work.

I’d gone through all the instructions, inserting my own Ceph cluster’s FSID and MON IP addresses in the right places, applied the YAML to deploy the provisioner and node plugins, and all the provisioner bits were running just fine, but the csi-rbdplugin pods were stuck in CrashLoopBackOff:

> kubectl get pods
NAME                                        READY   STATUS             RESTARTS          AGE
csi-rbdplugin-22zjr                         1/3     CrashLoopBackOff   107 (3m55s ago)   2d
csi-rbdplugin-pbtc2                         1/3     CrashLoopBackOff   104 (3m33s ago)   2d
csi-rbdplugin-provisioner-9dcfd56d7-c8s72   7/7     Running            28 (35m ago)      8d
csi-rbdplugin-provisioner-9dcfd56d7-hcztz   7/7     Running            28 (35m ago)      8d
csi-rbdplugin-provisioner-9dcfd56d7-w2ctc   7/7     Running            28 (35m ago)      8d
csi-rbdplugin-r2rzr                         1/3     CrashLoopBackOff   106 (3m39s ago)   2d

The csi-rbdplugin pod consists of three containers – driver-registrar, csi-rbdplugin, liveness-prometheus – and csi-rbdplugin wasn’t able to load the rbd kernel module:

> kubectl logs csi-rbdplugin-22zjr --container csi-rbdplugin
I0726 10:25:12.862125    7628 cephcsi.go:199] Driver version: canary and Git version: d432421a88238a878a470d54cbf2c50f2e61cdda
I0726 10:25:12.862452    7628 cephcsi.go:231] Starting driver type: rbd with name: rbd.csi.ceph.com
I0726 10:25:12.865907    7628 mount_linux.go:284] Detected umount with safe 'not mounted' behavior
E0726 10:25:12.872477    7628 rbd_util.go:303] modprobe failed (an error (exit status 1) occurred while running modprobe args: [rbd]): "modprobe: ERROR: could not insert 'rbd': Key was rejected by service\n"
F0726 10:25:12.872702    7628 driver.go:150] an error (exit status 1) occurred while running modprobe args: [rbd] 

Matching “modprobe: ERROR: could not insert ‘rbd’: Key was rejected by service” in the above was an error on each host’s console: “Loading of unsigned module is rejected”. These hosts all have secure boot enabled, so I figured it had to be something to do with that. So I logged into one of the hosts and ran modprobe rbd as root, but that worked just fine. No key errors, no unsigned module errors. And once I’d run modprobe rbd (and later modprobe nbd) on the host, the csi-rbdplugin container restarted and worked just fine.

So why wouldn’t modprobe work inside the container? /lib/modules from the host is mounted inside the container, the container has the right extra privileges… Clearly I needed to run a shell in the failing container to poke around inside when it was in CrashLoopBackOff state, but I realised I had no idea how to do that. I knew I could kubectl exec -it csi-rbdplugin-22zjr --container csi-rbdplugin -- /bin/bash but of course that only works if the container is actually running. My container wouldn’t even start because of that modprobe error.

Having previously spent a reasonable amount of time with podman, which has podman run, I wondered if there were a kubectl run that would let me start a new container using the upstream cephcsi image, but running a shell, instead of its default command. Happily, there is a kubectl run, so I tried it:

> kubectl run -it cephcsi --image=quay.io/cephcsi/cephcsi:canary --rm=true --command=true -- /bin/bash
If you don't see a command prompt, try pressing enter.
[root@cephcsi /]# modprobe rbd
modprobe: FATAL: Module rbd not found in directory /lib/modules/5.14.21-150400.24.66-default
[root@cephcsi /]# ls /lib/modules/
[root@cephcsi /]#  

Ohhh, right, of course, that doesn’t have the host’s /lib/modules mounted. podman run lets me add volume mounts using -v options , so surely kubectl run will let me do that too.

At this point in the story, the notes I wrote last week include an awful lot of swearing.

See, kubectl run doesn’t have a -v option to add mounts, but what it does have is an --overrides option to let you add a chunk of JSON to override the generated pod. So I went back to the relevant YAML and teased out the bits I needed to come up with this monstrosity:

> kubectl run -it cephcsi-test \
  --image=quay.io/cephcsi/cephcsi:canary --rm=true \
  --overrides='{
    "apiVersion": "v1",
    "spec": {
      "containers": [ {
        "name": "cephcsi",
        "command": ["/bin/bash"],
        "stdin": true, "tty": true,
        "image": "quay.io/cephcsi/cephcsi:canary",
        "volumeMounts": [ {
          "mountPath": "/lib/modules", "name": "lib-modules" }],
        "securityContext": {
          "allowPrivilegeEscalation": true,
          "capabilities": { "add": [ "SYS_ADMIN" ] },
          "privileged": true }
      } ],
      "volumes": [ {
        "name": "lib-modules",
        "hostPath": { "path": "/lib/modules", "type": "" }
      } ]
    } }'

But at least I could get a shell and reproduce the problem:

> kubectl run -it cephcsi-test [honking great horrible chunk of JSON]
[root@cephcsi-test /]# ls /lib/modules/
5.14.21-150400.24.66-default
[root@cephcsi-test /]# modprobe rbd
modprobe: ERROR: could not insert 'rbd': Key was rejected by service

A certain amount more screwing around looking at the source for modprobe and bits of the kernel confirmed that the kernel really didn’t think the module was signed for some reason (mod_verify_sig() was returning -ENODATA), but I knew these modules were fine, because I could load them on the host. Eventually I hit on this:

[root@cephcsi-test /]# ls /lib/modules/*/kernel/drivers/block/rbd*
/lib/modules/5.14.21-150400.24.66-default/kernel/drivers/block/rbd.ko.zst

Wait, what’s that .zst extension? It turns out we (SUSE) have been shipping zstd-compressed kernel modules since – as best as I can tell – some time in 2021. modprobe on my SLE Micro 5.3 host of course supports this:

# grep PRETTY /etc/os-release
PRETTY_NAME="SUSE Linux Enterprise Micro for Rancher 5.3"
# modprobe --version
kmod version 29
+ZSTD +XZ +ZLIB +LIBCRYPTO -EXPERIMENTAL

modprobe in the CentOS Stream 8 upstream cephcsi container does not:

[root@cephcsi-test /]# grep PRETTY /etc/os-release 
PRETTY_NAME="CentOS Stream 8"
[root@cephcsi-test /]# modprobe --version
kmod version 25
+XZ +ZLIB +OPENSSL -EXPERIMENTAL

Mystery solved, but I have to say the error messages presented were spectacularly misleading. I later tried with secure boot disabled, and got something marginally better – in that case modprobe failed with “modprobe: ERROR: could not insert ‘rbd’: Exec format error”, and dmesg on the host gave me “Invalid ELF header magic: != \x7fELF”. If I’d seen messaging like that in the first place I might have been quicker to twig to the compression thing.

Anyway, the point of this post wasn’t to rant about inscrutable kernel errors, it was to rant about how there’s no way anyone could be reasonably expected to figure out how to do that --overrides thing with the JSON to debug a container stuck in CrashLoopBackOff. Assuming I couldn’t possibly be the first person to need to debug containers in this state, I told my story to some colleagues, a couple of whom said (approximately) “Oh, I edit the pod YAML and change the container’s command to tail -f /dev/null or sleep 1d. Then it starts up just fine and I can kubectl exec into it and mess around”. Those things totally work, and I wish I’d thought to do that myself. The best answer I got though was to use kubectl debug to make a copy of the existing pod but with the command changed. I didn’t even know kubectl debug existed, which I guess is my reward for not reading the entire manual 😉

So, finally, here’s the right way to do what I was trying to do:

> kubectl debug csi-rbdplugin-22zjr -it \
    --copy-to=csi-debug --container=csi-rbdplugin -- /bin/bash
[root@... /]# modprobe rbd
modprobe: ERROR: could not insert 'rbd': Key was rejected by service

(...do whatever other messing around you need to do, then...)

[root@... /]# exit
Session ended, resume using 'kubectl attach csi-debug -c csi-rbdplugin -i -t' command when the pod is running
> kubectl delete pod csi-debug
pod "csi-debug" deleted 

In the above kubectl debug invocation, csi-rbdplugin-22zjr is the existing pod that’s stuck in CrashLoopBackOff, csi-debug is the name of the new pod being created, and csi-rbdplugin is the container in that pod that has its command replaced with /bin/bash, so you can mess around inside it.

,

Tim RileyOpen source status update, October 2022–July 2023

It’s been a hot minute since my last open source status update! Let’s get caught up, and hopefully we can resume the monthly cadence from here.

Released Hanami 2.0

In Novemver we released Hanami 2.0.0! This was a huge milestone! Both for the Hanami project and the Ruby communuity, but also for us as a development team: we’d spent a long time in the wilderness.

All of this took some doing. It was a mad scramble to get here. The team and I worked non-stop over the preceding couple of months to get this release ready (including me during the mornings of a family trip to Perth).

Anyway, if you’ve followed me here for a while, most of the Hanami 2 features should hopefully feel familiar to you, but if you’d like a refresher, check out the Highlights of Hanami 2.0 that I wrote to accompany the release announcement.

Spoke at RubyConf Thailand

Just two weeks after the 2.0 release, I spoke at RubyConf Thailand 2022!

Given I was 100% focused on Hanami dev work until the release, this is probably the least amount of time I’ve had for conference talk preparation, but I was happy with the result. I found a good hook (“new framework, new you�, given the new year approaching) and put together a streamlined introduction to Hanami that fit within the ~20 minutes alotted to the talks (in this case, it was a boon that we hadn’t yet released our view or persistence layers 😆).

Check it out here:


Overhauled hanami-view internals and improved performance

With the 2.0 release done, we decided to release our view and persistence layers progressively, as 2.1 and 2.2 respectively. This would allow us to keep our focus on one thing at a time and improve the timeliness of the upcoming releases.

So over the Christmas break (including several nights on a family trip to the coast), I started work on the first big blocker for our view layer: hanami-view performance. We were slower than Rails, and that just doesn’t cut the mustard for a framework that advertises itself as fast and light.

Finding the right approach here took several goes, and it was finally ready for this pull request at the end of February. I managed to find a >2x performance boost while simplifying our internals, improving the ergonomics of Hanami::View::Context and our part and scope builders, and still retaining all existing features.

Spoke at RubyConf Australia

Also in February, I spoke at RubyConf Australia 2023! After a 3 year hiatus, this was a wonderful reunion for the Ruby community across Australia and New Zealand. It looked like we lost no appetite for these events, so I’m encouraged for next year and beyond.

To fit the homecoming theme, I brought a strong tinge of Australiana to my talk, and expanded it to include a preview of the upcoming view and persistence layers. Check it out:


Created Hanami::View::ERB, a new ERB engine

After performance, the next big issue for hanami-view was having our particular needs met by our template rendering engines, as well as making auto-escaping the default for our “first party supported� engines (ERB, Haml, Slim) that output HTML.

ERB support was an interesting combination of all these issues. For hanami-view, we don’t expect any rendering engine to require explicit capturing of block content. This is what allows methods on parts and scopes simply to yield and have the returned value match content provided to the block from within the template.

To support this with ERB, we previously had to require our users install and use the erbse gem, a little-used and incomplete ERB implementation that provided this implicit block capturing behaviour by default (but did not support auto-escaping of HTML-unsafe values). For a long while we also had to require users use hamlit-block for the same reasons, and as such we had to build a compatibility check between ourselves and Tilt to ensure the right engines were available. This arrangement was awkward and untenable for the kind of developer experience we want for Hanami 2.

So to fix all of this, I wrote our own ERB engine! This provides everything we need from ERB (implicit block capture as well as auto-escaping) and also allows for hanami-view to be used out of the box without requiring manual installation of other gems.

Meanwhile, in the years since my formative work on hanami-view (aka dry-view), Haml and Slim evolved to both use Temple and provide configuration hooks for all the behaviour we require, so this allowed me to drop our template engine compatibility checks and instead just automatically configure Haml or Slim to match our needs if they’re installed.

To support our auto-escaping of HTML-unsafe values, we’ve adopted the Object and String #html_safe? patches that are prevalent across relevant libraries in the Ruby ecosystem. This gives us the broadest possible compatibility, as well as a streamlined and unsurprising user experience. While you might see folks decry monkey patches in general, this is one example where it makes sense for Hanami to take a pragmatic approach, and I’m very pleased with the outcome.

Implemented helpers for hanami-view

After performance and rendering/HTML safety, the last remaining pre-release item for hanami-view was support for helpers. This needed a bit of thinking to sort out, since the new hanami-view provides a significantly different set of view abstractions compared to the 1.x edition.

Here’s how I managed to sort it out:

After this, all helpers should appear whereer you need them in your views, whether in templates, part classes or scope classes. Each slice will also generate a Views::Helpers module to serve as the starting point for your own collection of helpers, too.

With hanami-view providing parts and scopes, the idea is that you can and should use available-everywhere helpers less than before, but they can still be valuable from time to time, and with their introduction, now you have every possible option available for building your views.

Added friendly error pages

While focused on views, I also took the chance to make our error views friendly too. Now we:

Worked on integrating hanami-assets

Alongside all of this, Luca has been working hard on our support for front end assets via an esbuild plugin and its integration with the framework. This has been nothing short of heroic: he’s been beset by numerous roadblocks but overcome each one, and now we’re getting really close.

Back in June, Luca and I had our first ever pairing session on this work! We got a long way in just a couple of hours. I’m looking forward to pitching in with this as my next focus.

Prepared the Hanami 2.1.0.beta1 release

With all the views work largely squared away, I figured it was time to make a beta release and get this stuff out there for people to test, so we released it as 2.1.0.beta1 at the end of June.

Spoke at Brighton Ruby!

Also at the end of June I spoke at Brighton Ruby! I’ve wanted to attend this event for the longest time, and it did not disappoint. I had a wonderful day at the conference and enjoyed meeting a bunch of new Ruby friends.

For my talk I further evolved the content from the previous iterations, and this time included a look at how we might grow a Hanami app into a more real thing, as well as reflections on what Hanami 2’s release might mean for the Ruby community. I also experimented with a fun new theme and narrative device, which you shall be able to see once the video is out 😜

Thank you so much to Andy for the invitation and the support. ��

Took a holiday

After all of that, I took a break! You might’ve noticed my mentions of all the Hanami work I was doing while ostensibly on family trips. Well, after Brighton Ruby, I was all the way in Europe with the family, and made sure to have a good proper 4 weeks of (bonus summer) holiday. It was fanastic, and I didn’t look at Ruby code one bit.

What’s next

Now that I’m back, I’ll focus on doing whatever is necessary to complete our front end assets integration and get that out as a 2.1 beta2 release. Our new assets stuff is the completely new, so some time for testing and bug fixing will be useful.

Over the rest of the beta period I hope to complete a few smaller general framework improvements and fixes, and from there we can head towards 2.1.0 final.

I suspect it will take at least one more OSS status updates before that all happens, so I can check in with you about it all then!

,

FLOSS Down Under - online free software meetingsJuly 2023 Meeting

Meeting Report

The July 2023 meeting sparked multiple new topics including Linux security architecture, Debian ports of LoongArch and Risc-V as well as hardware design of PinePhone backplates.

On the practical side, Russell Coker demonstrated running different applications in isolated environment with bubblewrap sandbox, as well as other hardening techniques and the way they interact with the host system. Russell also discussed some possible pathways of hardening desktop Linux to reach the security level of modern Android. Yifei Zhan demonstrated sending and receiving messages with the PineDio USB LoRa adapter and how to inspect LoRa signal with off-the-shelf software defined radio receiver, and discussed how the driver situation for LoRa on Linux might be improved. Yifei then gave a demonstration on utilizing KVM on PinePhone Pro to run NetBSD and OpenBSD virtual machines, more details on running VMs on the PinePhone Pro can be found on this blog post from Yifei.

We also had some discussion of the current state of Mobian and Debian ecosystem, along with how to contribute to different parts of Mobian with a Mobian developer who joined us.

,

Stewart SmithGetting your photos out of Shotwell

Somewhat a while ago now, I wrote about how every time I return to write some software for the Mac, the preferred language has changed. The purpose of this adventure was to get my photos out of the aging Shotwell and onto my (then new) Mac and the Apple Photos App.

I’ve had a pretty varied experience with photo management on Linux over the past couple of decades. For a while I used f-spot as it was the new hotness. At some point this became…. slow and crashy enough that it was unusable. Today, it appears that the GitHub project warns that current bugs include “Not starting”.

At some point (and via a method I have long since forgotten), I did manage to finally get my photos over to Shotwell, which was the new hotness at the time. That data migration was so long ago now I actually forget what features I was missing from f-spot that I was grumbling about. I remember the import being annoying though. At some point in time Shotwell was no longer was the new hotness and now there is GNOME Photos. I remember looking at GNOME Photos, and seeing no method of importing photos from Shotwell, so put it aside. Hopefully that situation has improved somewhere.

At some point Shotwell was becoming rather stagnated, and I noticed more things stopping to work rather than getting added features and performance. The good news is that there has been some more development activity on Shotwell, so hopefully my issues with it end up being resolved.

One recommendation for Linux photo management was digiKam, and one that I never ended up using full time. One of the reasons behind that was that I couldn’t really see any non manual way to import photos from Shotwell into it.

With tens of thousands of photos (~58k at the time of writing), doing things manually didn’t seem like much fun at all.

As I postponed my decision, I ended up moving my main machine over to a Mac for a variety of random reasons, and one quite motivating thing was the ability to have Photos from my iPhone magically sync over to my photo library without having to plug it into my computer and copy things across.

So…. how to get photos across from Shotwell on Linux to Photos on a Mac/iPhone (and also keep a very keen eye on how to do it the other way around, because, well, vendor lock-in isn’t great).

It would be kind of neat if I could just run Shotwell on the Mac and have some kind of import button, but seeing as there wasn’t already a native Mac port, and that Shotwell is written in Vala rather than something I know has a working toolchain on macOS…. this seemed like more work than I’d really like to take on.

Luckily, I remembered that Shotwell’s database is actually just a SQLite database pointing to all the files on disk. So, if I could work out how to read it accurately, and how to import all the relevant metadata (such as what Albums a photo is in, tags, title, and description) into Apple Photos, I’d be able to make it work.

So… is there any useful documentation as to how the database is structured?

Semi annoyingly, Shotwell is written in Vala, a rather niche programming language that while integrating with all the GObject stuff that GNOME uses, is largely unheard of. Luckily, the database code in Shotwell isn’t too hard to read, so was a useful fallback for when the documentation proves inadequate.

So, I armed myself with the following resources:

Programming the Mac side of things, it was a good excuse to start looking at Swift, so knowing I’d also need to read a SQLite database directly (rather than use any higher level abstraction), I armed myself with the following resources:

From here, I could work on getting the first half going, the ability to view my Shotwell database on the Mac (which is what I posted a screenshot of back in Feb 2022).

But also, I had to work out what I was doing on the other end of things, how would I import photos? It turns out there’s an API!

A bit of SwiftUI code:

import SwiftUI
import AppKit
import Photos

struct ContentView: View {
    @State var favorite_checked : Bool = false
    @State var hidden_checked : Bool = false
    var body: some View {
        VStack() {
            Text("Select a photo for import")
            Toggle("Favorite", isOn: $favorite_checked)
            Toggle("Hidden", isOn: $hidden_checked)
            Button("Import Photo")
            {
                let panel = NSOpenPanel()
                panel.allowsMultipleSelection = false
                panel.canChooseDirectories = false
                if panel.runModal() == .OK {
                    let photo_url = panel.url!
                    print("selected: " + String(photo_url.absoluteString))
                    addAsset(url: photo_url, isFavorite: favorite_checked, isHidden: hidden_checked)
                }
            }
            .padding()
        }
    }
}

struct ContentView_Previews: PreviewProvider {
    static var previews: some View {
        ContentView()
    }
}

Combined with a bit of code to do the import (which does look a bunch like the examples in the docs):

import SwiftUI
import Photos
import AppKit

@main
struct SinglePhotoImporterApp: App {
    var body: some Scene {
        WindowGroup {
            ContentView()
        }
    }
}

func addAsset(url: URL, isFavorite: Bool, isHidden: Bool) {
    // Add the asset to the photo library.
    let path = "/Users/stewart/Pictures/1970/01/01/1415446258647.jpg"
    let url = URL(fileURLWithPath: path)
    PHPhotoLibrary.shared().performChanges({
        let addedImage = PHAssetChangeRequest.creationRequestForAssetFromImage(atFileURL: url)
        addedImage?.isHidden = isHidden
        addedImage?.isFavorite = isFavorite
    }, completionHandler: {success, error in
        if !success { print("Error creating the asset: \(String(describing: error))") } else
        {
            print("Imported!")
        }
    })
}

This all meant I could import a single photo. However, there were some limitations.

There’s the PHAssetCollectionChangeRequest to do things to Albums, so it would solve that problem, but I couldn’t for the life of me work out how to add/edit Titles and Descriptions.

It was so close!

So what did I need to do in order to import Titles and Descriptions? It turns out you can do that via AppleScript. Yes, that thing that launched in 1993 and has somehow survived the transition of m68k based Macs to PowerPC based Macs to Intel based Macs to ARM based Macs.

The Photos dictionary for AppleScript

So, just to make it easier to debug what was going on, I started adding code to my ShotwellImporter tool that would generate snippets of AppleScript I could run and check that it was doing the right thing…. but then very quickly ran into a problem…. it appears that the AppleScript language interpreter on modern macOS has limits that you’d be more familiar with in 1993 than 2023, and I very quickly hit limits where the script would just error out before running (I was out of dictionary size allegedly).

But there’s a new option! Everything you can do with AppleScript you can now do with JavaScript – it’s just even less documented than AppleScript is! But it does work! I got to the point where I could generate JavaScript that imported photos, into all the relevant albums, and set title and descriptions.

A useful write up of using JavaScript rather than AppleScript to do things with Photos: https://mudge.name/2019/11/13/scripting-photos-for-macos-with-javascript/

More recent than when I was doing my hacking, https://alexwlchan.net/2023/managing-albums-in-photos/ is a good read.

With luck I’ll find some time to write up a bit of a walkthrough of my code, and push it up somewhere.

,

yifeiVirtualization with KVM on the PinePhone Pro

Basic Setup #

All the tools we need for running VM are already packaged on Mobian, to install them, run:

sudo apt install virt-manager

then add your user to the libvirt group:

sudo adduser mobian libvirt

Reboot and then run virt-host-validate, it should indicate /dev/kvm exists and is accessible.

Trouble with Heterogeneous Architecture #

Trying to start qemu-system-aarch64 with -enable-kvm flag can yield the following, rather unhelpfully worded error:

qemu-system-aarch64: kvm_init_vcpu: kvm_arch_init_vcpu failed (0): Invalid argument

Turns out the RK3399s SoC used on this device is built around Arm’s heterogeneous big.Little architecture, and contains 4 slower Cortex A53 cores and 2 faster Cortex A72 cores, this allows the kernel to dynamically schedule tasks on different types of cores to improve performance and save energy. However, this configuration is not yet supported by KVM, and when the expected CPU type differs from the scheduled type (e.g. expecting A72 but kernel scheduled a process on an A53 core), it will panic.

Before KVM is able to work with this setup, we can workaround it by manually set the CPU affinity of qemu by launching it with taskset. To only use A72 cores:

taskset -c 4,5 qemu-system-aarch64 <qemu options>

To only use the slower A53 cores:

taskset -c 0,1,2,3 qemu-system-aarch64 <qemu options>

To apply this workaround globally, we need a wrapper.

dpkg-divert #

Simply replacing the qemu-system-aarch64 binary with a wrapper is not a great idea because upstream Debian package can override our warpper when upgrading qemu. To ensure Debian will not override it, we can divert package’s version of the binary to another location with dpkg-divert:

sudo dpkg-divert --rename /usr/bin/qemu-system-aarch64

The --rename option ensures the existing binary will be moved to a new name, which by default is qemu-system-aarch64.distrib. Finally, create the wrapper under usr/bin/qemu-system-aarch64 (I decide to only use faster cores, A53 cores are too slow for most workload):

#!/usr/bin/env sh
taskset -c 4,5 /usr/bin/qemu-system-aarch64.distrib "$@"

Launching VM #

The following scripts will launch VM of different BSD OS, doing the same for Linux distros is similar. I’m using [user networking (SLIRP)][1] as network backend which does not require root privileges. This backend has the drawback of lower performance compare to TAP or VDE, but still fast enough for me.

OpenBSD #

Setup

mkdir openbsd.vm
cd openbsd.vm
# create disk image
qemu-img create -f qcow2 openbsd.vm.qcow2 32G
# use arm64 uefi firmware from package qemu-efi-aarch64
cp /usr/share/AAVMF/AAVMF_CODE.fd ./

Boot to installer

Assume using miniroot73.img as installer, -smp is needed for installer to enable MP kernel.

qemu-system-aarch64 \
        -enable-kvm \
        -m 1024 \
        -cpu host -M virt \
        -nographic \
        -drive if=pflash,file=aavmf_code.fd,format=raw \
        -drive if=virtio,file=miniroot73.img,format=raw \
        -drive if=virtio,file=openbsd.vm.qcow2,format=qcow2 \
        -netdev user,id=obsd \
        -device virtio-net,netdev=obsd \
        -smp 2

Launch VM

qemu-system-aarch64 \
        -enable-kvm \
        -m 1024 \
        -cpu host -M virt \
        -nographic \
        -drive if=pflash,file=aavmf_code.fd,format=raw \
        -drive if=virtio,file=openbsd.vm.qcow2,format=qcow2 \
        -netdev user,id=obsd \
        -device virtio-net,netdev=obsd \
        -smp 2

NetBSD #

Setup

mkdir netbsd.vm
cd netbsd.vm
# use arm64 uefi firmware from package qemu-efi-aarch64
cp /usr/share/AAVMF/AAVMF_CODE.fd ./

Launch VM

NetBSD provides ready to boot image for arm64, the daily snapshot is available at:

https://nycdn.netbsd.org/pub/NetBSD-daily/HEAD/latest/evbarm-aarch64/binary/gzimg/arm64mbr.img.gz
qemu-system-aarch64 \
        -enable-kvm \
        -m 1024 \
        -cpu host -M virt \
        -nographic \
        -drive if=pflash,file=aavmf_code.fd,format=raw \
        -drive if=virtio,file=arm64mbr.img,format=raw \
        -netdev user,id=nbsd \
        -device virtio-net,netdev=nbsd \
        -smp 2

virt-manager and arm64 UEFI secure boot #

Virt-manager seems to use secure boot enabled firmware by default when creating new VM, this might not work for your prefered system (It certainly does not work with OpenBSD) and will yield a Script Error Status: Access Denied error for unsupported install media. To disable secure boot, select Customize configuration before install during the last step of creating new VM, go to Overview section, and change the firmware from AAVMF_CODE.ms.fd to UEFI aarch64: /usr/share/AAVMF/AAVMF_CODE.fd. This cannot be changed easily after VM is created.

,

Tim SerongLonghorn in a Sandbox

In my last post, I wrote about how I taught sesdev (originally a tool for deploying Ceph clusters on virtual machines) to deploy k3s, because I wanted a little sandbox in which I could break learn more about Kubernetes. It’s nice to be able to do a toy deployment locally, on a bunch of VMs, on my own hardware, in my home office, rather than paying to do it on someone else’s computer. Given the k3s thing worked, I figured the next step was to teach sesdev how to deploy Longhorn so I could break that learn more about that too.

Teaching sesdev to deploy Longhorn meant asking it to:

  • Install nfs-client, open-iscsi and e2fsprogs packages on all nodes.
  • Make an ext4 filesystem on /dev/vdb on all the nodes that have extra disks, then mount that on /var/lib/longhorn.
  • Use kubectl label node -l 'node-role.kubernetes.io/master!=true' node.longhorn.io/create-default-disk=true to ensure Longhorn does its storage thing only on the nodes that aren’t the k3s master.
  • Install Longhorn with Helm, because that will install the latest version by default vs. using kubectl where you always explicitly need to specify the version.
  • Create an ingress so the UI is exposed… from all nodes, via HTTP, with no authentication. Remember: this is a sandbox – please don’t do this sort of thing in production!

So, now I can do this:

> sesdev create k3s --deploy-longhorn
=== Creating deployment "k3s-longhorn" with the following configuration === 

Deployment-wide parameters (applicable to all VMs in deployment):

- deployment ID:    k3s-longhorn
- number of VMs:    5
- version:          k3s
- OS:               tumbleweed
- public network:   10.20.78.0/24 

Proceed with deployment (y=yes, n=no, d=show details) ? [y]: y

=== Running shell command ===
vagrant up --no-destroy-on-error --provision
Bringing machine 'master' up with 'libvirt' provider…
Bringing machine 'node1' up with 'libvirt' provider…
Bringing machine 'node2' up with 'libvirt' provider…
Bringing machine 'node3' up with 'libvirt' provider…
Bringing machine 'node4' up with 'libvirt' provider…

[... lots more log noise here - this takes several minutes... ]

=== Deployment Finished ===

You can login into the cluster with:

  $ sesdev ssh k3s-longhorn

Longhorn will now be deploying, which may take some time.
After logging into the cluster, try these:

  # kubectl get pods -n longhorn-system --watch
  # kubectl get pods -n longhorn-system

The Longhorn UI will be accessible via any cluster IP address
(see the kubectl -n longhorn-system get ingress output above).
Note that no authentication is required.

…and, after another minute or two, I can access the Longhorn UI and try creating some volumes. There’s a brief period while the UI pod is still starting where it just says “404 page not found”, and later after the UI is up, there’s still other pods coming online, so on the Volume screen in the Longhorn UI an error appears: “failed to get the parameters: failed to get target node ID: cannot find a node that is ready and has the default engine image longhornio/longhorn-engine:v1.4.1 deployed“. Rest assured this goes away in due course (it’s not impossible I’m suffering here from rural Tasmanian internet lag pulling container images). Anyway, with my five nodes – four of which have an 8GB virtual disk for use by Longhorn – I end up with a bit less than 22GB storage available:

21.5 GiB isn’t much, but remember this is a toy deployment running in VMs on my desktop Linux box

Now for the fun part. Longhorn is a distributed storage solution, so I thought it would be interesting to see how it handled a couple of types of failure. The following tests are somewhat arbitrary (I’m really just kicking the tyres randomly at this stage) but Longhorn did, I think, behave pretty well given what I did to it.

Volumes in Longhorn consist of replicas stored as sparse files on a regular filesystem on each storage node. The Longhorn documentation recommends using a dedicated disk rather than just having /var/lib/longhorn backed by the root filesystem, so that’s what sesdev does: /var/lib/longhorn is an ext4 filesystem mounted on /dev/vdb. Now, what happens to Longhorn if that underlying block device suffers some kind of horrible failure? To test that, I used the Longhorn UI to create a 2GB volume, then attached that to the master node:

The Longhorn UI helpfully tells me the volume replicas are on node3, node4 and node1

Then, I ssh’d to the master node and with my 2GB Longhorn volume attached, made a filesystem on it and created a little file:

> sesdev ssh k3s-longhorn
Have a lot of fun...

master:~ # cat /proc/partitions 
major minor  #blocks  name 
 253        0   44040192 vda
 253        1       2048 vda1
 253        2      20480 vda2
 253        3   44016623 vda3
   8        0    2097152 sda

master:~ # mkfs /dev/sda
mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done                            
Creating filesystem with 524288 4k blocks and 131072 inodes
Filesystem UUID: 3709b21c-b9a2-41c1-a6dd-e449bdeb275b
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912
Allocating group tables: done                            
Writing inode tables: done                            
Writing superblocks and filesystem accounting information: done 

master:~ # mount /dev/sda /mnt
master:~ # echo foo > /mnt/foo
master:~ # cat /mnt/foo
foo

Then I went and trashed the block device backing one of the replicas:

> sesdev ssh k3s-longhorn node3
Have a lot of fun...

node3:~ # ls /var/lib/longhorn
engine-binaries  longhorn-disk.cfg  lost+found  replicas  unix-domain-socket

node3:~ # dd if=/dev/urandom of=/dev/vdb bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.486205 s, 216 MB/s

node3:~ # ls /var/lib/longhorn

node3:~ # dmesg|tail -n1
[ 6544.197183] EXT4-fs error (device vdb): ext4_map_blocks:607: inode #393220: block 1607168: comm longhorn: lblock 0 mapped to illegal pblock 1607168 (length 1) 

At this point, the Longhorn UI still showed the volume as green (healthy, ready, scheduled). Then, back on the master node, I tried creating another file:

master:~ # echo bar > /mnt/bar
master:~ # cat /mnt/bar
bar

That’s fine so far, but suddenly the Longhorn UI noticed that something very bad had happened:

The volume is still usable, but one of the replicas has failed

Ultimately node3 was rebooted and ended up stalled with the console requesting the root password for maintenance:

Failed to mount /var/lib/longhorn – Can’t find ext4 filesystem

Meanwhile, Longhorn went and rebuilt a third replica on node2:

All green again!

…and the volume remained usable the entire time:

master:~ # echo baz > /mnt/baz
master:~ # ls /mnt
bar  baz  foo  lost+found

That’s perfect!

Looking at the Node screen we could see that node3 was still down:

There may be disk size errors with down nodes (4.87 TiB looks a lot like integer overflow to me)

That’s OK, I was able to fix node3. I logged in on the console and ran mkfs.ext4 /dev/vdb then brought the node back up again.The disk remained unschedulable, because Longhorn was still expecting the ‘old’ disk to be there (I assume based on the UUID stored in /var/lib/longhorn/longhorn-disk.cfg) and of course the ‘new’ disk is empty. So I used the Longhorn UI to disable scheduling for that ‘old’ disk, then deleted it. Shortly after, Longhorn recognised the ‘new’ disk mounted at /var/lib/longhorn and everything was back to green across the board.

So Longhorn recovered well from the backing store of one replica going bad. Next I thought I’d try to break it from the other end by running a volume out of space. What follows is possibly not a fair test, because what I did was create a single Longhorn volume larger than the underlying disks, then filled that up. In normal usage, I assume one would ensure there’s plenty of backing storage available to service multiple volumes, that individual volumes wouldn’t generally be expected to get more than a certain percentage full, and that some sort of monitoring and/or alerting would be in place to warn of disk pressure.

With four nodes, each with a single 8GB disk, and Longhorn apparently reserving 2.33GB by default on each disk, that means no Longhorn volume can physically store more than a bit over 5.5GB of data (see the Size column in the previous screenshot). Given that the default setting for Storage Over Provisioning Percentage is 200, we’re actually allowed to allocate up to a bit under 11GB.

So I went and created a 10GB volume, attached that to the master node, created a filesystem on it, and wrote a whole lot of zeros to it:

master:~ # mkfs.ext4 /dev/sda
mke2fs 1.46.5 (30-Dec-2021)
[...]

master:~ # mount /dev/sda /mnt
master:~ # df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda        9.8G   24K  9.3G   1% /mnt

master:~ # dd if=/dev/zero of=/mnt/big-lot-of-zeros bs=1M status=progress
2357198848 bytes (2.4 GB, 2.2 GiB) copied, 107 s, 22.0 MB/s

While that dd was running, I was able to see the used space of the replicas increasing in the Longhorn UI:

Those little green bars eventually turn yellow as the disks approach full

After a few more minutes, the dd stalled…

master:~ # dd if=/dev/zero of=/mnt/big-lot-of-zeros bs=1M status=progress
9039773696 bytes (9.0 GB, 8.4 GiB) copied, 478 s, 18.9 MB/s

…there was a lot of unpleasantness on the master node’s console…

So many I/O errors!

…the replicas became unschedulable due to lack of space…

This doesn’t look good

…and finally the volume faulted:

This really doesn’t look good

Now what?

It turns out that Longhorn will actually recover if we’re able to somehow expand the disks that store the replicas. This is probably a good argument for backing Longhorn with an LVM volume on each node in real world deployments, because then you could just add another disk and extend the volume onto it. In my case though, given it’s all VMs and virtual block devices, I can actually just enlarge those devices. For each node then, I:

  1. Shut it down
  2. Ran qemu-img resize /var/lib/libvirt/images/k3s-longhorn_$NODE-vdb.qcow2 +8G
  3. Started it back up again and ran resize2fs /dev/vdb to take advantage of the extra disk space.

After doing that to node1, Longhorn realised there was enough space there and brought node1’s replica of my 10GB volume back online. It also summarily discarded the other two replicas from the still-full disks on node2 and node3, which didn’t yet have enough free space to be useful:

One usable replica is better than three unusable replicas

As I repeated the virtual disk expansion on the other nodes, Longhorn happily went off and recreated the missing replicas:

Finally I could re-attach the volume to the master node, and have a look to see how many of my zeros were actually written to the volume:

master:~ # cat /proc/partitions 
major minor  #blocks  name
 254        0   44040192 vda
 254        1       2048 vda1
 254        2      20480 vda2
 254        3   44016623 vda3
   8        0   10485760 sda

master:~ # mount /dev/sda /mnt
master:~ # ls -l /mnt
total 7839764
-rw-r--r-- 1 root root 8027897856 May  3 04:41 big-lot-of-zeros
drwx------ 2 root root      16384 May  3 04:34 lost+found

Recall that dd claimed to have written 9039773696 bytes before it stalled when the volume faulted, so I guess that last gigabyte of zeros is lost in the aether. But, recall also that this isn’t really a fair test – one overprovisioned volume deliberately being quickly and deliberately filled to breaking point vs. a production deployment with (presumably) multiple volumes that don’t fill quite so fast, and where one is hopefully paying at least a little bit of attention to disk pressure as time goes by.

It’s worth noting that in a situation where there are multiple Longhorn volumes, assuming one disk or LVM volume per node, the replicas will all share the same underlying disks, and once those disks are full it seems all the Longhorn volumes backed by them will fault. Given multiple Longhorn volumes, one solution – rather than expanding the underlying disks – is simply to delete a volume or two if you can stand to lose the data, or maybe delete some snapshots (I didn’t try the latter yet). Once there’s enough free space, the remaining volumes will come back online. If you’re really worried about this failure mode, you could always just disable overprovisioning in the first place – whether this makes sense or not will really depend on your workloads and their data usage patterns.

All in all, like I said earlier, I think Longhorn behaved pretty well given what I did to it. Some more information in the event log could perhaps be beneficial though. In the UI I can see warnings from longhorn-node-controller e.g. “the disk default-disk-1cdbc4e904539d26(/var/lib/longhorn/) on the node node1 has 3879731200 available, but requires reserved 2505089433, minimal 25% to schedule more replicas” and warnings from longhorn-engine-controller e.g. “Detected replica overprovisioned-r-73d18ad6 (10.42.3.19:10000) in error“, but I couldn’t find anything really obvious like “Dude, your disks are totally full!”

Later, I found more detail in the engine manager logs after generating a support bundle ([…] level=error msg=”I/O error” error=”tcp://10.42.4.34:10000: write /host/var/lib/longhorn/replicas/overprovisioned-c3b9b547/volume-head-003.img: no space left on device”) so the error information is available – maybe it’s just a matter of learning where to look for it.

,

Lev LafayetteCOMP90024: Cluster and Cloud Computing For 2023

For the past few years, I have delivered some guest lectures and training for the University of Melbourne master's level course Cluster and Cloud Computing. This year's contribution has been expanded, which is not surprising as the course is apparently required for data science students as well as computer science students. Thus, for 2023 four presentations were given, with the workshop repeated three times! The first two presentations were an introduction to the Linux command line, followed by slightly more advanced content which included an introduction to shell scripting. The third lecture was the main presentation on Supercomputing and the Spartan HPC system in particular. The third presentation was a workshop on HPC job submission and and introduction to OpenMP and MPI programming with a concentration on using MPI4Py.

,

Tim SerongTeaching an odd dog new tricks

We – that is to say the storage team at SUSE – have a tool we’ve been using for the past few years to help with development and testing of Ceph on SUSE Linux. It’s called sesdev because it was created largely for SES (SUSE Enterprise Storage) development. It’s essentially a wrapper around vagrant and libvirt that will spin up clusters of VMs running openSUSE or SLES, then deploy Ceph on them. You would never use such clusters in production, but it’s really nice to be able to easily spin up a cluster for testing purposes that behaves something like a real cluster would, then throw it away when you’re done.

I’ve recently been trying to spend more time playing with Kubernetes, which means I wanted to be able to spin up clusters of VMs running openSUSE or SLES, then deploy Kubernetes on them, then throw the clusters away when I was done, or when I broke something horribly and wanted to start over. Yes, I know there’s a bunch of other tools for doing toy Kubernetes deployments (minikube comes to mind), but given I already had sesdev and was pretty familiar with it, I thought it’d be worthwhile seeing if I could teach it to deploy k3s, a particularly lightweight version of Kubernetes. Turns out that wasn’t too difficult, so now I can do this:

> sesdev create k3s
=== Creating deployment "k3s" with the following configuration === 
Deployment-wide parameters (applicable to all VMs in deployment):
deployment ID:    k3s
number of VMs:    5
version:          k3s
OS:               tumbleweed
public network:   10.20.190.0/24 
Proceed with deployment (y=yes, n=no, d=show details) ? [y]: y
=== Running shell command ===
vagrant up --no-destroy-on-error --provision
Bringing machine 'master' up with 'libvirt' provider...
Bringing machine 'node1' up with 'libvirt' provider...
Bringing machine 'node2' up with 'libvirt' provider...
Bringing machine 'node3' up with 'libvirt' provider...
Bringing machine 'node4' up with 'libvirt' provider...

[...
  wait a few minutes
  (there's lots more log information output here in real life)
...]

=== Deployment Finished ===
 You can login into the cluster with:
 $ sesdev ssh k3s

…and then I can do this:

> sesdev ssh k3s
Last login: Fri Mar 24 11:50:15 CET 2023 from 10.20.190.204 on ssh
Have a lot of fun…

master:~ # kubectl get nodes
NAME     STATUS   ROLES                  AGE     VERSION
master   Ready    control-plane,master   5m16s   v1.25.7+k3s1
node2    Ready                     2m17s   v1.25.7+k3s1
node1    Ready                     2m15s   v1.25.7+k3s1
node3    Ready                     2m16s   v1.25.7+k3s1
node4    Ready                     2m16s   v1.25.7+k3s1 

master:~ # kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   local-path-provisioner-79f67d76f8-rpj4d   1/1     Running     0          5m9s
kube-system   metrics-server-5f9f776df5-rsqhb           1/1     Running     0          5m9s
kube-system   coredns-597584b69b-xh4p7                  1/1     Running     0          5m9s
kube-system   helm-install-traefik-crd-zz2ld            0/1     Completed   0          5m10s
kube-system   helm-install-traefik-ckdsr                0/1     Completed   1          5m10s
kube-system   svclb-traefik-952808e4-5txd7              2/2     Running     0          3m55s
kube-system   traefik-66c46d954f-pgnv8                  1/1     Running     0          3m55s
kube-system   svclb-traefik-952808e4-dkkp6              2/2     Running     0          2m25s
kube-system   svclb-traefik-952808e4-7wk6l              2/2     Running     0          2m13s
kube-system   svclb-traefik-952808e4-chmbx              2/2     Running     0          2m14s
kube-system   svclb-traefik-952808e4-k7hrw              2/2     Running     0          2m14s

…and then I can make a mess with kubectl apply, helm, etc.

One thing that sesdev knows how to do is deploy VMs with extra virtual disks. This functionality is there for Ceph deployments, but there’s no reason we can’t turn it on when deploying k3s:

> sesdev create k3s --num-disks=2
> sesdev ssh k3s
master:~ # for node in \
    $(kubectl get nodes -o 'jsonpath={.items[*].metadata.name}') ;
    do echo $node ; ssh $node cat /proc/partitions ; done
master
major minor  #blocks  name
 253        0   44040192 vda
 253        1       2048 vda1
 253        2      20480 vda2
 253        3   44016623 vda3
node3
major minor  #blocks  name
 253        0   44040192 vda
 253        1       2048 vda1
 253        2      20480 vda2
 253        3   44016623 vda3
 253       16    8388608 vdb
 253       32    8388608 vdc
node2
 major minor  #blocks  name
 253        0   44040192 vda
 253        1       2048 vda1
 253        2      20480 vda2
 253        3   44016623 vda3
 253       16    8388608 vdb
 253       32    8388608 vdc
node4
 major minor  #blocks  name
 253        0   44040192 vda
 253        1       2048 vda1
 253        2      20480 vda2
 253        3   44016623 vda3
 253       16    8388608 vdb
 253       32    8388608 vdc
node1
 major minor  #blocks  name
 253        0   44040192 vda
 253        1       2048 vda1
 253        2      20480 vda2
 253        3   44016623 vda3
 253       16    8388608 vdb
 253       32    8388608 vdc

As you can see this gives all the worker nodes an extra two 8GB virtual disks. I suspect this may make sesdev an interesting tool for testing other Kubernetes based storage systems such as Longhorn, but I haven’t tried that yet.

,

Paul WayperThe Energica Experia

I recently bought an Energica Experia - the latest, largest and longest distance of Energica's electric motorbike models.

The decision to do this rather than build my own was complicated, and I'm going to mostly skip over the detail of that. At some time I might put it in another blog post. But for now it's enough to say that I'd accidentally cooked the motor in my Mark I, the work on the Mark II was going to take ages, and I was in the relatively fortunate situation of being able to afford the Experia if I sold my existing Triumph Tiger Sport and the parts for the Mark II.

For other complicated reasons I was planning to be in Sydney after the weekend that Bruce at Zen Motorcycles told me the bike would be arriving. Rather than have it freighted down, and since I would have room for my riding gear in our car, I decided to pick it up and ride it back on the Monday. In reconnoitering the route, we discovered that by pure coincidence Zen Motorcycles is on Euston Road in Alexandria, only 200 metres away from the entrance to WestConnex and the M8. So with one traffic light I could be out of Sydney.

I will admit to being more than a little excited that morning. Electric vehicles are still, in 2023, a rare enough commodity that waiting lists can be months long; I ordered this bike in October 2022 and it arrived in March 2023. So I'd had plenty of time to build my expectations. And likewise the thought of riding a brand new bike - literally one of the first of its kind in the country (it is the thirty-second Experia ever made!) - was a little daunting. I obtained PDF copies of the manual and familiarised myself with turning the cruise control on and off, as well as checking and setting the regen braking levels. Didn't want to stuff anything up on the way home.

There is that weird feeling in those situations of things being both very ordinary and completely unique. I met Bruce, we chatted, I saw the other Experia models in the store, met Ed - who had come down to chat with Bruce, and just happened to be the guy who rode a Harley Davidson Livewire from Perth to Sydney and then from Sydney to Cape Tribulation and back. He shared stories from his trip and tips on hypermiling. I signed paperwork, picked up the keys, put on my gear, prepared myself.

Even now I still get a bit choked up just thinking of that moment. Seeing that bike there, physically real, in front of me - after those months of anticipation - made the excitement real as well.

So finally, after making sure I wasn't floating, and making sure I had my ear plugs in and helmet on the right way round, I got on. Felt the bike's weight. Turned it on. Prepared myself. Took off. My partner followed behind, through the lights, onto the M8 toward Canberra. I gave her the thumbs up.

We planned to stop for lunch at Mittagong, while the NRMA still offers the free charger at the RSL there. One lady was charging her Nissan Leaf on the ChaDeMo side; shortly after I plugged in a guy arrived in his Volvo XC40 Recharge. He had the bigger battery and would take longer; I just needed a ten minute top up to get me to Marulan.

I got to Marulan and plugged in; a guy came thinking he needed to tell the petrol motorbike not to park in the electric vehicle bay, but then realised that the plug was going into my bike. Kate headed off, having charged up as well, and I waited another ten minutes or so to get a bit more charge. Then I rode back.

I stopped, only once more - at Mac's Reef Road. I turned off and did a U turn, then waited for the traffic to clear before trying the bike's acceleration. Believe me when I say this bike will absolutely do a 0-100km/hr in under four seconds! It is not a light bike, but when you pull on the power it gets up and goes.

Here is my basic review, given that experience and then having ridden it for about ten weeks around town.

The absolute best feature of the Energica Experia is that it is perfectly comfortable riding around town. Ease on the throttle and it gently takes off at the traffic lights and keeps pace with the traffic. Ease off, and it gently comes to rest with regenerative braking and a light touch on the rear brake after stopping to hold it still. If you want to take off faster, wind the throttle on more. It is not temperamental or twitchy, and you have no annoying gears and clutch to balance.

In fact, I feel much more confident lane filtering, because before I would have to have the clutch ready and be prepared to give the Tiger Sport lots of throttle lest I accidentally stall it in front of an irate line of traffic. With the Experia, I can simply wait peacefully - using no power - and then when the light goes green I simply twist on the throttle and I am away ahead of even the most aggressive car driver.

It is amazingly empowering.

I'm not going to bore you with the stats - you can probably look them up yourself if you care. The main thing to me is that it has DC fast charging, and watching 75KW go into a 22.5KWHr battery is just a little bit terrifying as well as incredibly cool. The stated range of 250km on a charge at highway speeds is absolutely correct, from my experience riding it down from Sydney. And that plus the fast charging means that I think it is going to be quite reasonable to tour on this bike, stopping off at fast or even mid-level chargers - even a boring 22KW charger can fill the battery up in an hour. The touring group I travel with stops often enough that if those stops can be top ups, I will not hold anyone up.

Some time in the near future I hope to have a nice fine day where I can take it out on the Cotter Loop. This is an 80km stretch of road that goes west of Canberra into the foothills of the Brindabella Ranges, out past the Deep Space Tracking Station and Tidbinbilla Nature Reserve. It's a great combination of curving country roads and hilly terrain, and reasonably well maintained as well. I did that on the Tiger Sport, with a GoPro, before I sold it - and if I can ever convince PiTiVi to actually compile the video from it I will put that hour's ride up on a platform somewhere.

I want to do that as much to show off Canberra's scenery as to show off the bike.

And if the CATL battery capacity improvement comes through to the rest of the industry, and we get bikes that can do 400km to 500km on a charge, then electric motorbike touring really will be no different to petrol motorbike touring. The Experia is definitely at the forefront of that change, but it is definitely possible on this bike.

,

Robert CollinsRustup CI / test suite performance

Rustup (the community package manage for the Rust language) was starting to really suffer : CI times were up at ~ one hour.

We’ve made some strides in bringing this down.

Caching factory for test scenarios

The first thing, which achieved about a 30% reduction in test time was to stop recreating all the test context every time.

Rustup tests the download/installation/upgrade of distributions of Rust. To avoid downloading gigabytes in the test suite, the suite creates mocks of the published Rust artifacts. These mocks are GPG signed and compressed with multiple compression methods, both of which are quite heavyweight operations to perform – and not actually the interesting code under test to execute.

Previously, every test was entirely hermetic, and usually the server state was also unmodified.

There were two cases where the state was modified. One, a small number of tests testing error conditions such as GPG signature failures. And two, quite a number of tests that were testing temporal behaviour: for instance, install nightly at time A, then with a newer server state, perform a rustup update and check a new version is downloaded and installed.

We’re partway through this migration, but compare these two tests:

fn check_updates_some() {
    check_update_setup(&|config| {
        set_current_dist_date(config, "2015-01-01");
        config.expect_ok(&["rustup", "update", "stable"]);
        config.expect_ok(&["rustup", "update", "beta"]);
        config.expect_ok(&["rustup", "update", "nightly"]);
        set_current_dist_date(config, "2015-01-02");
        config.expect_stdout_ok(
            &["rustup", "check"],
            for_host!(
                r"stable-{0} - Update available : 1.0.0 (hash-stable-1.0.0) -> 1.1.0 (hash-stable-1.1.0)
beta-{0} - Update available : 1.1.0 (hash-beta-1.1.0) -> 1.2.0 (hash-beta-1.2.0)
nightly-{0} - Update available : 1.2.0 (hash-nightly-1) -> 1.3.0 (hash-nightly-2)
"
            ),
        );
    })
}
fn check_updates_some() {
    test(&|config| {
        config.with_scenario(Scenario::ArchivesV2_2015_01_01, &|config| {
            config.expect_ok(&["rustup", "toolchain", "add", "stable", "beta", "nightly"]);
        });
        config.with_scenario(Scenario::SimpleV2, &|config| {
        config.expect_stdout_ok(
            &["rustup", "check"],
            for_host!(
                r"stable-{0} - Update available : 1.0.0 (hash-stable-1.0.0) -> 1.1.0 (hash-stable-1.1.0)
beta-{0} - Update available : 1.1.0 (hash-beta-1.1.0) -> 1.2.0 (hash-beta-1.2.0)
nightly-{0} - Update available : 1.2.0 (hash-nightly-1) -> 1.3.0 (hash-nightly-2)
"
            ),
        );
            })
    })
}

The former version mutates the date with set_current_dist_date; the new version uses two scenarios, one for the earlier time, and one for the later time. This permits the server state to be constructed only once. On a per-test basis it can move as much as 50% of the time out of the test.

Single binary for the integration test suite

The next major gain was moving from having 14 separate integration test binaries to just one. This reduces the link cost of linking the test binaries, all of which link in the same library. It also permits us to see unused functions in our test support library, which helps with cleaning up cruft rather than having it accumulate.

Hard linking rather than copying ‘rustup-init’

Part of the test suite for each test is setting up an installed rustup environment. Why not start from scratch every time? Well, we obviously have tests that do that, but most tests are focused on steps beyond the new-user case. Setting up an installed rustup environment has a few steps, but particular ones are copying a binary of rustup into the test sandbox, and hard linking it under various names: cargo, rustc, rustup etc.

A debug build of rustup is ~20MB. Running 400 tests means about 8GB of IO; on some platforms most of that IO won’t hit disk, on others it will.

In review now is a PR that changes the initial copy to a hardlink: we hardlink the rustup-init built by cargo into each test, and then hardlink that to the various binaries. That saves 8GB of IO, which isn’t much from some perspectives, but it adds pressure on the page cache, and is wasted work. One wrinkle is a very low max-links limit on NTFS of 1023; to mitigate that we count the links made to rustup-init and generate a new inode for the original to avoid failures happening.

Future work

In GitHub actions this lowers our test time to 19m for Linux, 24m for Windows, which is a lot better but not great.

I plan on experimenting with separate actions for building release artifacts and doing CI tests – at the moment we have the same action do both, but they don’t share artifacts in the cache in any meaningful way, so we can probably gain parallelism there, as well as turning off release builds entirely for CI.

We should finish the cached test context work and use it everywhere.

Also we’re looking at having less integration tests and more narrow close to the code tests.

,

Lev Lafayette2022 HPC Training Utilisation and Results

Unique identifiers for 263 users who received HPC training in 2022 was determined from collected attendee records. Note that users may enrol in multiple courses (e.g., Introduction to Spartan, Advanced Spartan, Parallel Processing, etc) and may return for revision. All these users are counted once only.

From the unique users a total of 212 usernames could be determined from email addresses. When enrolling for training users do not include their Spartan usernmae or their university ID; sometimes they don't even use a university email address, despite requests.

There were 97 users who established an account but did not use Spartan (compute hours = 0). Of the remaining 115 users the total of job hours was determined from trained users was 6280454, after they received training. This calculation ensured that users who had already run jobs on Spartan prior to receiving training was not counted. e.g.,

$ sreport cluster AccountUtilizationByUser cluster=spartan user=$username start=2022-11-01 end=2022-12-31 -t hours

The total allocated hours of cluster utilisation 11597951, from the command:

$ sreport cluster Utilization cluster=spartan start=2022-11-01 end=2022-12-31 -t hours

The means that at least 54.14% of cluster utilisation in 2022 was conducted by users after receiving training.

The following steps are recommended to improve record-keeping and utilisation.

1) Emphasising the need to enrollees to use University of Melbourne email addresses only, and rejecting applications that do not do this.

2) Contacting those who attended training but did not use Spartan to ascertain why this was the case.

,

Tim SerongHack Week 22: An Art Project

Back in 2012, I received a box of eight hundred openSUSE 12.1 promo DVDs, which I then set out to distribute to local Linux users’ groups, tech conferences, other SUSE crew in Australia, and so forth. I didn’t manage to shift all 800 DVDs at the time, and I recently rediscovered the remaining three hundred and eighty four while installing some new shelves. As openSUSE 12.1 went end of life in May 2013, it seemed likely the DVDs were now useless, but I couldn’t bring myself to toss them in landfill. Instead, given last week was Hack Week, I decided to use them for an art project. Here’s the end result:

Geeko mosaic made of cut up openSUSE DVDs, on a 900mm x 600mm piece of plywood

Making that mosaic was extremely fiddly. It’s possibly the most annoying Hack Week project I’ve ever done, but I’m very happy with the outcome 🙂

The backing is a piece of 900mm x 600mm x 6mm plywood, primed with some leftover kitchen and bathroom undercoat, then spray pained black. I’d forgotten how bad spray paint smells, but it makes for a nice finish. To get the Geeko shape, I took the official openSUSE logo, then turned it into an outline in Inkscape, saved that as a PNG, opened it in GIMP, and cut it into nine 300mm x 200mm pieces which I then printed on A4 paper, stuck together with tape, and cut out to make a stencil. Of course, the first time I did that, nothing quite lined up, so I had to reprint it but with “Ignore page margins” turned off and “Draw crop marks” turned on, then cut the pages down along the crop marks before sticking them together the second time. Then I placed the stencil on the backing, glued the eye down (that just had to be made from the centre of a DVD!) and started laying out cut up DVD shards.

Geeko mosaic work in progress

I initially tried cutting the DVDs with tin snips, which is easy on the hands, but had a tendency to sometimes warp the DVD pieces and/or cause them to delaminate, so I reverted to a large pair of scissors which was more effort but ultimately less problematic.

After placing the pieces that made up the head, tail, feet and spine, and deciding I was happy with how they looked, I glued each piece down with superglue. Think: carefully pick up DVD shard without moving too many other shards, turn over, dab on a few tiny globs of superglue, lower into place, press for a few seconds, move to next piece. Do not get any superglue on your fingers, or you’ll risk sticking your fingers together and/or make a gluey mess on the shiny visible side of the DVD shards.

It was another three sessions of layout-then-glue-down to fill in the body. I think I stuck my fingers together about six, or eight, or maybe twenty times. Also, despite my best efforts to get superglue absolutely nowhere near the stencil at all, when I removed the stencil, it had stuck to the backing in several places. I managed to scrape/cut that off with a combination of fingernails, tweezers, and the very sharp knife in my SLE 12 commemorative Leatherman tool, then touched up the remaining white bits with a fine point black Sharpie.

SLE 12 commemorative Leatherman tool (it seemed appropriate to use this)

Judging from the leftover DVD centre pieces, this mosaic used about 12 DVDs in all, which isn’t very many considering my initial stash. I had a few other ideas for the remainder, mostly involving hanging them up somehow, which I messed around with earlier on while waiting for the paint to dry on the plywood.

One (failed) idea was to use a cutting wheel on my Dremel tool to slice half way through a few DVDs, then slot them into each other to make a hanging thingy that would spin in the wind. I was unable to make a smooth/straight enough cut for this to work, and superglue doesn’t bridge gaps. You can maybe get an idea of what I was aiming at from this photo:

Four DVDs slotted into each other vertically, kinda, one with nasty superglue smear

My wife had an idea for a better way to do this, which is to take a piece of dowel, cut slots in the sides, and glue DVD halves into the slots using Araldite (that’s an epoxy resin, in case you didn’t grow up with that brand name). I didn’t get around to trying this, but I reckon she’s onto something. Next time I’m at the hardware store, I’ll try to remember to pick up some suitably sized dowel.

I did make one somewhat simpler hanging thingy, which I call “Geeko’s Tail (Uncurled)”. It’s just DVDs superglued together on the flat, hanging from fishing line, but I think it’s kinda cool:

No, it’s not an upside down question mark, it’s “Geeko’s Tail (Uncurled)”

Also, I’ve discovered that Officeworks has an e-waste recycling program, so any DVDs I don’t use in future projects needn’t go to landfill.

Update 2023-02-20: For photos of the mosaic, plus wallpapers made from the photos, see https://github.com/tserong/hackweek22

,

Lev LafayetteThe Importance of Supercomputing

Most people use their computers (which includes mobile phones) for communication, social media, games, entertainment, office applications, and the like. Most of the time these activities are not particularly onerous in terms of computing as such or do not lead to enormous benefits in productivity, inventions, and discovery. There is one field, however, rarely discussed, that does do this - and that is supercomputing. It is through supercomputing that we are witnessing the most important technological advances of our day, including astronomy, weather and climate forecasting, materials science and engineering, molecular modeling, genomics, neurology, geoscience, and finance - all with numerous success stories.

Usually, I draw a distinction between supercomputing and high-performance computing. Specifically, a supercomputer is any computer system that has exceptional computational power at a particular point in time, many (but not all) of which are measured in the bi-annual Top500 list. Once upon a time dominated by monolithic mainframes supercomputers, in a contemporary sense, are a subset of high-performance computing, which is typically arranged as a cluster of commodity-grade servers with a high-speed interconnect and message-passing software that allows the entire unit to be treated as a whole. One can even put together a "supercomputer" from Raspberry Pi systems, as the University of Southhampton illustrates.

How important is this? For many years now we've known that there is a strong association between research output and access to such systems. Macroeconomic analysis shows that for every dollar invested in supercomputing, there is a return of forty-four dollars in profits or cost-savings. Both these metrics are almost certainly going to increase in time; datasets and problem complexity are growing at a rate greater than the computational performance of personal systems. More researchers need access to supercomputers.

However, researchers do require training to use such systems. The environment, the interface, the use of schedulers on a shared system, the location of data, is all something that needs to be learned. This is a big part of my life; in the last week, I spent three days teaching researchers from the basic of using a supercomputer system to scripting jobs, to using Australia's most powerful system Gadi at NCI, along with contributions at a board meeting of the international HPC Certification Forum. It is often a challenging vocation, but I feel confident that it is making a real difference to our shared lives. For that, I am very grateful.

,

Colin CharlesLong Malaysians, Short Malaysia

I have long said “Long Malaysians, Short Malaysia” in conversation to many. Maybe it took me a while to tweet it, but this was the first example: Dec 29, 2021. I’ve tweeted it a lot more since.

Malaysia has a 10th Prime Minister, but in general, it is a very precarious partnership. Consider it, same shit, different day?

I just have to get off the Malaysian news diet. Malaysians elsewhere, are generally very successful. Malaysians suffering by their daily doldrums, well, they just need to wake up, see the light, and succeed.

In the end, as much as people paraphrase, ask not what the country can do for you, legitimately, this is your life, and you should be taking good care of yourself and your loved ones. You succeed, despite of. Politics and the state happens, regardless of.

Me, personally? Ideas are abound for how to get Malaysians who see the light, to succeed elsewhere. And if I read, and get angry at something (tweet rage?), I’m going to pop RM50 into an investment account, which should help me get off this poor habit. I’ll probably also just cut subscriptions to Malaysian news things… Less exposure, is actually better for you. I can’t believe that it has taken me this long to realise this.

Time to build.

Colin CharlesHello 2023

I did poorly blogging last year. Oops. I think to myself when I read, This Thing Still On?, I really have to do better in 2023. Maybe the catalyst is the fact that Twitter is becoming a shit show. I doubt people will leave the platform in droves, per se, but I think we are coming back to the need for decentralised blogs again.

I have 477 days to becoming 40. I ditched the Hobonich Techo sometime in 2022, and just focused on the Field Notes, and this year, I’ve got a Monocle x Leuchtturm1917 + Field Notes combo (though it seems my subscription lapsed Winter 2022, I should really burn down the existing collection, and resubscribe).

2022 was pretty amazing. Lots of work. Lots of fun. 256 days on the road (what a number), 339,551km travelled, 49 cities, 20 countries.

The getting back into doing, and not being afraid of experimenting in public is what 2023 is all about. The Year of The Rabbit is upon us tomorrow, hence why I don’t mind a little later Hello 2023 :)

Get back into the habit of doing. And publishing by learning and doing. No fear. Not that I wasn’t doing, but its time to be prolific with what’s been going on.

I better remember that.

,

Andrew RuthvenLet's Encrypt with Octavia in OpenStack

I like using Catalyst Cloud to host some of my personal sites. In the past I used to use CAcert for my TLS certificates, but more recently I've been using Let's Encrypt for my TLS certificates as they're trusted in all browsers. Currently the LoadBalancer as a Service (LBaaS) in Catalyst Cloud doesn't have built in support for Let's Encrypt. I could use an apache2/nginx proxy and handle the TLS termination there and have that manage the Let's Encrypt lifecycle, but really, I'd rather use LBaaS.

So I thought I'd set about working out how to get Dehydrated (the Let's Encrypt client I've been using) to drive LBaaS (known as Octavia). I figured this would be of interest to other people using Octavia with OpenStack in general, not just Catalyst Cloud.

There's a few things you need to do. These instructions are specific to Debian:

  1. Install and configure Dehydrated to create the certificates for the domain(s) you want.
    • apt install barbican
  2. Create the LoadBalancer (use the API, ClickOps, whatever), just forward port 80 for now (see sample Apache configs below).
  3. Save the sample hook.sh below to /etc/dehydrated/hook.sh, you'll probably need to customise it, mine is a bit more complicated!
  4. Insert the UUID of your LoadBalancer in hook.sh where LB_LISTENER is set.
  5. Create /etc/dehydrated/catalystcloud/password as described in hook.sh
  6. Save OpenRC file from the Catalyst Cloud dashboard as /etc/dehydrated/catalystcloud/openrc.sh
  7. Install jq, openssl and the openstack tools, on Debian this is:
    • apt install jq openssl python3-openstackclient python3-barbicanclient python3-octaviaclient
  8. Add TLS termination to your LoadBalancer
  9. You should be able to rename the latest certs /var/lib/dehydrated/certs/$DOMAIN and then run dehydrated -c to have it reissue and then deploy a cert.

As we're using HTTP-01 Challenge Type here, you need to have the LoadBalancer forwarding port 80 to your website to allow for the challenge response. It is good practice to have a redirect to HTTPS, here's an example virtual host for Apache:

<VirtualHost *:80>
    ServerName www.example.com
    ServerAlias example.com

    RewriteEngine On
    RewriteRule ^/.well-known/ - [L]
    RewriteRule ^/(.*)$ https://www.example.com/$1 [R=301,L]

    <Location />
        Require all granted
    </Location>
</VirtualHost>
You all also need this in /etc/apache2/conf-enabled/letsencrypt.conf:
Alias /.well-known/acme-challenge /var/lib/dehydrated/acme-challenges

<Directory /var/lib/dehydrated/acme-challenges>
        Options None
        AllowOverride None

        # Apache 2.x
        <IfModule !mod_authz_core.c>
                Order allow,deny
                Allow from all
        </IfModule>

        # Apache 2.4
        <IfModule mod_authz_core.c>
                Require all granted
        </IfModule>
</Directory>

And that should be all that you need to do. Now, when Dehydrated updates your certificate, it should update your LoadBalancer as well!

Sample hook.sh:
deploy_cert() {
    local DOMAIN="${1}" KEYFILE="${2}" CERTFILE="${3}" FULLCHAINFILE="${4}" \
          CHAINFILE="${5}" TIMESTAMP="${6}"
    shift 6

    # File contents should be:
    #   export OS_PASSWORD='your password in here'
    . /etc/dehydrated/catalystcloud/password

    # OpenRC file from the Catalyst Cloud dashboard
    . /etc/dehydrated/catalystcloud/openrc.sh --no-token

    # UUID of the LoadBalancer to be managed
    LB_LISTENER='xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'

    # Barbican uses P12 files, we need to make one.
    P12=$(readlink -f $KEYFILE \
        | sed -E 's/privkey-([0-9]+)\.pem/barbican-\1.p12/')
    openssl pkcs12 -export -inkey $KEYFILE -in $CERTFILE -certfile \
        $FULLCHAINFILE -passout pass: -out $P12

    # Keep track of existing certs for this domain (hopefully no more than 100)
    EXISTING_URIS=$(openstack secret list --limit 100 \
        -c Name -c 'Secret href' -f json \
        | jq -r ".[]|select(.Name | startswith(\"$DOMAIN\"))|.\"Secret href\"")

    # Upload the new cert
    NOW=$(date +"%s")
    openstack secret store --name $DOMAIN-$TIMESTAMP-$NOW -e base64 \
        -t "application/octet-stream" --payload="$(base64 < $P12)"

    NEW_URI=$(openstack secret list --name $DOMAIN-$TIMESTAMP-$NOW \
        -c 'Secret href' -f value) \
        || unset NEW_URI

    # Change LoadBalancer to use new cert - if the old one was the default,
    # change the default. If the old one was in the SNI list, update the
    # SNI list.
    if [ -n "$EXISTING_URIS" ]; then
        DEFAULT_CONTAINER=$(openstack loadbalancer listener show $LB_LISTENER \
            -c default_tls_container_ref -f value)

        for URI in $EXISTING_URIS; do
            if [ "x$URI" = "x$DEFAULT_CONTAINER" ]; then
                openstack loadbalancer listener set $LB_LISTENER \
                    --default-tls-container-ref $NEW_URI
            fi
        done

        SNI_CONTAINERS=$(openstack loadbalancer listener show $LB_LISTENER \
            -c sni_container_refs -f value | sed "s/'//g" | sed 's/^\[//' \
            | sed 's/\]$//' | sed "s/,//g")

        for URI in $EXISTING_URIS; do
            if echo $SNI_CONTAINERS | grep -q $URI; then
                SNI_CONTAINERS=$(echo $SNI_CONTAINERS | sed "s,$URI,$NEW_URI,")
                openstack loadbalancer listener set $LB_LISTENER \
                    --sni-container-refs $SNI_CONTAINERS
            fi
        done

        # Remove old certs
        for URI in $EXISTING_URIS; do
            openstack secret delete $URI
        done
    fi
}

HANDLER="$1"; shift
#if [[ "${HANDLER}" =~ ^(deploy_challenge|clean_challenge|sync_cert|deploy_cert|deploy_ocsp|unchanged_cert|invalid_challenge|request_failure|generate_csr|startup_hook|exit_hook)$ ]]; then
if [[ "${HANDLER}" =~ ^(deploy_cert)$ ]]; then
    "$HANDLER" "$@"
fi

,

Tim RileyOpen source status update, September 2022

Hello there, friends! This is going to be a short update from me because I’m deep in the throes of Hanami 2.0 release preparation right now. Even still, I didn’t want to let September pass without an update, so let’s take a look.

A story about Hanami::Action memory usage

September started and ended with me looking at the r10k memory usage charts for hanami-controller versus Rails. The results were surprising!

Initial memory usage for Hanami::Action vs Rails

We’d been running some of these checks as part of our 2.0 release prep, the idea being that it’d help us shake out any obvious performance improvements we’d need to make. And it certainly did in this case! Hanami (just like its dry-rb underpinnings) is meant to be the smaller and lighter framework; why were we being outperformced by Rails?

To address this I wrote a simple memory profile script for Hanami::Action inheritance (now checked in here) and started digging.

Here were there initial results:

Total allocated: 184912288 bytes (1360036 objects)
Total retained:  104910880 bytes (780031 objects)

allocated memory by gem
-----------------------------------
  56242240  concurrent-ruby-1.1.10
  53282480  dry-configurable-0.15.0
  34120000  utils-8585be837309
  30547488  other
  10720080  controller/lib

That’s 185MB allocated for 10k subclasses, with concurrent-ruby, dry-configurable and hanami-utils being the top three gems allocating memory.

This led me straight to dry-configurable, and after a couple of weeks of work, I arrived at this PR, separating our storage of setting definitions from their configured values, among other things. This change allows us to copy less data at the moment of class inheritance, and in the case of a dry-configurable-focused memory profile, cut the allocated memory by more than half.

From there, I moved back into hanami-controller and updated it to use dry-configurable for all of its inheritable attributes (some were handled separately), also taking advantage the support for custom config classes that Piotr added so we could preserve Hanami::Action’s existing configuration API.

This considerably improved our benchmark! Behold:

Total allocated: 32766232 bytes (90004 objects)
Total retained:  32766232 bytes (90004 objects)

allocated memory by gem
-----------------------------------
  21486072  other
  10880120  dry-configurable-0.16.1
    400040  3.1.2/lib

Yes, we brought 185MB allocated memory down to 33MB! This also brought us on par with Rails in the extreme end of the r10k memory usage benchmark:

Updated memory usage for Hanami::Action vs Rails

Here’s a thing though: the way r10k generates actions for its Rails benchmark is to create a single controller class with a method per action. So for the point on the far right of that chart, that’s a single class with 10k methods. Hardly realistic.

So I made a quick tweak to see how things would look if the r10k Rails benchmark generated a class per endpoint like we do with Hanami::Action:

Hanami::Action vs Rails with a separate controller class per action

That’s more like it. This is another extreme, however: more realistically, we’d see Rails apps with somewhere between 5-10 actions per controller class, which would lower its dot a little in that graph. In my opinion this would be a useful thing to upstream into r10k. It’s already a contrived benchmark, yes, but it’d be more useful if it at least mimicked realistic application structures.

Either way, we finished the month much more confident that we’ll be delivering on our promise of Hanami as the lighter, faster framework alternative. A good outcome!

Along the way, however, things did feel bleak at times. I wasn’t confident that I’d be able to make things right, and it didn’t feel great to think we might’ve spent years putting somethign together that wasn’t going to be able to deliver on some of those core promises. Luckily, I found all the wins we needed, and learnt a few things along the way.

Hanami 2.0, here we come

What else happened in September? Possibly the biggst thing is that we organised ourselves for the runway towards the final Hanami 2.0.0 release.

We want to do everything possible to make sure the release happens this year, so I spent some time organising the remaining tasks on our Trello board into date-based lists, aiming for a release towards the end of November. It looked achievable! The three of us in the core team re-committed ourselves to doing everything we could to complete these tasks in our estimated timeframes.

So far, things have gone very well!

Hanami 2.0.0 release progress on Trello

We’ve all been working tremendously hard, and so far, this has let us keep everything to the schedule. I’ll have a lot to share about our work across October, but that’s all for next month’s update. So in the meantime, I have to put my head back down and get back to shipping a framework. See you all again soon!

,

Tim SerongTANSTAAFL

It’s been a little over a year since our Redflow ZCell battery and Victron Energy inverter/charger kit were installed on our existing 5.94kW solar array. Now that we’re past the Southern Hemisphere spring equinox it seems like an opportune time to review the numbers and try to see exactly how the system has performed over its first full year. For background information on what all the pieces are and what they do, see my earlier post, Go With The Flow.

As we look at the figures for the year, it’s worth keeping in mind what we’re using the battery for, and how we’re doing it. Naturally we’re using it to store PV generated electricity for later use when the sun’s not shining. We are also charging the battery from the grid at certain times so it can be drawn down if necessary during peak times, for example I set up a small overnight charge to ensure there was power for the weekday morning peak, when the sun isn’t really happening yet, but grid power is more than twice as expensive. More recently in the winter months, I experimented with keeping the battery full with scheduled charges during most non-peak times. This involved quite a bit more grid charging, but got us through a couple of three hour grid outages without a hitch during some severe weather in August.

I spent some time going through data from the VRM portal for the last year, and correlating that with current bills from Aurora energy, and then I tried to compare our last year of usage with a battery, to the previous three years of usage without a battery. For reasons that will become apparent later, this turned out to be a massive pain in the ass, so I’m going to start by looking only at what we can see in the VRM portal for the past year.

The VRM portal has three summary views: System Overview, Consumption and Solar. System Overview tells us overall how much total power was pulled from the grid, how much was exported to the grid, how much was produced locally, and how much was consumed by our loads. The Consumption view (which I wish they’d named “Loads”, because I think that would be clearer) gives us the same consumption figure, but tells us how much of that came from the grid, vs. what came from the battery vs. what came from solar. The Solar view tells us how much PV generation went to the grid, how much went to the battery, and how much was used directly. There is some overlap in the figures from these three views, but there are also some interesting discrepancies, notably: the “From Grid” and “To Grid” figures shown under System Overview are higher than what’s shown in the Consumption and Solar views. But, let’s start by looking at the Consumption and Solar views, because those tell us what the system gives us, and what we’re using. I’ll come back after that to the System Overview, which is where things start to get weird and we discover what the system costs to run.

The VRM portal lets you chose any date range you like to get historical figures and bar charts. It also gives you pie charts of the last 24 hours, 7 days, 30 days and 365 days. To make the figures and bar charts match the pie charts, the year we’re analysing starts at 4pm on September 25, 2021 and ends at 4pm on September 25, 2022, because that’s exactly when I took the following screenshots. This means we get a partial September at each end of the bar chart. I’m sorry about that.

Here’s the Consumption view:

Consumption view from VRM portal, 2021-09-25 16:00 – 2022-09-25 16:00

This shows us that in the last 12 months, our loads consumed 10,849kWh of electricity. Of that, 54% (5,848kWh) came from the grid, 23% (2,506kWh) came direct from solar PV and the final 23% (2,494kWh) came from the battery.

From the rough curve of the bar chart we can see that our consumption is lower in the summer months and higher in the winter months. I can’t say for certain, but I have to assume that’s largely due to heating. The low in February was 638kWh (an average of 22.8kWh/day). The high in July was 1,118kWh (average 36kWh/day).

Now let’s look at the Solar view:

Solar view from VRM portal, 2021-09-25 16:00 – 2022-09-25 16:00

In that same time period we generated 5,640kWh with our solar array, of which 44% (2,506kWh) was used directly by our loads, 43% (2,418kWh) went into the battery and 13% (716kWh) was exported to the grid.

Unsurprisingly our generation is significantly higher in summer than in winter. We got 956kWh (average 30kWh/day) in December but only 161kWh (5.3kWh/day) in June. Peak summer figures like that mean we’ll theoretically be able to do without grid power at all during that period once we get a second ZCell (note that we’re still exporting to the grid in December – that’s because we’ve got more generation capacity than storage). The winter figures clearly indicate that there’s no way we can provide anywhere near all our own power at that time of year with our current generation capacity and loads.

Now look closely at the summer months (December, January and February). There should be a nice curve evident there from December to March, but instead January and February form a weird dip. This is because we were without solar generation for three weeks from January 20 – February 11 due to replacing a faulty MPPT. Based on figures from previous years, I suspect we lost 500-600kWh of potential generation in that period.

Another interesting thing is that if we compare “To Battery” on the Solar view (2,418kWh) with “From Battery” on the Consumption view (2,494kWh), we see that our loads consumed 76kWh more from the battery than we actually put into it with solar generation. This discrepancy is due to the fact that in addition to charging the battery from solar, we’ve also been charging it from the grid at certain times, but the amount of power sent to the battery from the grid isn’t broken out explicitly anywhere in the VRM portal.

Now let’s look at the System Overview:

System Overview view from VRM portal, 2021-09-25 16:00 – 2022-09-25 16:00

Here we see the same figures for “Production” (5,640kWh) and “Consumption” (10,849kWh) as were in the Consumption and Solar views, and the bar chart shows the same consumption and generation curves (ignore the blue overlay and line which indicate battery minimum/maximum and average state of charge – that information is largely meaningless at this scale, given we cycle the battery completely every day).

Now look at “To Grid” and “From Grid”. “To Grid” is 754 kWh, i.e. we somehow sent 38kWh more to the grid than came from solar. “From Grid”, at 8,531kWh, is a whopping 2,683kWh more than the 5,848kWh grid power consumed by our loads (i.e. close to half as much again).

So, what’s going on here?

One factor is that we’re charging the battery from the grid at certain times. Initially that was a few hours overnight and a few hours in the afternoon on weekdays, although the afternoon charge is obviously also provided by the solar if the sun is shining. For all of July, August and most of September though I was using a charge schedule to keep the battery full except for peak times and maintenance cycle nights, which meant quite a bit more grid charging overnight than earlier in the year, as well as grid charging most of the day during days with no or minimal sunshine. Grid power sent to the battery isn’t visible in the “From Grid” figure on the Consumption view – that view shows only our loads, i.e. the equipment the system is powering – but it is part of the “From Grid” figure in the System Overview.

Similarly, some of the power we export to the grid is actually exported from the battery, as opposed to being exported from solar generation. That usually only happens during maintenance cycles when our loads aren’t enough to draw the battery down at the desired discharge rate. But again, same thing, that figure is present here on the system overview page as part of “To Grid”, but of course is not part of the “To Grid” figure on the Solar view.

Another factor is that the system itself needs some amount of power to operate. The Victron kit (the MultiPlus II Inverter/Chargers, the Cerbo GX, the MPPT) use some small amount of power themselves. The ZCell battery also requires power to operate its pumps and fans. When the sun is out this power can of course come from solar. When solar power is not available, power to run the system needs to come from some combination of the remaining charge in the battery, and the grid.

On that note, I did a little experiment to see how much power the system uses just to operate. On July 9 (which happened to be a maintenance cycle day), I disabled all scheduled battery charges, and I shut off the DC isolators for the solar PV, so the battery would remain online (pumps and fans running) but empty for all of July 10. The following day I went and checked the figures on the System Overview, which showed we drew 35kWh, but that our consumption was 33kWh. So, together, the battery doing nothing other than running its pumps and fans, plus the Multis doing nothing other than passing grid power through, used 2kWh of power in 24 hours. Over a year, that’s 730kWh. As mentioned above, ordinarily some of that will be sourced from mains and some from solar, but if we look at the total power that came into the system as a whole (5,640kWh from solar + 8,531kWh from the grid = 14,171kWh), 730kWh is just slightly over 5% of that.

The final factor in play is that a certain amount of power is naturally lost due to conversion at various points. The ZCell has a maximum 80% DC-DC stack efficiency, meaning in the absolute best case if you want to get 10kW out of it, you have to put 12.5kW in. In reality you’ll never hit the best case: the lifetime charge and discharge figures the BMS currenly shows for our ZCell are 4,423 and 3,336kWh respectively, which is a bit over 75%. The Multis have a maximum efficiency of 96% when doing their invert/charge dance, so if we grid charge the battery, we lose at least 4% on the way in, and at least 4% on the way out as well, going to and from AC/DC. Again, in reality that loss will be higher than 4% each way, because 96% is the maximum efficiency.

A bunch of the stuff above just doesn’t apply to the previous system with the ABB inverter and no battery. I also don’t have anything like as much detailed data to go on for the old system, which makes comparing performance with the new system fiendishly difficult. The best comparison I’ve been able to come up with so far involves looking at total power input to the system (power from grid plus solar generation), total consumption by loads (i.e. actual locally usable power), and total power exported.

Prior to the Victron gear and Redflow battery installation, I had grid import and export figures from my Aurora Energy bills, and I had total generation figures from the ABB inverter. From this I can synthesise what are hopefully reasonably accurate load consumption figures by adding adding grid input to total PV generation minus grid export.

I had hoped to do this analysis on a quarterly basis to line up with Aurora bills, because then I would also be able to see how seasonal solar generation and usage went up and down. Unfortunately the billing for 2020 and 2021 was totally screwed up by the COVID-19 pandemic, because there were two quarters during which nobody was coming out to read the electricity meter. The bills for those quarters stated estimated usage (i.e. were wrong, especially given they estimated grid export as zero), with subsequent quarters correcting the figures. I have no way to reliably correlate that mess with my PV generation figures, except on an annual basis. Also, using billing periods from pre-battery years, the closest I can get to the September 25 based 2021-2022 year I’m looking at now is billing periods starting and ending in mid-August. But, that’s close enough. We’ve still got four pretty much back-to-back 12 month periods to look at.

YearGrid InSolar InTotal InLoadsExport
2018-20199,0316,68215,71311,8273,886
2019-20209,3246,46815,79212,2553,537
2020-20217,5826,34713,92910,3583,571
2021-20228,5315,64014,17110,849754

One thing of note here is that in the 2018-2019 and 2019-2020 years, our annual consumption was pretty close to 12MWh, whereas in 2020-2021 and 2021-2022 it was closer to 10.5MWh. If I had to guess, I’d say that ~1.5MWh/year drop is due to a couple of pieces of computer equipment that were previously always on, now mostly running in standby mode except when actually needed. A couple of hundred watts constant draw is a fair whack of power over the course of a year. Another thing to note is the big drop in power exported in 2021-2022, because most of our solar generation is now used locally.

The thing that freaked me out when looking at these figures is that in the battery year, while our loads consumed 491kWh more than in the previous non-battery year, we pulled 949kWh more power in from the grid! This is the opposite of what I had expected to see, especially having previously written:

In the eight months the system has been running we’ve generated 4631kWh of electricity and “only” sent 588kWh to the grid, which means we’ve used 87% of what we generated locally – much better than the pre-battery figure of 45%. I suspect we’ve reduced the amount of power we pull from the grid by about 30% too, but I’ll have to wait until we have a full year’s worth of data to be sure.

– by me at the end of Go With The Flow

When I wrote that, I was looking at August 31, 2021 through April 27, 2022, and comparing that to the August 2020 to May 2021 grid power figures from my old Aurora bills. The mistake I must have made back then was to look at “From Grid” on the Consumption view, rather than “From Grid” on the System Overview. I’ve just done this exercise again, and the total grid draw from our Aurora bills from August 2020 to May 2021 is 4,980kWh. “From Grid” on the Consumption view for August 2021 to May 2022 is 3,575kWh, which is about 30% less, but “From Grid” on the System Overview is 4,754kWh, which is only about 5% less. So our loads pulled about 30% less from the grid than the same time the year before, but our system as a whole didn’t.

Now let’s break our ridiculous September-based year down further into months, to see if we can see more detail. I’ve highlighted some interesting periods in bold.

MonthGrid InSolar InTotal InLoadsExport
Sep 21 (part)1531012542136
Oct 216366291,26598855
Nov 214307471,17786697
Dec 212329561,188767176
Jan 226524501,10282274
Feb 2247043090063883
Mar 224985681,06681364
Apr 2260937798677527
May 229102381,1489533
Jun 221,1141611,27510732
Jul 221,1632231,386111811
Aug 229103751,28596664
Sep 22 (part)7543851,13985792
Total8,5315,64014,17110,849754

December is great. We generated about 25% more power than our loads use (956/767=1.25), and our grid input was only about 30% of the total of our loads (232/767=0.30).

January and February show the effects of missing three weeks of potential generation. I mean, just look at December through February 2021-2022 versus the previous three summers.

PV Generation December through January 2018-2022
 2018-20192019-20202020-20212021-2022
December919882767956
January936797818450
February699656711430

June and July are terrible. They’re our highest load months, with the lowest solar generation and we pulled 3-4% more power from the grid than our loads actually consumed. I’m going to attribute the latter largely to grid charging the battery.

If I dig a couple of interesting figures out for June and July I see “To Battery” on the Solar view shows 205kWh, and “From Battery” on the Consumption view shows 558kWh. Total consumption in that period was 2,191kWh, with the total “From Grid” reported in System Overview of 2,277kWh. Let’s mess with that a bit.

Bearing in mind the efficiency numbers mentioned earlier, if 205kWh went to the battery from PV, that means no more than 154kWh of what we got out of the battery was from PV generation (remember: real world DC-DC stack efficiency of about 75%). The remaining 404kWh out of the battery is power that went into it from the grid. And that means at least 538kWh in (404/0.75). Note that total from grid for these two months was 86kWh more than the 2,191kWh used by our loads. If I hadn’t been keeping the battery topped up from the grid, I’d’ve saved at least 134kWh of grid power, which would have brought our grid input figure back down below our consumption figure. Note also that this number will actually be higher in reality because I haven’t factored in AC/DC conversion losses from the Multis.

Now let’s look at some costs. When I started trying to compare the new system to the previous system, I went in thinking to look at in in terms of total power input to the system, total consumption by loads, and total power exported. There’s one piece missing there, so let’s add another couple of columns to an earlier table:

YearGrid InSolar InTotal InLoadsExportTotal Outwhat?
2021-20228,5315,64014,17110,84975411,6032,568

The total usable output of the system was 11,603kWh for 14,171kWh input. The difference between these two figures – 2,568kWh, or about 18% – went somewhere else. Per my earlier experiment, 5% is power that went to actually operate the system components, including the battery. That means about 13% of the power input to the system over the course of the year must have gone to some combination of charge/discharge and AC/DC conversion (in)efficiencies. We can consider this the energy cost of the system. To have the ability to time-shift expensive peak grid electricity, and to run the house without the grid if the sun is out, or from the battery when it has charge, costs us 18% of the total available energy input.

Grid power has energy costs too, but we’re not usually aware of this because it happens somewhere else. I haven’t yet found Tasmanian figures, but this 2021 Transmission Annual Planning Report PDF from Powerlink in Queensland has historical figures showing that about 7% of generation there went to auxiliaries, i.e. fans and pumps and things running at the power stations. And according to the Australian Energy Market Operator (AEMO), 10% of grid power generated is lost during transmission and distribution. Stanwell (a power company in Queensland) have a neat explainer of all this on their What’s Watt site.

Finally, speaking of expensive grid electricity, let’s look at how much we paid Aurora Energy over the past four years for our power. The bills are broken out into different tariffs, for which you’re charged different amounts per kilowatt hour and then there’s an additional daily supply charge, and also credits for power exported. We can simplify that by just taking the total dollar value of all the power bills and dividing that by the total power drawn from the grid to arrive at an effective cost per kilowatt hour for the entire year. Here it is:

YearFrom GridTotal BillCost/kWh
2018-20199,031$2,278.33$0.25
2019-20209,324$2,384.79$0.26
2020-20217,582$1,921.77$0.25
2021-20228,531$1,731.40$0.20

So, the combination of the battery plus the switch from Flat Rate to Peak & Off-Peak billing has reduced the cost of our grid power by about 20%. I call that a win.

Going forwards it will be interesting to see how the next twelve months go, and, in particular, what we can do to reduce our power consumption. A significant portion of our power is used by a bunch of always-on computer equipment. Some of that I need for my work, and some of that provides internet access, file storage and email for us personally. Altogether, according to the UPSes, this kit pulls 200-250 watts continuously, but will pull more than that during the day when it’s being used interactively. If we call it 250W continuous, that’s a minimum of 6kWh/day, which is 2,190kWh/year, or about 20% of the 2021-2022 consumption. Some of that equipment should be replaced with newer, more power efficient kit. Some of it could possibly even be turned off or put into standby mode some of the time.

We still need to get a heat pump to replace the 2400W panel heater in our bedroom. That should save a huge amount of power in winter. We’re also slowly working our way through the house installing excellent double glazed windows from Elite Double Glazing, which will save on power for heating and cooling year round.

And of course, we still need to get that second ZCell.

,

Tim RileyOpen source status update, August 2022

August’s OSS work landed one of the last big Hanami features, saw another Hanami release out the door, began some thinking about memory usage, and kicked off a fun little personal initiative. Let’s dive in!

Conditional slice loading in Hanami

At the beginning of the month I merged support for conditional slice loading in Hanami. I’d wanted this feature for a long time, and in fact I’d hacked in workarounds to achieve the same more than 2 years ago, so I was very pleased to finally get this done, and for the implementation work to be as smooth as it was.

The feature provides a new config.slices setting on your app class, which you can configure like so:

module MyApp
  class App < Hanami::App
    config.slices = %w[admin]
  end
end

For an app consisting of both Admin and Main slices and for the config above, when the app is booted, only the Admin slice will be loaded:

require "hanami/prepare"

Hanami.app.slices.keys # => [:admin]

Admin::Slice # exists, as expected
Main         # raises NameError, since it was never loaded

As we see from Main above, slices absent from this list will not have their namespace defined, nor their slice class loaded, nor any of their Ruby source files. Within that Ruby process, they effectively do not exist.

Specifying slices to load can be very helpful to improve boot time and minimize memory usage for specific deployed workloads of your app.

Imagine you have a subset of background jobs that run via a dedicated job runner, but whose logic is otherwise unneeded for the rest of your app to function. In this case, you could organize those jobs into their own slice, and then load only that slice for the job runner’s process. This arrangement would see the job runner boot as quickly as possible (no extraneous code to load) as well as save all the memory otherwise needed by all those classes. You could also do the invserse for your main deployed process: specify all slices except this jobs slice, and you gain savings there too.

Organising code into slices to promote operational efficiency like this also gives you the benefit of greater clarity in the separation of responsibilities between those slices: when a single slice of code is loaded and the rest of your app is made to disappear, that will quickly surface any insidious dependencies from that slice to the rest of your code (they’ll be raised as exceptions!). Cleaning these up will help ensure your slices remain useful as abstractions for reasoning about and maintaining your app.

To make it easy to tune the list of slices to load, I also introduced a new HANAMI_SLICES env var that sets this config without you having to write code inside your app class. In this way, you could use them in your Procfile or other similar deployment code:

web: HANAMI_SLICES=main,admin bundle exec puma -C config/puma.rb
feed_worker: HANAMI_SLICES=feed bundle exec rake jobs:work

This effort was also another example of why I’m so happy to be working alongside the Hanami core team. After initially proposing a more complex arrangement including separate lists for including or excluding slices, Luca jumped in and help me dial this back to the much simpler arrangement of the single list only. For an Hanami release in which we’re going to be introducing so many new ideas, the more we can keep simple around them, the better, and I’m glad to have people who can remind me of this.

Fixed how slice config is applied to component classes

Our action and view integration code relies on their classes detecting when they’re defined inside a slice’s namespace, then applying relevant config from the slice to their own class-level config object. It turned out our code for doing this broke a little when we adjusted our default class hierarchies. Thanks to some of our wonderful early adopters, we picked this up quickly and I fixed it. Now things just work like you expect however you choose to configure your action classes, whether through the app-level config.actions object, or by directly updating config in a base action class.

In doing this work, I became convinced we need an API on dry-configurable to determine whether any config value has been assigned or mutated by the user, since it would help so much in reliably detecting whether or not we should ignore config values at particular levels. For now, we could work around it, but I hope to bring this to dry-configurable at some point in the future.

Released Hanami 2.0.0.beta2

Another month passed, so it was time for another release! With my European colleagues mostly enjoying some breaks over their summer, I hunkered down in chilly Canberra and took care of the 2.0.0.beta2 release. Along with the improvements above, this release also included slice and action generators (hanami generate slice and hanami generate action, thank you Luca!), plus a very handle CLI middlewares inspector (thank you Marc!):

$ hanami middlewares

/    Dry::Monitor::Rack::Middleware (instance)
/    Rack::Session::Cookie

The list of things to do over the beta phase is getting smaller. I don’t expect we’ll need too many more of these releases!

Created memory usage benchmarks for dry-configurable

As the final 2.0 release gets closer, we’ve been doing various performance tests just to make sure the house is in order. One thing we discovered is that Hanami::Action is not as memory efficient as we’d like it to be. One of the biggest opportunities to improve this looked to be in dry-configurable, since that’s what is used to manage the per-class action configuration.

I suspected any effort here would turn out to be involved (and no surprise, it turned out to be involved 😆), so I thought it would be useful as a first step to establish a memory benchmark to revisit over the course of any work. This was also a great way to get my head in this space, which turned out to take over most of my September (but more on that next month).

Quietly relaunched Decaf Sucks

Decaf Sucks was once a thriving little independent online café review community, with its own web site (starting from humble beginnings as a Rails Rumble entry in 2009) and even native iOS app (two iterations, in fact).

I was immensitely proud of what Decaf Sucks became, and for the collaboration with Max Wheeler in building it.

Unfortunately, as various internet APIs changed, the site atrophied, eventually became disfunctional, and we had to take it down. I still have the database, however, and I want to bring it back!

This time around, my plan is to do it as a fully open source Hanami 2 example application. Max is even on board to bring back all the UI goodness. For now, you can follow along with the early steps on GitHub. Right now the app is little more than the basic Hanami skeleton with added database integration and a CI setup (Hello Buildkite!), but I plan to grow it bit by bit. Perhaps I’ll try to have something small that I can share with each of these monthly OSS updates.

After Hanami 2 ships, hopefully this will serve as a useful resource for people wanting to see how it plays out in a real working app. And beyond that, I look forward to it serving once again as a place for me to commemorate my coffee travels!

,

Tim SerongAn S3 Storage Experiment

My team at SUSE is working on a new S3-compatible storage solution for Kubernetes, based on Ceph’s RADOS Gateway (RGW), except without any of the RADOS bits. The idea is that you can deploy our s3gw container on top of Longhorn (which provides the underlying replicated storage), and all this is running in your Kubernetes cluster, along with your applications which thus have convenient access to a local S3-compatible object store.

We’ve done this by adding a new storage backend to RGW. The approach we’ve taken is to use SQLite for metadata, with object data stored as files in a regular filesystem. This works quite neatly in a Kubernetes cluster with Longhorn, because Longhorn can provide a persistent volume (think: an ext4 filesystem), on which s3gw can store its SQLite database and object data files. If you’d like to kick the tyres, check out Giuseppe’s deployment tutorial for the 0.2.0 release, but bear in mind that as I’m writing this we’re all the way up to 0.4.0 so some details may have changed.

While s3gw on Longhorn on Kubernetes remains our primary focus for this project, the fact that this thing only needs a filesystem for backing storage means it can be run on top of just about anything. Given “just about anything” includes an old school two node Pacemaker cluster with DRBD for replicated storage, why not give that a try? I kinda like the idea of a good solid highly available S3-compatible storage solution that you could shove into the bottom of a rack somewhere without too much difficulty.

It’s probably eight years since I last deployed Pacemaker and DRBD, so to refresh my memory I ran with SUSE’s latest Highly Available NFS Storage with DRBD and Pacemaker document, but skipped all the NFS bits. That gives a filesystem mounted on one node, which will fail over to the other node if something breaks. On top of that, we need to run the s3gw container, the s3gw-ui container, an nginx HTTPS reverse proxy to smoosh those two together, and a virtual/floating IP, so the whole lot is accessible to the outside world.

Here’s the interesting parts of my Pacemaker configuration:

# crm configure show
[...]
primitive drbd_s3 ocf:linbit:drbd \
        params drbd_resource=s3 drbdconf="/etc/drbd.conf" \
        op monitor interval=29s role=Master \
        op monitor interval=31s role=Slave
primitive fs_s3 Filesystem \
        params device="/dev/drbd0" directory="/data" fstype=ext4 \
        meta target-role=Started \
        op start timeout=60s interval=0 \
        op stop timeout=60s interval=0 \
        op monitor interval=20s timeout=40s
primitive https nginx \
        op start timeout=40s interval=0 \
        op stop timeout=60s interval=0 \
        op monitor timeout=30s interval=10s \
        op monitor timeout=30s interval=30s \
        op monitor timeout=60s interval=20s
primitive s3-ip IPaddr2 \
        params ip=192.168.100.50 \
        op monitor interval=10 timeout=20
primitive s3gw podman \
        params image="ghcr.io/aquarist-labs/s3gw:latest" run_opts="-p 7480:7480 -v/data:/data" \
        op start interval=0 timeout=90s \
        op stop interval=0 timeout=90s \
        op monitor interval=30s timeout=30s
primitive s3gw-ui podman \
        params image="ghcr.io/aquarist-labs/s3gw-ui:latest" run_opts="-p 8080:8080 -e RGW_SERVICE_URL=https://s3gw.sleha.test" \
        op start interval=0 timeout=90s \
        op stop interval=0 timeout=90s \
        op monitor interval=30s timeout=30s
group g-s3 fs_s3 s3gw s3gw-ui https s3-ip
ms ms-drbd_s3 drbd_s3 \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
colocation col-s3_on_drbd inf: g-s3 ms-drbd_s3:Promoted
order o-drbd_before_fs Mandatory: ms-drbd_s3:promote g-s3:start
[...]

The g-s3 group ensures that the ext4 filesystem (fs_s3), s3gw container (s3gw), s3gw-ui container (s3gw-ui), nginx instance (https) and virtual IP (s3-ip) all run on the same node, and start one after another. The colocation and ordering constraints ensure that g-s3 runs on whichever node is currently the DRBD (ms-drbd_s3) primary.

The important pieces of glue here are:

  • The fs_s3 resource mounts /dev/drbd0 on /data
  • The s3gw resource passes -p 7480:7480 -v/data:/data to podman, so the container can write to /data on the host, and the S3 service is accessible via HTTP on port 7480.
  • The s3gw-ui resource passes -p 8080:8080 -e RGW_SERVICE_URL=https://s3gw.sleha.test to podman, so the UI is accessible via HTTP on port 8080, and it expects the S3 service to be externally available via https://s3gw.sleha.test.
  • nginx is configured to reverse proxy https://s3gw.sleha.test to http://localhost:7480, and https://s3gw-ui.sleha.test to http://localhost:8080.
  • I’ve got an entry in /etc/hosts to point s3gw.sleha.test and s3gw-ui.sleha.test at the virtual IP (192.168.100.50).
  • I’m using self-signed certificates (openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout cert.key -out cert.pem) for s3gw and s3gw-ui, so I had to go visit both https://s3gw.sleha.test and https://s3gw-ui.sleha.test in my browser and accept the SSL certificate before the UI would work.
  • The DRBD config, nginx config and SSL certificates and keys need to be present on all nodes. I used csync2 for this.

Here’s my /etc/nginx/nginx.conf. I’m not entirely convinced I’ve got everything 100% right here, but it seems to work (this is, incredibly, my first time doing anything with nginx, and my first time dealing with CORS):

worker_processes  1;

events {
    worker_connections  1024;
    use epoll;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    sendfile        on;
    keepalive_timeout  65;

    server {
        listen       80;
        return       301 https://$host$request_uri; 
    }

    server {
        listen       443 ssl;
        server_name  s3gw.sleha.test;

        access_log /var/log/nginx/s3gw.access.log;

        location / {
            proxy_set_header        Host $host;
            proxy_set_header        X-Real-IP $remote_addr;
            proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header        X-Forwarded-Proto $scheme;

            add_header Access-Control-Allow-Origin 'https://s3gw-ui.sleha.test';
            add_header Access-Control-Allow-Methods 'GET,HEAD,PUT,POST,DELETE';
            add_header Access-Control-Allow-Headers '*';
            add_header 'Access-Control-Allow-Credentials' 'true';

            if ($request_method = 'OPTIONS') {
                add_header Access-Control-Allow-Origin 'https://s3gw-ui.sleha.test';
                add_header Access-Control-Allow-Methods 'GET,HEAD,PUT,POST,DELETE';
                add_header Access-Control-Allow-Headers '*';
                add_header 'Access-Control-Allow-Credentials' 'true';
                add_header 'Content-Type' 'text/plain charset=UTF-8';
                add_header 'Content-Length' 0;
                return 204;
            }

            proxy_pass          http://localhost:7480;
            proxy_read_timeout  90;
            proxy_redirect      http://localhost:7480 https://s3gw.sleha.test;
        }

        ssl_certificate      cert.pem;
        ssl_certificate_key  cert.key;
        ssl_protocols        TLSv1.2;
        ssl_session_cache    shared:SSL:1m;
        ssl_session_timeout  5m;
        ssl_ciphers  HIGH:!aNULL:!MD5;
        ssl_prefer_server_ciphers  on;
    }

    server {
        listen       443 ssl;
        server_name  s3gw-ui.sleha.test;

        access_log /var/log/nginx/s3gw-ui.access.log;

        location / {
            proxy_set_header        Host $host;
            proxy_set_header        X-Real-IP $remote_addr;
            proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header        X-Forwarded-Proto $scheme;

            proxy_pass          http://localhost:8080;
            proxy_read_timeout  90;

            proxy_redirect      http://localhost:8080 https://s3gw-ui.sleha.test;
        }

        ssl_certificate      cert-ui.pem;
        ssl_certificate_key  cert-ui.key;
        ssl_protocols        TLSv1.2;
        ssl_session_cache    shared:SSL:1m;
        ssl_session_timeout  5m;
        ssl_ciphers  HIGH:!aNULL:!MD5;
        ssl_prefer_server_ciphers  on;
    }
}

A couple of important points about Pacemaker’s support for running containers with podman:

So what was the end result? TL;DR: It pretty much All Just WorkedTM, which is exactly what you’d hope for when running a new application on a mature HA stack. I can use s3cmd to mess around with the S3 service, and use my web browser to play with the UI. Failover is nice and quick (think: a few seconds) if I kill a node. For the sake of convenience I did this experiment on a couple of VMs using the external/libvirt STONITH plugin, but I don’t expect a real deployment to be hugely different in behaviour. Also, I’d forgotten how good Pacemaker is at highlighting poorly behaved applications – prior to this experiment the s3gw-ui container didn’t stop well, but we weren’t aware of that until I tried a manual failover which took too long and resulted in an unexpected STONITH due to a stop timeout. Moritz has since fixed that.

One thing I tripped over when doing this deployment was the correct values to use for the access_key and secret_key of the default user when talking to the S3 service. These are actually settable for the s3gw container via the RGW_DEFAULT_USER_ACCESS_KEY and RGW_DEFAULT_USER_SECRET_KEY environment variables, but if left unset, they default to “test” and “test” respectively. The interesting bits of my s3cmd.cfg are thus:

access_key = test
secret_key = test
host_base = https://s3gw.sleha.test/
host_bucket = htts://s3gw.sleha.test/%(bucket)

In retrospect I probably should have added -e RGW_DEFAULT_USER_ACCESS_KEY=tserong -e RGW_DEFAULT_USER_SECRET_KEY=do_not_tell_anyone_this_is_your_password to the run_opts parameter of the s3gw resource in the Pacemaker config.

,

Tim RileyOpen source status update, May–July 2022

Hi there friends, it’s certainly been a while, and a lot has happened across May, June and July: I left my job, took some time off, and started a new job. I also managed to get a good deal of open source work done, so let’s take a look at that!

Released Hanami 2.0.0.alpha8

Since we’d skipped a month in our releases, I helped get Hanami 2.0.0.alpha8 out the door in May. The biggest change here was that we’d finished relocating the action and view integration code into the hanami gem itself, wrapped up in distinct “application� classes, like Hanami::Application::Action. In the end, this particular naming scheme turned out to be somewhat short lived! Read on for more :)

Resurrected work using dry-effects within hanami-view

As part of an effort to make it easy to use our conventional view “helpers� in all parts of our view layer, I resurrected my work from September 2020(!) on using dry-effects within hanami-view. The idea here was to achieve two things:

  1. To ensure we keep only a single context object for the entire view rendering, allowing its state to be preserved and accessed by all view components (i.e. allowing both templates, partials and parts all to access the very same context object)
  2. To enable access to the current template/partial’s #locals from within the context, which might help make our helpers feel a little more streamlined through implicit access to those locals

I got both of those working (here’s my work in progress), but I discovered the performance had worsened due to the cost of using an effect to access the locals. I took a few extra passes at this, reducing the number of effects to one, and memoziing it, leaving us with improved performance over the main branch, but with a slightly different stance: the single effect is for accessing the context object only, so any helpers, instead of expecting access to locals, will instead only have access to that context. The job from here will be to make sure that the context object we build for Hanami’s views has everything we need for an ergonomic experience working with our helpers. I’m feeling positive about the direction here, but it’ll be a little while before I get back to it. Read on for more on this (again!).

Unified application and slice

The biggest thing I did over this period was to unify Hanami’s Application and Slice. This one took some doing, and I was glad that I had a solid stretch of time to work on it between jobs.

I already wrote about this back in April’s update, noting that I’d settled on the approach of having a composed slice inside the Hanami::Application class to providing slice-like functionality at the application level. This was the approach I continued with, and as I went, I was able to move more and more functionality out of Hanami::Application and into Hanami::Slice, with that composed “application slice� being the thing that preserved the existing application behaviour. At some point, a pattern emerged: the application is a slice, and we could achieve everything we wanted (and more) by turning class Hanami::Application into class Hanami::Application < Hanami::Slice.

Turning the application into a slice sublcass is indeed how I finished the work, and I’m extremely pleased with how it turned out. It’s made slices so much more powerful. Now, each slice can have its own config, its own dedicated settings and routes, can be run on its own as a Rack application, and can even have its own set of child slices.

As a user of Hanami you won’t be required to use all of this per-slice power features, but they’ll be there if or when you want them. This is a great example of progressive disclosure, a principle I follow as much as possible when designing Hanami’s features: a user should be able to work with Hanami in a simple, straightforward way, and then as their needs grow, they can then find additional capabilities waiting to serve them.

Let’s explore this with a concrete example. If you’re building a simple Hanami app, you can start with a single top-level config/settings.rb that defines all of the app’s own settings. This settings object is made available as a "settings" component registration in both the app as well as all its slices. As the app grows and you add a slice or two, you start to add more slice-specific settings to this component. At this point you start to feel a little uncomfortable that settings specific to SliceA are also available inside SliceB and elsewhere. So you wonder, could you go into slices/slice_a/ and drop a dedicated config/settings.rb there? The answer to that is now yes! Create a config/settings.rb inside any slice directory and it will now become a dedicated settings component for that slice alone. This isn’t a detail you had to burden yourself with in order to get started, but it was ready for you when you needed it.

Another big benefit of this code reorganisation is that the particular responsibilities of Hanami::Application are much clearer: its job is to provide the single entrypoint to the app and coordinate the overall boot process; everything else comes as part of it also being a slice. This distinction is made clear through the number of public methods that exist across the two classes: Application now has only 2 distinct public methods, whereas Slice currently brings 27.

There’s plenty more detail over in the pull request: go check it out!

The work here also led to changes across the ecosystem:

This is one the reasons I’m excited about Hanami’s use of the dry-rb gems: it’s pushing them in directions no one has had to take them before. The result is not only the streamlined experience we want for Hanami, but also vastly more powerful underpinnings.

Devised a slimmed down core app structure

While I had my head down working on internal changes like the above, Luca had been thinking about Hanami 2 adoption and the first run user experience. As we had opted for a slices-only approach for the duration of our alpha releases, it meant a fairly bulky overall app structure: every slice came with multiple deeply nested files. This might be overwhelming to new users, as well as feeling like overkill for apps that are intended to start small and stay small.

To this end, we agreed upon a stripped back starter structure. Here’s how it looks at its core (ignoring tests and other general Ruby files):

├── app/
│   ├── action.rb
│   └── actions/
├── config/
│   ├── app.rb
│   ├── routes.rb
│   └── settings.rb
├── config.ru
└── lib/
    ├── my_app/
    │   └── types.rb
    └── tasks/

That’s it! Much more lightweight. This approach takes advantage of the Hanami app itself becoming a fully-featured slice, with app/ now as its source directory.

In fact, I took this opportunity to unify the code loading rules for both the app and slices, which makes for a much more intuitive experience. You can now drop any ruby source file into app/ or a slices/[slice_name]/ slice dir and it will be loaded in the same way: starting at the root of each directory, classes defined therein are expected to inhabit the namespace that the app or slice represents, so app/some_class.rb would be MyApp::SomeClass and slices/my_slice/some_class would be MySlice::SomeClass. Hat tip to me of September 2021 for implementing the dry-system namespaces feature that enabled this! 😜

(Yet another little dry-system tweak came out of preparing this too, with Component#file_name now exposed for auto-registration rules).

This new initial structure for starter Hanami 2.0 apps is another example of progressive disclosure in our design. You can start with a simple all-in-one approach, everything inside an app/ directory, and then as various distinct concerns present themselves, you can extract them into dedicated slices as required.

Along with this, some of our names have become shorter! Yes, “application� has become “app� (and Hanami::Application has become Hanami::App, and so on). These shorter names are easier to type, as well as more reflective of the words we tend to use when verbally describing these structures.

We also tweaked our actions and views integration code so that it is automatically available when you inherit directly from Hanami::Action, so it will no longer be necessary to have the verbose Hanami::Application::Action as the superclass for the app’s actions. We also ditched that namespace for both routes and settings too, so now you can just inherit from Hanami::Settings and the like.

Devised a slimmed down release strategy

Any of you following my updates would know by now that the Hanami 2.0 release has been a long time coming. We have ambitious goals, we’re doing our best, and everything is slowly coming together. But as hard as it might’ve been for folks who’re waiting, it’s been doubly so for us, feeling the weight of both the work along with everyone’s expectations.

So to make sure we can focus our efforts and get something out the door sooner rather than later, we decided to stagger our 2.0 release. We’ll start off with an initial 2.0 release centred around hanami, hanami-cli, hanami-controller, and hanami-router (enough to write some very useful API applications, for example), then follow up with a “full stack� 2.1 release including database persistence, views, helpers, assets and everything else.

I’m already feeling empowered by this strategy: 2.0 feels actually achievable now! And all of the other release-related work like updated docs and a migration guide will become correspondingly easier too.

Released Hanami 2.0.0.beta1!

With greater release clarity as well as all the above improvements under our belt, it was time to usher in a new phase of Hanami 2.0 development, so we released 2.0.0.beta1 in July! This new version suffix represents just how close we feel we are to our final vision for 2.0. This is an exciting moment!

And a bunch more

This update is getting rather long, so let me list a bunch of other Hanami improvements I managed to get done:

Outside my Hanami development, a new job and a new computer meant I also took the change to reboot my dotfiles, which are now powered by chezmoi. I can’t speak highly enough of chezmoi, it’s an extremely powerful tool and I’m loving the flexibility it affords!

That’s it from me for now. I’ll come back to you all in another month!

,

Ian BrownHigh Velocity Migrations with GCVE and HCX

What is HCX? VMware HCX is an application mobility platform designed for simplifying application migration, workload rebalancing and business continuity across datacenters and clouds. VMware HCX was formerly known as Hybrid Cloud Extension and NSX Hybrid Connect. GCVE HCX GCVE deploys the Enterprise version of HCX as part of the cost of the solution. HCX Enterprise has the following benefits: Hybrid Interconnect WAN Optimisation Bulk Migration, Live Migration and HCX Replication Assisted vMotion Cloud to cloud migration Disaster Protection KVM & Hyper-V to vSphere migrations Traffic Engineering Mobility Groups Mobility Optimised Networking Changeover scheduling Definitions Cold Migration

,

Ian BrownInfrastructure as Code with Terraform in GCVE

We have seen a lot of Google Cloud VMware Engine over the last few months and for the entire time we have used click-ops to provision new infrastructure, networks and VM’s. Now we are going to the next level and we will be using Terraform to manage our infrastructure as code so that it is version controlled and predictable. Installing Terraform The first part of getting this working is installing Terraform on your local machine.

,

Tim SerongHack Week 21: Keeping the Battery Full

As described in some detail in my last post, we have a single 10kWh Redflow ZCell zinc bromine flow battery hooked up to our solar PV via Victron inverter/chargers. This gives us the ability to:

  • Store almost all the excess energy we generate locally for later use.
  • When the sun isn’t shining, grid charge the battery at off-peak times then draw it down at peak times to save on our electricity bill (peak grid power is slightly more than twice as expensive as off-peak grid power).
  • Opportunistically survive grid outages, provided they don’t happen at the wrong time (i.e. when the sun is down and the battery is at 0% state of charge).

By their nature, ZCell flow batteries needs to undergo a maintenance cycle at least every three days, where they are discharged completely for a few hours. That’s why the last point above reads “opportunistically survive grid outages”. With a single ZCell, we can’t use the “minimum state of charge” feature of the Victron kit to always keep some charge in the battery in case of outages, because doing so conflicts with the ZCell maintenance cycles. Once we eventually get a second battery, this problem will go away because the maintenance cycles automatically interleave. In the meantime though, as my project for Hack Week 21, I decided to see if I could somehow automate the Victron scheduled charge configuration based on the ZCell maintenance cycle timing, to always keep the battery as full as possible for as long as possible.

There are three goals somewhat in tension with each other here:

  • Keep the battery full, except during maintenance cycles.
  • Don’t let the battery get too full immediately before a maintenance cycle, lest the discharge take too long and maintenance still be active the following morning.
  • Don’t schedule charges during peak electricity times (we still want to draw the battery down then, to avoid using the expensive gold plated electrons the power company sends down the wire between 07:00-10:00 and 16:00-21:00).

Here’s the solution I came up with:

  • On non-maintenance cycle days, set two no-limit scheduled charges, one from 10:00 for 6 hours, the other from 21:00 for 10 hours. That means the battery will be charged from the grid and/or the sun continuously, except for peak electricity times, when it will be drawn down. Our loads aren’t high enough to completely deplete the battery during peak times, so there will always be some juice in case of a grid outage on non-maintenance cycle days.
  • On maintenance cycle days, set a 50% limit scheduled charge from 13:00 for 3 hours, so the battery won’t be too full before that evening’s maintenance cycle, which kicks in at sunset. The day after a maintenance cycle, set a no limit scheduled charge from 03:00 for 4 hours. At our site, maintenance has almost always finished before 03:00, so there’s no conflict here, and we still have time to get some charge into the battery to handle the next morning’s peak.

Now, how to automate that?

The ZCell Battery Management System (BMS) has a REST API which we can query to find out useful information about the battery. Unfortunately it won’t actually tell us for certain whether maintenance will be run on any given day, but we can get the maintenance time limit, and subtract from that the amount of time that’s passed since the last maintenance cycle. If the resultant figure is less than one day, we know that maintenance will happen today. It is possible for maintenance to happen at other times, e.g. I can force maintenance manually, and also it can happen more often than every three days if you mess with the allowed days setting in the BMS, so this solution arguably isn’t perfect, but I think it’s good enough under the circumstances, at least at our site.

The Victron Cerbo GX (the little box that controls everything) runs Linux, and you can easily get root on it, so it’s possible to write scripts that run locally there. Here’s what I ended up with:

One important point about installing things on the Cerbo GX, is that the root partition is overwritten during firmware updates, but there’s a separate data partition which is preserved. The root user’s home directory is symlinked to /data/home/root, so my script lives at /data/home/root/sched.py to ensure it remains present. Then we need to get it into /etc/crontab, which doesn’t survive firmware updates. This is done by adding a /data/rc.local script which the Cerbo GX runs on boot:

After a few days of testing and observation, I can confirm that it all works perfectly! At least, at our site, right now, with our current loads and daylight ours. The whole thing will want revisiting (or probably just turning off) as we get into summer, when we’ll be able to rely on significantly more sunlight to keep the battery full than we get now. I may well just go back to a single 03:00-for-four-hours grid charge then, once the days are nice and long. See how we go…

,

Tim RileyJoining Buildkite, and sticking with Ruby

Last week I finished up at Culture Amp, and I’m excited to announce that I’ll be joining Buildkite as an engineer!

My time at Culture Amp was special. It was my first role after a decade of running Icelab with Max and Michael. Culture Amp hired everyone at Icelab after we decided to close the business, providing both a smooth transition and new opportunities to a singular group. I built a great working relationship with my manager, I was trusted to do big things, and I relished the chance to work with and learn from a large group of engineers. I’m deeply thankful for all of this.

Towards the end, I was serving as Culture Amp’s Director of Back End Engineering, and moving into engineering management. However, as any astute reader of this blog might attest, I am deeply motivated by hands on programming work, and all the learning and collaboration opportunities that go with it. I realised it was not the time to draw that chapter to a close (it might never!), and through that consideration I connected with Buildkite.

I’m excited to join Buildkite for many reasons! It’s a great Australian company with heart and personality. It brims with people I’ve long dreamt of working with. Developer tooling is an area close to my heart. And they’re growing a (majestic) Ruby app at the core of their tech. I can’t wait to dig in.

For me, this is also an intentional decision to stick with Ruby. The work I’m doing in Ruby OSS right now might be one of the biggest “dents in the universe” I get to make. I want to see this effort through, to complete our vision for Hanami 2.0, then learn from how it’s adopted by our community.

I have some time off between jobs, which I’ll use to give our Hanami work a real boost: I’ll be commiting nearly 6 weeks of full-time work to Hanami! Based on previous experience, this should see me get through what otherwise might have taken 6 months of part-time effort. I’m hoping this will get us significantly closer to 2.0. I’ll likely start another tweet thread of my efforts, so find me on Twitter if you’d like to follow along!

,

Ian BrownGCVE Backup and Disaster Recovery

Picking up where we left off last month, let’s dive into disaster recovery and how to use Site Recovery Manager and Google Backup & Protect to DR into and within the cloud with GCVE. But before we do, a quick advertisement: If you are in Brisbane, Australia, I suggest coming to the awesome Google Infrastructure Group (GIG) which focuses on GCVE where on 04 July 2022 I will be presenting on Terraform in GCVE.

,

Ian BrownGCVE Advanced Auto-Scaling

Let’s pick up where we left off from last months article and start setting up some of the features of GCVE, starting with Advanced Autoscaling. What is Advanced Auto-Scaling? Advanced Autoscaling automatically expands or shrinks a private cloud based on CPU, memory and storage utilisation metrics. GCVE monitors the cluster based on the metrics defined in the autoscale policy and decides to add or remove nodes automatically. Remember: GCVE is physical Dell Poweredge servers, not a container/VM running in Docker or on a hypervisor like VMware.

,

BlueHackersFree psychologist service at conferences: April 2022 update

We’ve done this a number of times over the last decade, from OSDC to LCA. The idea is to provide a free psychologist or counsellor at an in-person conference. Attendees can do an anonymous booking by taking a stickynote (with the timeslot) from a signup sheet, and thus get a free appointment.

Many people find it difficult taking the first (very important) step towards getting professional help, and we’ve received good feedback that this approach indeed assists.

So far we’ve always focused on open source conferences. Now we’re moving into information security! First BrisSEC 2022 (Friday 29 April at the Hilton in Brisbane, QLD) and then AusCERT 2022 (10-13 May at the Star Hotel, Gold Coast QLD). The awesome and geek friendly Dr Carla Rogers will be at both events.

How does this get funded? Well, we’ve crowdfunded some, nudged sponsors, most mostly it gets picked up by the conference organisers (aka indirectly by the sponsors, mostly).

If you’re a conference organiser, or would like a particular upcoming conference to offer this service, do drop us a line and we’re happy to chase it up for you and help the organisers to make it happen. We know how to run that now.

In-person is best. But for virtual conferences, sure contact us as well.

The post Free psychologist service at conferences: April 2022 update first appeared on BlueHackers.org.

,

FLOSS Down Under - online free software meetingsApril Hack Day Report

The hack day didn’t go as well as I hoped, but didn’t go too badly. There was smaller attendance than hoped and the discussion was mostly about things other than FLOSS. But everyone who attended had fun and learned interesting things so generally I think it counts as a success. There was discussion on topics including military hardware, viruses (particularly Covid), rocketry, and literature. During the discussion one error in a Wikipedia page was discussed and hopefully we can get that fixed.

I think that everyone who attended will be interested in more such meetings. Overall I think this is a reasonable start to the Hack Day meetings, when I previously ran such meetings they often ended up being more social events than serious hacking events and that’s OK too.

One conclusion that we came to regarding meetings is that they should always be well announced in email and that the iCal file isn’t useful for everyone. Discussion continues on the best methods of announcing meetings but I anticipate that better email will get more attendance.

,

Ian BrownIntroduction to GCVE

What is GCVE? Google Cloud VMware Engine, or GCVE, is a fully managed VMware hypervisor and associated management and networking components, (vSphere, NSX-T, vSAN and HCX) built on top of Google’s highly performant and scalable infrastructure with fully redundant and dedicated 100Gbps networking that provides 99.99% availability. The solution is integrated into Google Cloud Platform, so businesses benefit from having full access to GCP services, native VPC networking, Cloud VPN or Interconnect as well as all the normal security features you expect from GCP.

,

FLOSS Down Under - online free software meetingsMarch 2022 Meeting

Meeting Report

The March 2022 meeting went reasonably well. Everyone seemed to have fun and learn useful things about computers. After 2 hours my Internet connection dropped out which stopped the people who were using VMs from doing the tutorial. Fortunately most people seemed ready for a break so we ended the meeting. The early and abrupt ending of the meeting was a disappointment but it wasn’t too bad, the meeting would probably only have gone for another half hour otherwise.

The BigBlueButton system was shown to be effective for training when one person got confused with the Debian package configuration options for Postfix and they were able to share the window with everyone else to get advice. I was also confused by that stage.

Future Meetings

The main feature of the meeting was training in setting up a mailserver with Postfix, here are the lecture notes for it [1]. The consensus at the end of the meeting was that people wanted more of that for the April meeting. So for the April meeting I will add to the Postfix Training to include SpamAssassin, SPF, DKIM, and DMARC. For the start of the next meeting instead of providing bare Debian installations for the VMs I’ll provide a basic Postfix/Dovecot setup so people can get straight into SpamAssassin etc.

For the May meeting training on SE Linux was requested.

Social Media

Towards the end of the meeting we discussed Matrix and federated social media. LUV has a Matrix server and I can give accounts to anyone who’s involved in FOSS in the Australia and New Zealand area. For Mastodon the NZOSS Mastodon server [2] seems like a good option. I have an account there to try Mastodon, my Mastodon address is @etbe@mastodon.nzoss.nz .

We are going to make Matrix a primary communication method for the Flounder group, the room is #flounder:luv.asn.au . My Matrix address is @etbe:luv.asn.au .

,

FLOSS Down Under - online free software meetingsMailing List

We now have a mailing list see https://lists.linux.org.au/mailman/listinfo/flounder for information, the address to post to the list is flounder@lists.linux.org.au..

We also have a new URL for the blog and events. See the right sidebar for the link to the iCal file which can be connected to Google Calendar and most online calendaring systems.

,

FLOSS Down Under - online free software meetingsFirst Meeting Success

We just had the first Flounder meeting which went well. Had some interesting discussion of storage technology, I learnt a few new things. Some people did the ZFS training and BTRFS training and we had lots of interesting discussion.

Andrew Pam gave a summary of new things in Linux and talked about the sites lwn.net, gamingonlinux.com, and cnx-software.com that he uses to find Linux news. One thing he talked about is the latest developments with SteamDeck which is driving Linux support in Steam games. The site protondb.com tracks Linux support in Steam games.

We had some discussion of BPF, for an introduction to that technology see the BPF lecture from LCA 2022.

Next Meeting

The next meeting (Saturday 5th of March 1PM Melbourne time) will focus on running your own mail server which is always of interest to people who are interested in system administration and which is probably of more interest than usual because of Google forcing companies with “a legacy G Suite subscription” to transition to a more expensive “Business family” offering.

,

Stewart SmithAdventures in the Apple Partition Map (Part 2 of the continuing adventures with the Apple Power Macintosh 7200/120 PC Compatible)

I “recently” wrote about obtaining a new (to me, actually quite old) computer over in The Apple Power Macintosh 7200/120 PC Compatible (Part 1). This post is a bit of a detour, but may help others understand why some images they download from the internet don’t work.

Disk partitioning is (of course) a way to divide up a single disk into multiple volumes (partitions) for different uses. While the idea is similar, computer platforms over the ages have done this in a variety of different ways, with varying formats on disk, and varying limitations. The ones that you’re most likely to be familiar with are the MBR partitioning scheme (from the IBM PC), and the GPT partitioning scheme (common for UEFI systems such as the modern PC and Mac). One you’re less likely to be familiar with is the Apple Partition Map scheme.

The way all IBM PCs and compatibles worked from the introduction of MS-DOS 2.0 in 1983 until some time after 2005 was the Master Boot Record partitioning scheme. It was outrageously simple: of the first 512 byte sector of a disk, the first 446 bytes was for the bootstrapping code (the “boot sector”), the last 2 bytes were for the magic two bytes telling the BIOS this disk was bootable, and the other 64 bytes were four entries of 16 bytes, each describing a disk partition. The Wikipedia page is a good overview of what it all looks like. Since “four partitions should be enough for anybody” wasn’t going to last, DOS 3.2 introduced “extended partitions” which was just using one of those 4 partitions as another similar data structure that could point to more partitions.

In the 1980s (similar to today), the Macintosh was, of course, different. The Apple Partition Map is significantly more flexible than the MBR on PCs. For a start, you could have more than four partitions! You could actually have a lot more than four partitions, as the Apple Partition Map is a single 512-byte sector for each partition, and the partition map is itself a partition. Instead of being block 0 (like the MBR is), it actually starts at block 1, and is contiguous (The Driver Descriptor Record is what’s at block 0). So, once created, it’s hard to extend. Typically it’d be created as 64×512-byte entries, for 32kb… which turns out is actually about enough for anyone.

The Inside Macintosh reference on the SCSI Manager goes through more detail as to these structures. If you’re wondering what language all the coding examples are in, it’s Pascal – which was fairly popular for writing Macintosh applications in back in the day.

But the actual partition map isn’t the “interesting” part of all this (and yes, the quotation marks are significant here), because Macs are pretty darn finicky about what disks to boot off, which gets to be interesting if you’re trying to find a CD-ROM image on the internet from which to boot, and then use to install an Operating System from.

Stewart SmithEvery time I program a Mac…

… the preferred programming language changes.

I never programmed a 1980s Macintosh actually in the 1980s. It was sometime in the early 1990s that I first experienced Microsoft Basic for the Macintosh. I’d previously (unknowingly at the time as it was branded Commodore) experienced Microsoft BASIC on the Commodore 16, Commodore 64, and even the Apple ][, but the Macintosh version was something else. It let you do some pretty neat things such as construct a GUI with largely the same amount of effort as it took to construct a Text based UI on the micros I was familiar with.

Okay, to be fair, I’d also dabbled in Microsoft QBasic that came bundled with MS-DOS of the era, which let you do a whole bunch of graphics – so you could theoretically construct a GUI with it. Something I did attempt to do. Programming on the Mac was so much easier to construct a GUI.

Of course, Microsoft Basic wasn’t the preferred way to program on the Macintosh. At that time it was largely Pascal, with C being something that also existed – but you were going to see Pascal in Inside Macintosh. It was probably somewhat fortuitous that I’d poked at Pascal a bit as something alternate to look at in the high school computing classes. I can only remember using TurboPascal on DOS systems and never actually writing Pascal on the Macintosh.

By the middle part of the 1990s though, I was firmly incompetently writing C on the Mac. No doubt the quality of my code increased after I’d done some university courses actually covering the language rather than the only practical way I had to attempt to write anything useful being looking at Inside Macintosh examples in Pascal and “C for Dummies” which was very not-Macintosh. Writing C on UNIX/Linux was a lot easier – everything was made for it, including Actual Documentation!

Anyway, in the early 2000s I ran MacOS X for a bit on my white iBook G3, and did a (very) small amount of any GUI / Project Builder (the precursor to Xcode) related development – instead largely focusing on command line / X11 things. The latest coolness being to use Objective-C to program applications (unless you were bringing over your Classic MacOS Carbon based application, then you could still write C). Enter some (incompetent) Objective-C coding!

Then Apple went to x86, so the hardware ceased being interesting, and I had no reason to poke at it even as a side effect of having hardware that could run the software stack. Enter a long-ass time of Debian, Ubuntu, and Fedora on laptops.

Come 2022 though, and (for reasons I should really write up), I’m poking at a Mac again and it’s now Swift as the preferred way to write apps. So, I’m (incompetently) hacking away at Swift code. I have to admit, it’s pretty nice. I’ve managed to be somewhat productive in a relative short amount of time, and all the affordances in the language gear towards the kind of safety that is a PITA when coding in C.

So this is my WIP utility to be able to import photos from a Shotwell database into the macOS Photos app:

There’s a lot of rough edges and unknowns left, including how to actually do the import (it looks like there’s going to be Swift code doing AppleScript things as the PhotoKit API is inadequate). But hey, some incompetent hacking in not too much time has a kind-of photo browser thing going on that feels pretty snappy.

,

Robert Collinshyper combinators in Rust

Recently I read Michael Snoyman’s post on combining Axum, Hyper, Tonic and Tower. While his solution worked, it irked me – it seemed like there should be a much tighter solution possible.

I can deep dive into the code in a later post perhaps, but I think there are four points of difference. One, since the post was written Axum has started boxing its routes : so the enum dispatch approach taken, which delivers low overheads actually has no benefits today.

Two, while writing out the entire type by hand has some benefits, async code is much more pithy.

Thirdly, the code in the post is entirely generic, except the routing function itself.

And fourth, the outer Service<AddrStream> is an unnecessary layer to abstract over: given the similar constraints – the inner Service must take Request<..>, it is possible to just not use a couple of helpers and instead work directly with Service<Request...>.

So, onto a pithier version.

First, the app server code itself.

use std::{convert::Infallible, net::SocketAddr};

use axum::routing::get;
use hyper::{server::conn::AddrStream, service::make_service_fn};
use hyper::{Body, Request};
use tonic::async_trait;

use demo::echo_server::{Echo, EchoServer};
use demo::{EchoReply, EchoRequest};

struct MyEcho;

#[async_trait]
impl Echo for MyEcho {
    async fn echo(
        &self,
        request: tonic::Request<EchoRequest>,
    ) -> Result<tonic::Response<EchoReply>, tonic::Status> {
        Ok(tonic::Response::new(EchoReply {
            message: format!("Echoing back: {}", request.get_ref().message),
        }))
    }
}

#[tokio::main]
async fn main() {
    let addr = SocketAddr::from(([0, 0, 0, 0], 3000));

    let axum_service = axum::Router::new().route("/", get(|| async { "Hello world!" }));

    let grpc_service = tonic::transport::Server::builder()
        .add_service(EchoServer::new(MyEcho))
        .into_service();

    let both_service =
        demo_router::Router::new(axum_service, grpc_service, |req: &Request<Body>| {
            Ok::<bool, Infallible>(
                req.headers().get("content-type").map(|x| x.as_bytes())
                    == Some(b"application/grpc"),
            )
        });

    let make_service = make_service_fn(move |_conn: &AddrStream| {
        let both_service = both_service.clone();
        async { Ok::<_, Infallible>(both_service) }
    });

    let server = hyper::Server::bind(&addr).serve(make_service);

    if let Err(e) = server.await {
        eprintln!("server error: {}", e);
    }
}

Note the Router: it takes the two services and Fn to determine which to use on any given request. Then we just drop that composed service into make_service_fn and we’re done.

Next up we have the Router implementation. This is generic across any two Service<Request<...>> types as long as they are both Into<Bytes> for their Data, and Into<Box<dyn Error>> for errors.

use std::{future::Future, pin::Pin, task::Poll};

use http_body::combinators::UnsyncBoxBody;
use hyper::{body::HttpBody, Body, Request, Response};
use tower::Service;

#[derive(Clone)]
pub struct Router<First, Second, F> {
    first: First,
    second: Second,
    discriminator: F,
}

impl<First, Second, F> Router<First, Second, F> {
    pub fn new(first: First, second: Second, discriminator: F) -> Self {
        Self {
            first,
            second,
            discriminator,
        }
    }
}

impl<First, Second, FirstBody, FirstBodyError, SecondBody, SecondBodyError, F, FErr>
    Service<Request<Body>> for BinaryRouter<First, Second, F>
where
    First: Service<Request<Body>, Response = Response<FirstBody>>,
    First::Error: Into<Box<dyn std::error::Error + Send + Sync>> + 'static,
    First::Future: Send + 'static,
    First::Response: 'static,
    Second: Service<Request<Body>, Response = Response<SecondBody>>,
    Second::Error: Into<Box<dyn std::error::Error + Send + Sync>> + 'static,
    Second::Future: Send + 'static,
    Second::Response: 'static,
    F: Fn(&Request<Body>) -> Result<bool, FErr>,
    FErr: Into<Box<dyn std::error::Error + Send + Sync>> + Send + 'static,
    FirstBody: HttpBody<Error = FirstBodyError> + Send + 'static,
    FirstBody::Data: Into<bytes::Bytes>,
    FirstBodyError: Into<Box<dyn std::error::Error + Send + Sync>> + 'static,
    SecondBody: HttpBody<Error = SecondBodyError> + Send + 'static,
    SecondBody::Data: Into<bytes::Bytes>,
    SecondBodyError: Into<Box<dyn std::error::Error + Send + Sync>> + 'static,
{
    type Response = Response<
        UnsyncBoxBody<
            <hyper::Body as HttpBody>::Data,
            Box<dyn std::error::Error + Send + Sync + 'static>,
        >,
    >;
    type Error = Box<dyn std::error::Error + Send + Sync + 'static>;
    type Future =
        Pin<Box<dyn Future<Output = Result<Self::Response, Self::Error>> + Send + 'static>>;

    fn poll_ready(
        &mut self,
        cx: &mut std::task::Context<'_>,
    ) -> std::task::Poll<Result<(), Self::Error>> {
        match self.first.poll_ready(cx) {
            Poll::Ready(Ok(())) => match self.second.poll_ready(cx) {
                Poll::Ready(Ok(())) => Poll::Ready(Ok(())),
                Poll::Ready(Err(e)) => Poll::Ready(Err(e.into())),
                Poll::Pending => Poll::Pending,
            },
            Poll::Ready(Err(e)) => Poll::Ready(Err(e.into())),
            Poll::Pending => Poll::Pending,
        }
    }

    fn call(&mut self, req: Request<Body>) -> Self::Future {
        let discriminant = { (self.discriminator)(&req) };
        let (first, second) = if matches!(discriminant, Ok(false)) {
            (Some(self.first.call(req)), None)
        } else if matches!(discriminant, Ok(true)) {
            (None, Some(self.second.call(req)))
        } else {
            (None, None)
        };
        let f = async {
            Ok(match discriminant.map_err(Into::into)? {
                true => second
                    .unwrap()
                    .await
                    .map_err(Into::into)?
                    .map(|b| b.map_data(Into::into).map_err(Into::into).boxed_unsync()),
                false => first
                    .unwrap()
                    .await
                    .map_err(Into::into)?
                    .map(|b| b.map_data(Into::into).map_err(Into::into).boxed_unsync()),
            })
        };
        Box::pin(f)
    }
}

Interesting things here – I use boxed_unsync to abstract over the body concrete type, and I implement the future using async code rather than as a separate struct. It becomes much smaller even after a few bits of extra type constraining.

One thing that flummoxed me for a little was the need to capture the future for the underlying response outside of the async block. Failing to do so provokes a 'static requirement which was tricky to debug. Fortunately there is a bug on making this easier to diagnose in rustc already. The underlying problem is that if you create the async block, and then dereference self, the type for impl of .first has to live an arbitrary time. Whereas by capturing the future immediately, only the impl of the future has to live an arbitrary time, and that doesn’t then require changing the signature of the function.

This is almost worth turning into a crate – I couldn’t see an existing one when I looked, though it does end up rather small – < 100 lines. What do you all think?

FLOSS Down Under - online free software meetingsFirst Meeting Agenda

The first meeting will start at 1PM Australian Eastern time (Melbourne/Sydney) which is +1100 on Saturday the 5th of February.

I will start the video chat an hour early in case someone makes a timezone mistake and gets there an hour before it starts. If anyone else joins early we will have random chat until the start time (deliberately avoiding topics worthy of the main meeting). The link http://b.coker.com.au will redirect to the meeting URL on the day.

The first scheduled talk is a summary and discussion of free software related news. Anyone who knows of something new that excites them is welcome to speak about it.

The main event is discussion of storage technology and hands-on training on BTRFS and ZFS for those who are interested. Here are the ZFS training notes and here are the BTRFS training notes. Feel free to do the training exercises on your own VM before the meeting if you wish.

Then discussion of the future of the group and the use of FOSS social media. While social media is never going to be compulsory some people will want to use it to communicate and we could run some servers for software that is considered good (lots of server capacity is available).

Finally we have to plan future meetings and decide on which communication methods are desired.

The BBB instance to be used for the video conference is sponsored by NZOSS and Catalyst Cloud.

,

FLOSS Down Under - online free software meetingsFlounder Overview

Flounder is a new free software users group based in the Australia/NZ area. Flounder stands for FLOSS (Free Libre Open Source Software) down under.

Here is my blog post describing the initial idea, the comment from d3Xt3r suggested the name. Flounder is a group of fish that has species native to Australia and NZ.

The main aim is to provide educational benefits to free software users via an online meeting that can’t be obtained by watching YouTube videos etc in a scope that is larger than one country. When the pandemic ends we will keep running this as there are benefits to be obtained from a meeting of a wide geographic scope that can’t be obtained by meetings in a single city. People from other countries are welcome to attend but they aren’t the focus of the meeting.

Until we get a better DNS name the address http://b.coker.com.au will redirect to the BBB instance used for online meetings (the meeting address isn’t yet setup so it redirects to the blog). The aim is that there will always be a short URL for the meeting so anyone who has one device lose contact can quickly type the URL into their backup device.

The first meeting will be on the 5th of Feb 2022 at 1PM Melbourne time +1100. When we get a proper domain I’ll publish a URL for an iCal file with entries for all meetings. I will also find some suitable way for meeting times to be localised (I’m sure there’s a WordPress plugin for that).

For the hands-on part of the meetings there will be virtual machine images you can download to run on your own system (tested with KVM, should work with other VM systems) and the possibility of logging in to a running VM. The demonstration VMs will have public IPv6 addresses and will also be available through different ports on a single IPv4 address, having IPv6 on your workstation will be convenient for you but you can survive without it.

Linux Australia has a list of LUGs in Australia, is there a similar list for NZ? One thing I’d like to see is a list of links for iCal files for all the meetings and also an iCal aggregator that for all iCal feeds of online meetings. I’ll host it myself if necessary, but it’s probably best to do it via Linux Australia (Linux Australasia?) if possible.

,

Jan SchmidtPulling on a thread

I’m attending the https://linux.conf.au/ conference online this weekend, which is always a good opportunity for some sideline hacking.

I found something boneheaded doing that today.

There have been a few times while inventing the OpenHMD Rift driver where I’ve noticed something strange and followed the thread until it made sense. Sometimes that leads to improvements in the driver, sometimes not.

In this case, I wanted to generate a graph of how long the computer vision processing takes – from the moment each camera frame is captured until poses are generated for each device.

To do that, I have a some logging branches that output JSON events to log files and I write scripts to process those. I used that data and produced:

Pose recognition latency.
dt = interpose spacing, delay = frame to pose latency

Two things caught my eye in this graph. The first is the way the baseline latency (pink lines) increases from ~20ms to ~58ms. The 2nd is the quantisation effect, where pose latencies are clearly moving in discrete steps.

Neither of those should be happening.

Camera frames are being captured from the CV1 sensors every 19.2ms, and it takes that 17-18ms for them to be delivered across the USB. Depending on how many IR sources the cameras can see, figuring out the device poses can take a different amount of time, but the baseline should always hover around 17-18ms because the fast “device tracking locked” case take as little as 1ms.

Did you see me mention 19.2ms as the interframe period? Guess what the spacing on those quantisation levels are in the graph? I recognised it as implying that something in the processing is tied to frame timing when it should not be.

OpenHMD Rift CV1 tracking timing

This 2nd graph helped me pinpoint what exactly was going on. This graph is cut from the part of the session where the latency has jumped up. What it shows is a ~1 frame delay between when the frame is received (frame-arrival-finish-local-ts) before the initial analysis even starts!

That could imply that the analysis thread is just busy processing the previous frame and doesn’t get start working on the new one yet – but the graph says that fast analysis is typically done in 1-10ms at most. It should rarely be busy when the next frame arrives.

This is where I found the bone headed code – a rookie mistake I wrote when putting in place the image analysis threads early on in the driver development and never noticed.

There are 3 threads involved:

  • USB service thread, reading video frame packets and assembling pixels in framebuffers
  • Fast analysis thread, that checks tracking lock is still acquired
  • Long analysis thread, which does brute-force pose searching to reacquire / match unknown IR sources to device LEDs

These 3 threads communicate using frame worker queues passing frames between each other. Each analysis thread does this pseudocode:

while driver_running:
    Pop a frame from the queue
    Process the frame
    Sleep for new frame notification

The problem is in the 3rd line. If the driver is ever still processing the frame in line 2 when a new frame arrives – say because the computer got really busy – the thread sleeps anyway and won’t wake up until the next frame arrives. At that point, there’ll be 2 frames in the queue, but it only still processes one – so the analysis gains a 1 frame latency from that point on. If it happens a second time, it gets later by another frame! Any further and it starts reclaiming frames from the queues to keep the video capture thread fed – but it only reclaims one frame at a time, so the latency remains!

The fix is simple:

while driver_running:
   Pop a frame
   Process the frame
   if queue_is_empty():
     sleep for new frame notification

Doing that for both the fast and long analysis threads changed the profile of the pose latency graph completely.

Pose latency and inter-pose spacing after fix

This is a massive win! To be clear, this has been causing problems in the driver for at least 18 months but was never obvious from the logs alone. A single good graph is worth a thousand logs.

What does this mean in practice?

The way the fusion filter I’ve built works, in between pose updates from the cameras, the position and orientation of each device are predicted / updated using the accelerometer and gyro readings. Particularly for position, using the IMU for prediction drifts fairly quickly. The longer the driver spends ‘coasting’ on the IMU, the less accurate the position tracking is. So, the sooner the driver can get a correction from the camera to the fusion filter the less drift we’ll get – especially under fast motion. Particularly for the hand controllers that get waved around.

Before: Left Controller pose delays by sensor
After: Left Controller pose delays by sensor

Poses are now being updated up to 40ms earlier and the baseline is consistent with the USB transfer delay.

You can also visibly see the effect of the JPEG decoding support I added over Christmas. The ‘red’ camera is directly connected to USB3, while the ‘khaki’ camera is feeding JPEG frames over USB2 that then need to be decoded, adding a few ms delay.

The latency reduction is nicely visible in the pose graphs, where the ‘drop shadow’ effect of pose updates tailing fusion predictions largely disappears and there are fewer large gaps in the pose observations when long analysis happens (visible as straight lines jumping from point to point in the trace):

Before: Left Controller poses
After: Left Controller poses

,

Colin CharlesThis thing is still on?

Yes, the blog is still on. January 2004 I moved to WordPress, and it is still here January 2022. I didn’t write much last year (neither here, not experimenting with the Hey blog). I didn’t post anything to Instagram last year either from what I can tell, just a lot of stories.

August 16 2021, I realised I was 1,000 days till May 12 2024, which is when I become 40. As of today, that leads 850 days. Did I squander the last 150 days? I’m back to writing almost daily in the Hobonichi Techo (I think last year and the year before were mostly washouts; I barely scribbled anything offline).

I got a new Apple Watch Series 7 yesterday. I can say I used the Series 4 well (79% battery life), purchased in the UK when I broke my Series 0 in Edinburgh airport.

TripIt stats for last year claimed 95 days on the road. This is of course, a massive joke, but I’m glad I did get to visit London, Lisbon, New York, San Francisco, Los Angeles without issue. I spent a lot of time in Kuantan, a bunch of Langkawi trips, and also, I stayed for many months at the Grand Hyatt Kuala Lumpur during the May lockdowns (I practically stayed there all lockdown).

With 850 days to go till I’m 40, I have plenty I would like to achieve. I think I’ll write a lot more here. And elsewhere. Get back into the habit of doing. And publishing by learning and doing. No fear. Not that I wasn’t doing, but its time to be prolific with what’s been going on.

,

Jan Schmidt2.5 years of Oculus Rift

Once again time has passed, and another update on Oculus Rift support feels due! As always, it feels like I’ve been busy with work and not found enough time for Rift CV1 hacking. Nevertheless, looking back over the history since I last wrote, there’s quite a lot to tell!

In general, the controller tracking is now really good most of the time. Like, wildly-swing-your-arms-and-not-lose-track levels (most of the time). The problems I’m hunting now are intermittent and hard to identify in the moment while using the headset – hence my enthusiasm over the last updates for implementing stream recording and a simulation setup. I’ll get back to that.

Outlier Detection

Since I last wrote, the tracking improvements have mostly come from identifying and rejecting incorrect measurements. That is, if I have 2 sensors active and 1 sensor says the left controller is in one place, but the 2nd sensor says it’s somewhere else, we’ll reject one of those – choosing the pose that best matches what we already know about the controller. The last known position, the gravity direction the IMU is detecting, and the last known orientation. The tracker will now also reject observations for a time if (for example) the reported orientation is outside the range we expect. The IMU gyroscope can track the orientation of a device for quite a while, so can be relied on to identify strong pose priors once we’ve integrated a few camera observations to get the yaw correct.

It works really well, but I think improving this area is still where most future refinements will come. That and avoiding incorrect pose extractions in the first place.

Plot of headset tracking – orientation and position

The above plot is a sample of headset tracking, showing the extracted poses from the computer vision vs the pose priors / tracking from the Kalman filter. As you can see, there are excursions in both position and orientation detected from the video, but these are largely ignored by the filter, producing a steadier result.

Left Touch controller tracking – orientation and position

This plot shows the left controller being tracked during a Beat Saber session. The controller tracking plot is quite different, because controllers move a lot more than the headset, and have fewer LEDs to track against. There are larger gaps here in the timeline while the vision re-acquires the device – and in those gaps you can see the Kalman filter interpolating using IMU input only (sometimes well, sometimes less so).

Improved Pose Priors

Another nice thing I did is changes in the way the search for a tracked device is made in a video frame. Before starting looking for a particular device it always now gets the latest estimate of the previous device position from the fusion filter. Previously, it would use the estimate of the device pose as it was when the camera exposure happened – but between then and the moment we start analysis more IMU observations and other camera observations might arrive and be integrated into the filter, which will have updated the estimate of where the device was in the frame.

This is the bit where I think the Kalman filter is particularly clever: Estimates of the device position at an earlier or later exposure can improve and refine the filter’s estimate of where the device was when the camera captured the frame we’re currently analysing! So clever. That mechanism (lagged state tracking) is what allows the filter to integrate past tracking observations once the analysis is done – so even if the video frame search take 150ms (for example), it will correct the filter’s estimate of where the device was 150ms in the past, which ripples through and corrects the estimate of where the device is now.

LED visibility model

To improve the identification of devices better, I measured the actual angle from which LEDs are visible (about 75 degrees off axis) and measured the size. The pose matching now has a better idea of which LEDs should be visible for a proposed orientation and what pixel size we expect them to have at a particular distance.

Better Smoothing

I fixed a bug in the output pose smoothing filter where it would glitch as you turned completely around and crossed the point where the angle jumps from +pi to -pi or vice versa.

Improved Display Distortion Correction

I got a wide-angle hi-res webcam and took photos of a checkerboard pattern through the lens of my headset, then used OpenCV and panotools to calculate new distortion and chromatic aberration parameters for the display. For me, this has greatly improved. I’m waiting to hear if that’s true for everyone, or if I’ve just fixed it for my headset.

Persistent Config Cache

Config blocks! A long time ago, I prototyped code to create a persistent OpenHMD configuration file store in ~/.config/openhmd. The rift-kalman-filter branch now uses that to store the configuration blocks that it reads from the controllers. The first time a controller is seen, it will load the JSON calibration block as before, but it will now store it in that directory – removing a multiple second radio read process on every subsequent startup.

Persistent Room Configuration

To go along with that, I have an experimental rift-room-config branch that creates a rift-room-config.json file and stores the camera positions after the first startup. I haven’t pushed that to the rift-kalman-filter branch yet, because I’m a bit worried it’ll cause surprising problems for people. If the initial estimate of the headset pose is wrong, the code will back-project the wrong positions for the cameras, which will get written to the file and cause every subsequent run of OpenHMD to generate bad tracking until the file is removed. The goal is to have a loop that monitors whether the camera positions seem stable based on the tracking reports, and to use averaging and resetting to correct them if not – or at least to warn the user that they should re-run some (non-existent) setup utility.

Video Capture + Processing

The final big ticket item was a rewrite of how the USB video frame capture thread collects pixels and passes them to the analysis threads. This now does less work in the USB thread, so misses fewer frames, and also I made it so that every frame is now searched for LEDs and blob identities tracked with motion vectors, even when no further analysis will be done on that frame. That means that when we’re running late, it better preserves LED blob identities until the analysis threads can catch up – increasing the chances of having known LEDs to directly find device positions and avoid searching. This rewrite also opened up a path to easily support JPEG decode – which is needed to support Rift Sensors connected on USB 2.0 ports.

Session Simulator

I mentioned the recording simulator continues to progress. Since the tracking problems are now getting really tricky to figure out, this tool is becoming increasingly important. So far, I have code in OpenHMD to record all video and tracking data to a .mkv file. Then, there’s a simulator tool that loads those recordings. Currently it is capable of extracting the data back out of the recording, parsing the JSON and decoding the video, and presenting it to a partially implemented simulator that then runs the same blob analysis and tracking OpenHMD does. The end goal is a Godot based visualiser for this simulation, and to be able to step back and forth through time examining what happened at critical moments so I can improve the tracking for those situations.

To make recordings, there’s the rift-debug-gstreamer-record branch of OpenHMD. If you have GStreamer and the right plugins (gst-plugins-good) installed, and you set env vars like this, each run of OpenHMD will generate a recording in the target directory (make sure the target dir exists):

export OHMD_TRACE_DIR=/home/user/openhmd-traces/
export OHMD_FULL_RECORDING=1

Up Next

The next things that are calling to me are to improve the room configuration estimation and storage as mentioned above – to detect when the poses a camera is reporting don’t make sense because it’s been bumped or moved.

I’d also like to add back in tracking of the LEDS on the back of the headset headband, to support 360 tracking. I disabled those because they cause me trouble – the headband is adjustable relative to the headset, so the LEDs don’t appear where the 3D model says they should be and that causes jitter and pose mismatches. They need special handling.

One last thing I’m finding exciting is a new person taking an interest in Rift S and starting to look at inside-out tracking for that. That’s just happened in the last few days, so not much to report yet – but I’ll be happy to have someone looking at that while I’m still busy over here in CV1 land!

As always, if you have any questions, comments or testing feedback – hit me up at thaytan@noraisin.net or on @thaytan Twitter/IRC.

Thank you to the kind people signed up as Github Sponsors for this project!

,

Glen TurnerThe tyranny of product names

For a long time computer manufacturers have tried to differentiate themselves and their products from their competitors with fancy names with odd capitalisation and spelling. But as an author, using these names does a disservice to the reader: how are they to know that DEC is pronounced as if it was written Dec ("deck").

It's time we pushed back, and wrote for our readers, not for corporations.

It's time to use standard English rules for these Corporate Fancy Names. Proper names begin with a capital, unlike "ciscoSystems®" (so bad that Cisco itself moved away from it). Words are separated by spaces, so "Cisco Systems". Abbreviations and acronyms are written in lower case if they are pronounced as a word, in upper case if each letter is pronounced: so "ram" and "IBM®".

So from here on in I'll be using the following:

  • Face Book. Formerly, "Facebook®".
  • Junos. Formerly JUNOS®.
  • ram. Formerly RAM.
  • Pan OS. Formerly PAN-OS®.
  • Unix. Formerly UNIX®.

I'd encourage you to try this in your own writing. It does look odd for the first time, but the result is undeniably more readable. If we are not writing to be understood by our audience then we are nothing more than an unpaid member of some corporation's marketing team.



comment count unavailable comments

,

Chris NeugebauerTalk Notes: On The Use and Misuse of Decorators

I gave the talk On The Use and Misuse of Decorators as part of PyConline AU 2021, the second in annoyingly long sequence of not-in-person PyCon AU events. Here’s some code samples that you might be interested in:

Simple @property implementation

This shows a demo of @property-style getters. Setters are left as an exercise :)


def demo_property(f):
    f.is_a_property = True
    return f


class HasProperties:

    def __getattribute__(self, name):
        ret = super().__getattribute__(name)
        if hasattr(ret, "is_a_property"):
            return ret()
        else:
            return ret

class Demo(HasProperties):

    @demo_property
    def is_a_property(self):
        return "I'm a property"

    def is_a_function(self):
        return "I'm a function"


a = Demo()
print(a.is_a_function())
print(a.is_a_property)

@run (The Scoped Block)

@run is a decorator that will run the body of the decorated function, and then store the result of that function in place of the function’s name. It makes it easier to assign the results of complex statements to a variable, and get the advantages of functions having less leaky scopes than if or loop blocks.

def run(f):
    return f()

@run
def hello_world():
    return "Hello, World!"

print(hello_world)

@apply (Multi-line stream transformers)

def apply(transformer, iterable_):

    def _applicator(f):

        return(transformer(f, iterable_))

    return _applicator

@apply(map, range(100)
def fizzbuzzed(i):
    if i % 3 == 0 and i % 5 == 0:
        return "fizzbuzz"
    if i % 3 == 0:
        return "fizz"
    elif i % 5 == 0:
        return "buzz"
    else:
        return str(i)

Builders


def html(f):
    builder = HtmlNodeBuilder("html")
    f(builder)
    return builder.build()


class HtmlNodeBuilder:
    def __init__(self, tag_name):
       self.tag_name = tag_name
       self.nodes = []

   def node(self, f):
        builder = HtmlNodeBuilder(f.__name__)
        f(builder)
        self.nodes.append(builder.build())

    def text(self, text):
        self.nodes.append(text)

    def build(self):
      nodes = "\n".join(self.nodes)
       return f"<{self.tag_name}>\n{nodes}\n</{self.tag_name}>"


@html
def document(b):
   @b.node
   def head(b):
       @b.node
       def title(b):
           b.text("Hello, World!")

   @b.node
   def body(b):
       for i in range(10, 0, -1):
           @b.node
           def p(b):
               b.text(f"{i}")

Code Registries

This is an incomplete implementation of a code registry for handling simple text processing tasks:

```python

def register(self, input, output):

def _register_code(f):
    self.registry[(input, output)] = f
    return f

return _register_code

in_type = (iterable[str], (WILDCARD, ) out_type = (Counter, (WILDCARD, frequency))

@registry.register(in_type, out_type) def count_strings(strings):

return Counter(strings)

@registry.register( (iterable[str], (WILDCARD, )), (iterable[str], (WILDCARD, lowercase)) ) def words_to_lowercase(words): …

@registry.register( (iterable[str], (WILDCARD, )), (iterable[str], (WILDCARD, no_punctuation)) ) def words_without_punctuation(words): …

def find_steps( self, input_type, input_attrs, output_type, output_attrs ):

hand_wave()

def give_me(self, input, output_type, output_attrs):

steps = self.find_steps(
    type(input), (), output_type, output_attrs
)

temp = input
for step in steps:
    temp = step(temp)

return temp

,

Jan SchmidtOpenHMD update

A while ago, I wrote a post about how to build and test my Oculus CV1 tracking code in SteamVR using the SteamVR-OpenHMD driver. I have updated those instructions and moved them to https://noraisin.net/diary/?page_id=1048 – so use those if you’d like to try things out.

The pandemic continues to sap my time for OpenHMD improvements. Since my last post, I have been working on various refinements. The biggest visible improvements are:

  • Adding velocity and acceleration API to OpenHMD.
  • Rewriting the pose transformation code that maps from the IMU-centric tracking space to the device pose needed by SteamVR / apps.

Adding velocity and acceleration reporting is needed in VR apps that support throwing things. It means that throwing objects and using gravity-grab to fetch objects works in Half-Life: Alyx, making it playable now.

The rewrite to the pose transformation code fixed problems where the rotation of controller models in VR didn’t match the rotation applied in the real world. Controllers would appear attached to the wrong part of the hand, and rotate around the wrong axis. Movements feel more natural now.

Ongoing work – record and replay

My focus going forward is on fixing glitches that are caused by tracking losses or outliers. Those problems happen when the computer vision code either fails to match what the cameras see to the device LED models, or when it matches incorrectly.

Tracking failure leads to the headset view or controllers ‘flying away’ suddenly. Incorrect matching leads to controllers jumping and jittering to the wrong pose, or swapping hands. Either condition is very annoying.

Unfortunately, as the tracking has improved the remaining problems get harder to understand and there is less low-hanging fruit for improvement. Further, when the computer vision runs at 52Hz, it’s impossible to diagnose the reasons for a glitch in real time.

I’ve built a branch of OpenHMD that uses GStreamer to record the CV1 camera video, plus IMU and tracking logs into a video file.

To go with those recordings, I’ve been working on a replay and simulation tool, that uses the Godot game engine to visualise the tracking session. The goal is to show, frame-by-frame, where OpenHMD thought the cameras, headset and controllers were at each point in the session, and to be able to step back and forth through the recording.

Right now, I’m working on the simulation portion of the replay, that will use the tracking logs to recreate all the poses.

Ian BrownNGINX Ingress Controller in GKE

GKE in Production - Part 2 This tutorial is part of a series I am creating on creating, running and managing Kubernetes on GCP the way I do in my day job. In this episode, we are covering how to setup a nginx ingress controller to handle incoming requests. Note: There may be some things I have skimmed over, if so or you see a glaring hole in my configuration, please drop me a line via the contact page linked at the top of the site.

,

Robert CollinsA moment of history

I’ve been asked more than once what it was like at the beginning of Ubuntu, before it was a company, when an email from someone I’d never heard of came into my mailbox.

We’re coming up on 20 years now since Ubuntu was founded, and I had cause to do some spelunking into IMAP archives recently… while there I took the opportunity to grab the very first email I received.

The Ubuntu long shot succeeded wildly. Of course, we liked to joke about how spammy those emails where: cold-calling a raft of Debian developers with job offers, some of them were closer to phishing attacks :). This very early one – I was the second employee (though I started at 4 days a week to transition my clients gradually) – was less so.

I think its interesting though to note how explicit a gamble this was framed as: a time limited experiment, funded for a year. As the company scaled this very rapidly became a hiring problem and the horizon had to be pushed out to 2 years to get folk to join.

And of course, while we started with arch in earnest, we rapidly hit significant usability problems, some of which were solvable with porcelain and shallow non-architectural changes, and we built initially patches, and then the bazaar VCS project to tackle those. But others were not: for instance, I recall exceeding the 32K hard link limit on ext3 due to a single long history during a VCS conversion. The sum of these challenges led us to create the bzr project, a ground up rethink of our version control needs, architecture, implementation and user-experience. While ultimately git has conquered all, bzr had – still has in fact – extremely loyal advocates, due to its laser sharp focus on usability.

Anyhow, here it is: one of the original no-name-here-yet, aka Ubuntu, introductory emails (with permission from Mark, of course). When I clicked through to the website Mark provided there was a link there to a fantastical website about a space tourist… not what I had expected to be reading in Adelaide during LCA 2004.


From: Mark Shuttleworth <xxx@xxx>
To: Robert Collins <xxx@xxx>
Date: Thu, 15 Jan 2004, 04:30

Tom Lord gave me your email address, I believe he’s
already sent you the email that I sent him so I’m sure
you have some background.

In short, I am going to fund some open source
development for a year. This is part of a new project
that I will be getting off the ground in the coming
weeks. I don’t know where it will lead, it’s flying in
the face of a stiff breeze but I think at the end of
the day it will at least fund a few very good open
source developers for a full year to work on the
projects they like most.

One of the pieces of the puzzle is high end source
code management. I’ll be looking to build an
infrastructure that will manage source code for
between 100 and 8000 open source projects (yes,
there’s a big difference between the two, I don’t know
at which end of the spectrum we will be at the end of
the year but our infrastructure will have to at least
be capable of scaling to the latter within two years)
with upwards of 2000 developers, drawing code from a
variety of sources, playing with it and spitting it
out regularly in nice packages.

Arch and Subversion seem to be the two leading
contenders for “next generation open source sccm”. I’d
be interested in your thoughts on the two of them, and
how they stack up. I’m looking to hire one person who
will lead that part of the effort. They’ll work alone
from home, and be responsible for two things. First,
extending the tool (arch or svn) in ways that help the
project. Such extensions will be released under an
open source licence, and hopefully embraced by the
tools maintainers and included in the mainline code
for the tool. And second, they will be responsible for
our large-scale implementation of SCCM, using that
tool, and building the management scripts and other
infrastructure to support such a large, and hopefully
highly automated, set of repositories.

Would you be interested in this position? What
attributes and experience do you think would make you
a great person to have on the team? What would your
salary expectation be, as a monthly figure, for a one
year contract full time?

I’m currently on your continent, well, just off it. On
Lizard Island, up North. Am headed today for Brisbane,
then on the 17th to Launceston via Melbourne. If you
happen to be on any of those stops, would you be
interested in meeting up to discuss it further?

If you’re curious you can find out a bit more about me
at www.markshuttleworth.com. This project is much
lower key than some of what you’ll find there. It’s a
very long shot indeed. But if at worst all that
happens is a bunch of open source work gets funded at
my expense I’ll feel it was money well spent.

Cheers,
Mark

=====

“Good judgement comes from experience, and often experience
comes from bad judgement” – Rita Mae Brown


,

Arjen LentzClassic McEleice and the NIST search for post-quantum crypto

I have always liked cryptography, and public-key cryptography in particularly. When Pretty Good Privacy (PGP) first came out in 1991, I not only started using it, also but looking at the documentation and the code to see how it worked. I created my own implementation in C using very small keys, just to better understand.

Cryptography has been running a race against both faster and cheaper computing power. And these days, with banking and most other aspects of our lives entirely relying on secure communications, it’s a very juicy target for bad actors.

About 5 years ago, the National (USA) Institute for Science and Technology (NIST) initiated a search for cryptographic algorithmic that should withstand a near-future world where quantum computers with a significant number of qubits are a reality. There have been a number of rounds, which mid 2020 saw round 3 and the finalists.

This submission caught my eye some time ago: Classic McEliece, and out of the four finalists it’s the only one that is not lattice-based [wikipedia link].

For Public Key Encryption and Key Exchange Mechanism, Prof Bill Buchanan thinks that the winner will be lattice-based, but I am not convinced.

Robert McEleice at his retirement in 2007

Tiny side-track, you may wonder where does the McEleice name come from? From mathematician Robert McEleice (1942-2019). McEleice developed his cryptosystem in 1978. So it’s not just named after him, he designed it. For various reasons that have nothing to do with the mathematical solidity of the ideas, it didn’t get used at the time. He’s done plenty cool other things, too. From his Caltech obituary:

He made fundamental contributions to the theory and design of channel codes for communication systems—including the interplanetary telecommunication systems that were used by the Voyager, Galileo, Mars Pathfinder, Cassini, and Mars Exploration Rover missions.

Back to lattices, there are both unknowns (aspects that have not been studied in exhaustive depth) and recent mathematical attacks, both of which create uncertainty – in the crypto sphere as well as for business and politics. Given how long it takes for crypto schemes to get widely adopted, the latter two are somewhat relevant, particularly since cyber security is a hot topic.

Lattices are definitely interesting, but given what we know so far, it is my feeling that systems based on lattices are more likely to be proven breakable than Classic McEleice, which come to this finalists’ table with 40+ years track record of in-depth analysis. Mind that all finalists are of course solid at this stage – but NIST’s thoughts on expected developments and breakthroughs is what is likely to decide the winner. NIST are not looking for shiny, they are looking for very very solid in all possible ways.

Prof Buchanan recently published implementations for the finalists, and did some benchmarks where we can directly compare them against each other.

We can see that Classic McEleice’s key generation is CPU intensive, but is that really a problem? The large size of its public key may be more of a factor (disadvantage), however the small ciphertext I think more than offsets that disadvantage.

As we’re nearing the end of the NIST process, in my opinion, fast encryption/decryption and small cyphertext, combined with the long track record of in-depth analysis, may still see Classic McEleice come out the winner.

The post Classic McEleice and the NIST search for post-quantum crypto first appeared on Lentz family blog.

,

Ian BrownKubenetes Basic Setup

GKE in Production - Part 1 This tutorial is part of a series I am creating on creating, running and managing Kubernetes on GCP the way I do in my day job. Note: There may be some things I have skimmed over, if so or you see a glaring hole in my configuration, please drop me a line via the contact page linked at the top of the site. What we will build In this first tutorial, we will be building a standard GKE cluster on Google Cloud Platform and deploying the hello world container to confirm everything is working.

,

Chris NeugebauerAdding a PurpleAir monitor to Home Assistant

Living in California, I’ve (sadly) grown accustomed to needing to keep track of our local air quality index (AQI) ratings, particularly as we live close to places where large wildfires happen every other year.

Last year, Josh and I bought a PurpleAir outdoor air quality meter, which has been great. We contribute our data to a collection of very local air quality meters, which is important, since the hilly nature of the North Bay means that the nearest government air quality ratings can be significantly different to what we experience here in Petaluma.

I recently went looking to pull my PurpleAir sensor data into my Home Assistant setup. Unfortunately, the PurpleAir API does not return the AQI metric for air quality, only the raw PM2.5/PM5/PM10 numbers. After some searching, I found a nice template sensor solution on the Home Assistant forums, which I’ve modernised by adding the AQI as a sub-sensor, and adding unique ID fields to each useful sensor, so that you can assign them to a location.

You’ll end up with sensors for raw PM2.5, the PM2.5 AQI value, the US EPA air quality category, air pressure, relative humidity and air pressure.

How to use this

First up, visit the PurpleAir Map, find the sensor you care about, click “get this widget�, and then “JSON�. That will give you the URL to set as the resource key in purpleair.yaml.

Adding the configuration

In HomeAssistant, add the following line to your configuration.yaml:

sensor: !include purpleair.yaml

and then add the following contents to purpleair.yaml


 - platform: rest
   name: 'PurpleAir'

   # Substitute in the URL of the sensor you care about.  To find the URL, go
   # to purpleair.com/map, find your sensor, click on it, click on "Get This
   # Widget" then click on "JSON".
   resource: https://www.purpleair.com/json?key={KEY_GOES_HERE}&show={SENSOR_ID}

   # Only query once a minute to avoid rate limits:
   scan_interval: 60

   # Set this sensor to be the AQI value.
   #
   # Code translated from JavaScript found at:
   # https://docs.google.com/document/d/15ijz94dXJ-YAZLi9iZ_RaBwrZ4KtYeCy08goGBwnbCU/edit#
   value_template: >
     {{ value_json["results"][0]["Label"] }}
   unit_of_measurement: ""
   # The value of the sensor can't be longer than 255 characters, but the
   # attributes can.  Store away all the data for use by the templates below.
   json_attributes:
     - results

 - platform: template
   sensors:
     purpleair_aqi:
       unique_id: 'purpleair_SENSORID_aqi_pm25'
       friendly_name: 'PurpleAir PM2.5 AQI'
       value_template: >
         {% macro calcAQI(Cp, Ih, Il, BPh, BPl) -%}
           {{ (((Ih - Il)/(BPh - BPl)) * (Cp - BPl) + Il)|round|float }}
         {%- endmacro %}
         {% if (states('sensor.purpleair_pm25')|float) > 1000 %}
           invalid
         {% elif (states('sensor.purpleair_pm25')|float) > 350.5 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 500.0, 401.0, 500.0, 350.5) }}
         {% elif (states('sensor.purpleair_pm25')|float) > 250.5 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 400.0, 301.0, 350.4, 250.5) }}
         {% elif (states('sensor.purpleair_pm25')|float) > 150.5 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 300.0, 201.0, 250.4, 150.5) }}
         {% elif (states('sensor.purpleair_pm25')|float) > 55.5 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 200.0, 151.0, 150.4, 55.5) }}
         {% elif (states('sensor.purpleair_pm25')|float) > 35.5 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 150.0, 101.0, 55.4, 35.5) }}
         {% elif (states('sensor.purpleair_pm25')|float) > 12.1 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 100.0, 51.0, 35.4, 12.1) }}
         {% elif (states('sensor.purpleair_pm25')|float) >= 0.0 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 50.0, 0.0, 12.0, 0.0) }}
         {% else %}
           invalid
         {% endif %}
       unit_of_measurement: "bit"
     purpleair_description:
       unique_id: 'purpleair_SENSORID_description'
       friendly_name: 'PurpleAir AQI Description'
       value_template: >
         {% if (states('sensor.purpleair_aqi')|float) >= 401.0 %}
           Hazardous
         {% elif (states('sensor.purpleair_aqi')|float) >= 301.0 %}
           Hazardous
         {% elif (states('sensor.purpleair_aqi')|float) >= 201.0 %}
           Very Unhealthy
         {% elif (states('sensor.purpleair_aqi')|float) >= 151.0 %}
           Unhealthy
         {% elif (states('sensor.purpleair_aqi')|float) >= 101.0 %}
           Unhealthy for Sensitive Groups
         {% elif (states('sensor.purpleair_aqi')|float) >= 51.0 %}
           Moderate
         {% elif (states('sensor.purpleair_aqi')|float) >= 0.0 %}
           Good
         {% else %}
           undefined
         {% endif %}
       entity_id: sensor.purpleair
     purpleair_pm25:
       unique_id: 'purpleair_SENSORID_pm25'
       friendly_name: 'PurpleAir PM 2.5'
       value_template: "{{ state_attr('sensor.purpleair','results')[0]['PM2_5Value'] }}"
       unit_of_measurement: "μg/m3"
       entity_id: sensor.purpleair
     purpleair_temp:
       unique_id: 'purpleair_SENSORID_temperature'
       friendly_name: 'PurpleAir Temperature'
       value_template: "{{ state_attr('sensor.purpleair','results')[0]['temp_f'] }}"
       unit_of_measurement: "°F"
       entity_id: sensor.purpleair
     purpleair_humidity:
       unique_id: 'purpleair_SENSORID_humidity'
       friendly_name: 'PurpleAir Humidity'
       value_template: "{{ state_attr('sensor.purpleair','results')[0]['humidity'] }}"
       unit_of_measurement: "%"
       entity_id: sensor.purpleair
     purpleair_pressure:
       unique_id: 'purpleair_SENSORID_pressure'
       friendly_name: 'PurpleAir Pressure'
       value_template: "{{ state_attr('sensor.purpleair','results')[0]['pressure'] }}"
       unit_of_measurement: "hPa"
       entity_id: sensor.purpleair

Quirks

I had difficulty getting the AQI to display as a numeric graph when I didn’t set a unit. I went with bit, and that worked just fine. 🤷�♂�

,

Stewart SmithAn Unearthly Child

So, this idea has been brewing for a while now… try and watch all of Doctor Who. All of it. All 38 seasons. Today(ish), we started. First up, from 1963 (first aired not quite when intended due to the Kennedy assassination): An Unearthly Child. The first episode of the first serial.

A lot of iconic things are there from the start: the music, the Police Box, embarrassing moments of not quite remembering what time one is in, and normal humans accidentally finding their way into the TARDIS.

I first saw this way back when a child, where they were repeated on ABC TV in Australia for some anniversary of Doctor Who (I forget which one). Well, I saw all but the first episode as the train home was delayed and stopped outside Caulfield for no reason for ages. Some things never change.

Of course, being a show from the early 1960s, there’s some rougher spots. We’re not about to have the picture of diversity, and there’s going to be casual racism and sexism. What will be interesting is noticing these things today, and contrasting with my memory of them at the time (at least for episodes I’ve seen before), and what I know of the attitudes of the time.

“This year-ometer is not calculating properly” is a very 2020 line though (technically from the second episode).

,

Jan SchmidtRift CV1 – Getting close now…

It’s been a while since my last post about tracking support for the Oculus Rift in February. There’s been big improvements since then – working really well a lot of the time. It’s gone from “If I don’t make any sudden moves, I can finish an easy Beat Saber level” to “You can’t hide from me!” quality.

Equally, there are still enough glitches and corner cases that I think I’ll still be at this a while.

Here’s a video from 3 weeks ago of (not me) playing Beat Saber on Expert+ setting showing just how good things can be now:

Beat Saber – Skunkynator playing Expert+, Mar 16 2021

Strap in. Here’s what I’ve worked on in the last 6 weeks:

Pose Matching improvements

Most of the biggest improvements have come from improving the computer vision algorithm that’s matching the observed LEDs (blobs) in the camera frames to the 3D models of the devices.

I split the brute-force search algorithm into 2 phases. It now does a first pass looking for ‘obvious’ matches. In that pass, it does a shallow graph search of blobs and their nearest few neighbours against LEDs and their nearest neighbours, looking for a match using a “Strong” match metric. A match is considered strong if expected LEDs match observed blobs to within 1.5 pixels.

Coupled with checks on the expected orientation (matching the Gravity vector detected by the IMU) and the pose prior (expected position and orientation are within predicted error bounds) this short-circuit on the search is hit a lot of the time, and often completes within 1 frame duration.

In the remaining tricky cases, where a deeper graph search is required in order to recover the pose, the initial search reduces the number of LEDs and blobs under consideration, speeding up the remaining search.

I also added an LED size model to the mix – for a candidate pose, it tries to work out how large (in pixels) each LED should appear, and use that as a bound on matching blobs to LEDs. This helps reduce mismatches as devices move further from the camera.

LED labelling

When a brute-force search for pose recovery completes, the system now knows the identity of various blobs in the camera image. One way it avoids a search next time is to transfer the labels into future camera observations using optical-flow tracking on the visible blobs.

The problem is that even sped-up the search can still take a few frame-durations to complete. Previously LED labels would be transferred from frame to frame as they arrived, but there’s now a unique ID associated with each blob that allows the labels to be transferred even several frames later once their identity is known.

IMU Gyro scale

One of the problems with reverse engineering is the guesswork around exactly what different values mean. I was looking into why the controller movement felt “swimmy” under fast motions, and one thing I found was that the interpretation of the gyroscope readings from the IMU was incorrect.

The touch controllers report IMU angular velocity readings directly as a 16-bit signed integer. Previously the code would take the reading and divide by 1024 and use the value as radians/second.

From teardowns of the controller, I know the IMU is an Invensense MPU-6500. From the datasheet, the reported value is actually in degrees per second and appears to be configured for the +/- 2000 °/s range. That yields a calculation of Gyro-rad/s = Gyro-°/s * (2000 / 32768) * (?/180) – or a divisor of 938.734.

The 1024 divisor was under-estimating rotation speed by about 10% – close enough to work until you start moving quickly.

Limited interpolation

If we don’t find a device in the camera views, the fusion filter predicts motion using the IMU readings – but that quickly becomes inaccurate. In the worst case, the controllers fly off into the distance. To avoid that, I added a limit of 500ms for ‘coasting’. If we haven’t recovered the device pose by then, the position is frozen in place and only rotation is updated until the cameras find it again.

Exponential filtering

I implemented a 1-Euro exponential smoothing filter on the output poses for each device. This is an idea from the Project Esky driver for Project North Star/Deck-X AR headsets, and almost completely eliminates jitter in the headset view and hand controllers shown to the user. The tradeoff is against introducing lag when the user moves quickly – but there are some tunables in the exponential filter to play with for minimising that. For now I’ve picked some values that seem to work reasonably.

Non-blocking radio

Communications with the touch controllers happens through USB radio command packets sent to the headset. The main use of radio commands in OpenHMD is to read the JSON configuration block for each controller that is programmed in at the factory. The configuration block provides the 3D model of LED positions as well as initial IMU bias values.

Unfortunately, reading the configuration block takes a couple of seconds on startup, and blocks everything while it’s happening. Oculus saw that problem and added a checksum in the controller firmware. You can read the checksum first and if it hasn’t changed use a local cache of the configuration block. Eventually, I’ll implement that caching mechanism for OpenHMD but in the meantime it still reads the configuration blocks on each startup.

As an interim improvement I rewrote the radio communication logic to use a state machine that is checked in the update loop – allowing radio communications to be interleaved without blocking the regularly processing of events. It still interferes a bit, but no longer causes a full multi-second stall as each hand controller turns on.

Haptic feedback

The hand controllers have haptic feedback ‘rumble’ motors that really add to the immersiveness of VR by letting you sense collisions with objects. Until now, OpenHMD hasn’t had any support for applications to trigger haptic events. I spent a bit of time looking at USB packet traces with Philipp Zabel and we figured out the radio commands to turn the rumble motors on and off.

In the Rift CV1, the haptic motors have a mode where you schedule feedback events into a ringbuffer – effectively they operate like a low frequency audio device. However, that mode was removed for the Rift S (and presumably in the Quest devices) – and deprecated for the CV1.

With that in mind, I aimed for implementing the unbuffered mode, with explicit ‘motor on + frequency + amplitude’ and ‘motor off’ commands sent as needed. Thanks to already having rewritten the radio communications to use a state machine, adding haptic commands was fairly easy.

The big question mark is around what API OpenHMD should provide for haptic feedback. I’ve implemented something simple for now, to get some discussion going. It works really well and adds hugely to the experience. That code is in the https://github.com/thaytan/OpenHMD/tree/rift-haptics branch, with a SteamVR-OpenHMD branch that uses it in https://github.com/thaytan/SteamVR-OpenHMD/tree/controller-haptics-wip

Problem areas

Unexpected tracking losses

I’d say the biggest problem right now is unexpected tracking loss and incorrect pose extractions when I’m not expecting them. Especially my right controller will suddenly glitch and start jumping around. Looking at a video of the debug feed, it’s not obvious why that’s happening:

To fix cases like those, I plan to add code to log the raw video feed and the IMU information together so that I can replay the video analysis frame-by-frame and investigate glitches systematically. Those recordings will also work as a regression suite to test future changes.

Sensor fusion efficiency

The Kalman filter I have implemented works really nicely – it does the latency compensation, predicts motion and extracts sensor biases all in one place… but it has a big downside of being quite expensive in CPU. The Unscented Kalman Filter CPU cost grows at O(n^3) with the size of the state, and the state in this case is 43 dimensional – 22 base dimensions, and 7 per latency-compensation slot. Running 1000 updates per second for the HMD and 500 for each of the hand controllers adds up quickly.

At some point, I want to find a better / cheaper approach to the problem that still provides low-latency motion predictions for the user while still providing the same benefits around latency compensation and bias extraction.

Lens Distortion

To generate a convincing illusion of objects at a distance in a headset that’s only a few centimetres deep, VR headsets use some interesting optics. The LCD/OLED panels displaying the output get distorted heavily before they hit the users eyes. What the software generates needs to compensate by applying the right inverse distortion to the output video.

Everyone that tests the CV1 notices that the distortion is not quite correct. As you look around, the world warps and shifts annoyingly. Sooner or later that needs fixing. That’s done by taking photos of calibration patterns through the headset lenses and generating a distortion model.

Camera / USB failures

The camera feeds are captured using a custom user-space UVC driver implementation that knows how to set up the special synchronisation settings of the CV1 and DK2 cameras, and then repeatedly schedules isochronous USB packet transfers to receive the video.

Occasionally, some people experience failure to re-schedule those transfers. The kernel rejects them with an out-of-memory error failing to set aside DMA memory (even though it may have been running fine for quite some time). It’s not clear why that happens – but the end result at the moment is that the USB traffic for that camera dies completely and there’ll be no more tracking from that camera until the application is restarted.

Often once it starts happening, it will keep happening until the PC is rebooted and the kernel memory state is reset.

Occluded cases

Tracking generally works well when the cameras get a clear shot of each device, but there are cases like sighting down the barrel of a gun where we expect that the user will line up the controllers in front of one another, and in front of the headset. In that case, even though we probably have a good idea where each device is, it can be hard to figure out which LEDs belong to which device.

If we already have a good tracking lock on the devices, I think it should be possible to keep tracking even down to 1 or 2 LEDs being visible – but the pose assessment code will have to be aware that’s what is happening.

Upstreaming

April 14th marks 2 years since I first branched off OpenHMD master to start working on CV1 tracking. How hard can it be, I thought? I’ll knock this over in a few months.

Since then I’ve accumulated over 300 commits on top of OpenHMD master that eventually all need upstreaming in some way.

One thing people have expressed as a prerequisite for upstreaming is to try and remove the OpenCV dependency. The tracking relies on OpenCV to do camera distortion calculations, and for their PnP implementation. It should be possible to reimplement both of those directly in OpenHMD with a bit of work – possibly using the fast LambdaTwist P3P algorithm that Philipp Zabel wrote, that I’m already using for pose extraction in the brute-force search.

Others

I’ve picked the top issues to highlight here. https://github.com/thaytan/OpenHMD/issues has a list of all the other things that are still on the radar for fixing eventually.

Other Headsets

At some point soon, I plan to put a pin in the CV1 tracking and look at adapting it to more recent inside-out headsets like the Rift S and WMR headsets. I implemented 3DOF support for the Rift S last year, but getting to full positional tracking for that and other inside-out headsets means implementing a SLAM/VIO tracking algorithm to track the headset position.

Once the headset is tracking, the code I’m developing here for CV1 to find and track controllers will hopefully transfer across – the difference with inside-out tracking is that the cameras move around with the headset. Finding the controllers in the actual video feed should work much the same.

Sponsorship

This development happens mostly in my spare time and partly as open source contribution time at work at Centricular. I am accepting funding through Github Sponsorships to help me spend more time on it – I’d really like to keep helping Linux have top-notch support for VR/AR applications. Big thanks to the people that have helped get this far.

,

BlueHackersWorld bipolar day 2021

Today, 30 March, is World Bipolar Day.

Vincent van Gogh - Worn Out

Why that particular date? It’s Vincent van Gogh’s birthday (1853), and there is a fairly strong argument that the Dutch painter suffered from bipolar (among other things).

The image on the side is Vincent’s drawing “Worn Out” (from 1882), and it seems to capture the feeling rather well – whether (hypo)manic, depressed, or mixed. It’s exhausting.

Bipolar is complicated, often undiagnosed or misdiagnosed, and when only treated with anti-depressants, it can trigger the (hypo)mania – essentially dragging that person into that state near-permanently.

Have you heard of Bipolar II?

Hypo-mania is the “lesser” form of mania that distinguishes Bipolar I (the classic “manic depressive” syndrome) from Bipolar II. It’s “lesser” only in the sense that rather than someone going so hyper they may think they can fly (Bipolar I is often identified when someone in manic state gets admitted to hospital – good catch!) while with Bipolar II the hypo-mania may actually exhibit as anger. Anger in general, against nothing in particular but potentially everyone and everything around them. Or, if it’s a mixed episode, anger combined with strong negative thoughts. Either way, it does not look like classic mania. It is, however, exhausting and can be very debilitating.

Bipolar II people often present to a doctor while in depressed state, and GPs (not being psychiatrists) may not do a full diagnosis. Note that D.A.S. and similar test sheets are screening tools, they are not diagnostic. A proper diagnosis is more complex than filling in a form some questions (who would have thought!)

Call to action

If you have a diagnosis of depression, only from a GP, and are on medication for this, I would strongly recommend you also get a referral to a psychiatrist to confirm that diagnosis.

Our friends at the awesome Black Dog Institute have excellent information on bipolar, as well as a quick self-test – if that shows some likelihood of bipolar, go get that referral and follow up ASAP.

I will be writing more about the topic in the coming time.

The post World bipolar day 2021 first appeared on BlueHackers.org.

,

Jan SchmidtRift CV1 – Testing SteamVR

Update:

This post documented an older method of building SteamVR-OpenHMD. I moved them to a page here. That version will be kept up to date for any future changes, so go there.


I’ve had a few people ask how to test my OpenHMD development branch of Rift CV1 positional tracking in SteamVR. Here’s what I do:

  • Make sure Steam + SteamVR are already installed.
  • Clone the SteamVR-OpenHMD repository:
git clone --recursive https://github.com/ChristophHaag/SteamVR-OpenHMD.git
  • Switch the internal copy of OpenHMD to the right branch:
cd subprojects/openhmd
git remote add thaytan-github https://github.com/thaytan/OpenHMD.git
git fetch thaytan-github
git checkout -b rift-kalman-filter thaytan-github/rift-kalman-filter
cd ../../
  • Use meson to build and register the SteamVR-OpenHMD binaries. You may need to install meson first (see below):
meson -Dbuildtype=release build
ninja -C build
./install_files_to_build.sh
./register.sh
  • It is important to configure in release mode, as the kalman filtering code is generally too slow for real-time in debug mode (it has to run 2000 times per second)
  • Make sure your USB devices are accessible to your user account by configuring udev. See the OpenHMD guide here: https://github.com/OpenHMD/OpenHMD/wiki/Udev-rules-list
  • Please note – only Rift sensors on USB 3.0 ports will work right now. Supporting cameras on USB 2.0 requires someone implementing JPEG format streaming and decoding.
  • It can be helpful to test OpenHMD is working by running the simple example. Check that it’s finding camera sensors at startup, and that the position seems to change when you move the headset:
./build/subprojects/openhmd/openhmd_simple_example
  • Calibrate your expectations for how well tracking is working right now! Hint: It’s very experimental 🙂
  • Start SteamVR. Hopefully it should detect your headset and the light(s) on your Rift Sensor(s) should power on.

Meson

I prefer the Meson build system here. There’s also a cmake build for SteamVR-OpenHMD you can use instead, but I haven’t tested it in a while and it sometimes breaks as I work on my development branch.

If you need to install meson, there are instructions here – https://mesonbuild.com/Getting-meson.html summarising the various methods.

I use a copy in my home directory, but you need to make sure ~/.local/bin is in your PATH

pip3 install --user meson

,

Jan SchmidtRift CV1 – Pose rejection

I spent some time this weekend implementing a couple of my ideas for improving the way the tracking code in OpenHMD filters and rejects (or accepts) possible poses when trying to match visible LEDs to the 3D models for each device.

In general, the tracking proceeds in several steps (in parallel for each of the 3 devices being tracked):

  1. Do a brute-force search to match LEDs to 3D models, then (if matched)
    1. Assign labels to each LED blob in the video frame saying what LED they are.
    2. Send an update to the fusion filter about the position / orientation of the device
  2. Then, as each video frame arrives:
    1. Use motion flow between video frames to track the movement of each visible LED
    2. Use the IMU + vision fusion filter to predict the position/orientation (pose) of each device, and calculate which LEDs are expected to be visible and where.
  3. Try and match up and refine the poses using the predicted pose prior and labelled LEDs. In the best case, the LEDs are exactly where the fusion predicts they’ll be. More often, the orientation is mostly correct, but the position has drifted and needs correcting. In the worst case, we send the frame back to step 1 and do a brute-force search to reacquire an object.

The goal is to always assign the correct LEDs to the correct device (so you don’t end up with the right controller in your left hand), and to avoid going back to the expensive brute-force search to re-acquire devices as much as possible

What I’ve been working on this week is steps 1 and 3 – initial acquisition of correct poses, and fast validation / refinement of the pose in each video frame, and I’ve implemented two new strategies for that.

Gravity Vector matching

The first new strategy is to reject candidate poses that don’t closely match the known direction of gravity for each device. I had a previous implementation of that idea which turned out to be wrong, so I’ve re-worked it and it helps a lot with device acquisition.

The IMU accelerometer and gyro can usually tell us which way up the device is (roll and pitch) but not which way they are facing (yaw). The measure for ‘known gravity’ comes from the fusion Kalman filter covariance matrix – how certain the filter is about the orientation of the device. If that variance is small this new strategy is used to reject possible poses that don’t have the same idea of gravity (while permitting rotations around the Y axis), with the filter variance as a tolerance.

Partial tracking matches

The 2nd strategy is based around tracking with fewer LED correspondences once a tracking lock is acquired. Initial acquisition of the device pose relies on some heuristics for how many LEDs must match the 3D model. The general heuristic threshold I settled on for now is that 2/3rds of the expected LEDs must be visible to acquire a cold lock.

With the new strategy, if the pose prior has a good idea where the device is and which way it’s facing, it allows matching on far fewer LED correspondences. The idea is to keep tracking a device even down to just a couple of LEDs, and hope that more become visible soon.

While this definitely seems to help, I think the approach can use more work.

Status

With these two new approaches, tracking is improved but still quite erratic. Tracking of the headset itself is quite good now and for me rarely loses tracking lock. The controllers are better, but have a tendency to “fly off my hands” unexpectedly, especially after fast motions.

I have ideas for more tracking heuristics to implement, and I expect a continuous cycle of refinement on the existing strategies and new ones for some time to come.

For now, here’s a video of me playing Beat Saber using tonight’s code. The video shows the debug stream that OpenHMD can generate via Pipewire, showing the camera feed plus overlays of device predictions, LED device assignments and tracked device positions. Red is the headset, Green is the right controller, Blue is the left controller.

Initial tracking is completely wrong – I see some things to fix there. When the controllers go offline due to inactivity, the code keeps trying to match LEDs to them for example, and then there are some things wrong with how it’s relabelling LEDs when they get incorrect assignments.

After that, there are periods of good tracking with random tracking losses on the controllers – those show the problem cases to concentrate on.

,

Colin CharlesLife with Rona 2.0 – Days 4, 5, 6, 7, 8 and 9

These lack of updates are also likely because I’ve been quite caught up with stuff.

Monday I had a steak from Bay Leaf Steakhouse for dinner. It was kind of weird eating it from packs, but then I’m reminded you could do this in economy class. Tuesday I wanted to attempt to go vegetarian and by the time I was done with a workout, the only place was a chap fan shop (Leong Heng) where I had a mixture of Chinese and Indian chap fan. The Indian stall is run by an ex-Hyatt staff member who immediately recognised me! Wednesday, Alice came to visit, so we got to Hanks, got some alcohol, and managed a smorgasbord of food from Pickers/Sate Zul/Lila Wadi. Night ended very late, and on Thursday, visited Hai Tian for their famous salted egg squid and prawns in a coconut shell. Friday was back to being normal, so I grabbed a pizza from Mint Pizza (this time I tried their Aussie variant). Saturday, today, I hit up Rasa Sayang for some matcha latte, but grabbed food from Classic Pilot Cafe, which Faeeza owns! It was the famous salted egg chicken, double portion, half rice.

As for workouts, I did sign up for Mantas but found it pretty hard to do, timezone wise. I did spend a lot of time jogging on the beach (this has been almost a daily affair). Monday I also did 2 MD workouts, Tuesday 1 MD workout, Wednesday half a MD workout, Thursday I did a Ping workout at Pwrhouse (so good!), Friday 1 MD workout, and Saturday an Audrey workout at Pwrhouse and 1 MD workout.

Wednesday I also found out that Rasmus passed away. Frankly, there are no words.

Thursday, my Raspberry Pi 400 arrived. I set it up in under ten minutes, connecting it to the TV here. It “just works”. I made a video, which I should probably figure out how to upload to YouTube after I stitch it together. I have to work on using it a lot more.

COVID-19 cases are through the roof in Malaysia. This weekend we’ve seen two days of case breaking records, with today being 5,728 (yesterday was something close). Nutty. Singapore suspended the reciprocal green lane (RGL) agreement with Malaysia for the next 3 months.

I’ve managed to finish Bridgerton. I like the score. Finding something on Netflix is proving to be more difficult, regardless of having a VPN. Honestly, this is why Cable TV wins… linear programming that you’re just fed.

Stock market wise, I’ve been following the GameStop short squeeze, and even funnier is the Top Glove one, that they’re trying to repeat in Malaysia. Bitcoin seems to be doing “reasonably well” and I have to say, I think people are starting to realise decentralised services have a future. How do we get there?

What an interesting week, I look forward to more productive time. I’m still writing in my Hobonichi Techo, so at least that’s where most personal stuff ends up, I guess?

,

Jan SchmidtHitting a milestone – Beat Saber!

I hit an important OpenHMD milestone tonight – I completed a Beat Saber level using my Oculus Rift CV1!

I’ve been continuing to work on integrating Kalman filtering into OpenHMD, and on improving the computer vision that matches and tracks device LEDs. While I suspect noone will be completing Expert levels just yet, it’s working well enough that I was able to play through a complete level of Beat Saber. For a long time this has been my mental benchmark for tracking performance, and I’m really happy 🙂

Check it out:

I should admit at this point that completing this level took me multiple attempts. The tracking still has quite a tendency to lose track of controllers, or to get them confused and swap hands suddenly.

I have a list of more things to work on. See you at the next update!

,

Colin CharlesLife with Rona 2.0 – Day 3

What an unplanned day. I woke up in time to do an MD workout, despite feeling a little sore. So maybe I was about 10 minutes late and I missed the first set, but his workouts are so long, and I think there were seven sets anyway. Had a good brunch shortly thereafter.

Did a bit of reading, and then I decided to do a beach boardwalk walk… turns out they were policing the place, and you can’t hit the boardwalk. But the beach is fair game? So I went back to the hotel, dropped off my slippers, and went for a beach jog. Pretty nutty.

Came back to read a little more and figured I might as well do another MD workout. Then I headed out for dinner, trying out a new place — Mint Pizza. Opened 20.12.2020, and they’re empty, and their pizza is actually pretty good. Lamb and BBQ chicken, they did half-and-half.

Twitter was discussing Raspberry Pi’s, and all I could see is a lot of misinformation, which is truly shocking. The irony is that open source has been running the Internet for so long, and progressive web apps have come such a long way…

Back in the day when I did OpenOffice.org or Linux training even, we always did say you should learn concepts and not tools. From the time we ran Linux installfests in the late-90s in Sunway Pyramid (back then, yes, Linux was hard, and you had winmodems), but I had forgotten that I even did stuff for school teachers and NGOs back in 2002… I won’t forget PC Gemilang either…

Anyway, I placed an order again for another Raspberry Pi 400. I am certain that most people talk so much crap, without realising that Malaysia isn’t a developed nation and most people can’t afford a Mac let alone a PC. Laptops aren’t cheap. And there are so many other issues…. Saying Windows is still required in 2021 is the nuttiest thing I’ve heard in a long time. Easy to tweet, much harder to think about TCO, and realise where in the journey Malaysia is.

Maybe the best thing was that Malaysian Twitter learned about technology. I doubt many realised the difference between a Pi board vs the 400, but hey, the fact that they talked about tech is still a win (misinformed, but a win).

,

Colin CharlesLife with Rona 2.0 – Days 1 & 2

Today is the first day that in the state of Pahang, we have to encounter what many Malaysians are referring to as the Movement Control Order 2.0 (MCO 2.0). I think everyone finally agrees with the terminology that this is a lockdown now, because I remember back in the day when I was calling it that, I’d definitely offend a handful of journalists.

This is one interesting change for me compared to when I last wrote Life with RonaDay 56 of being indoors and not even leaving my household, in Kuala Lumpur. I am now not in the state, I am living in a hotel, and I am obviously moving around a little more since we have access to the beach.

KL/Selangor and several other states have already been under the MCO 2.0 since January 13 2021, and while it was supposed to end on January 26, it seems like they’ve extended and harmonised the dates for Peninsular Malaysia to end on February 4 2021. I guess everyone got the “good news” yesterday. The Prime Minister announced some kind of aid last week, but it is still mostly a joke.

Today was the 2nd day I woke up at around 2.30pm because I went to bed at around 8am. First day I had a 23.5 hour uptime, and the today was less brutal, but working from 1-8am with the PST timezone is pretty brutal. Consequently, I barely got too much done, and had one meal, vegetarian, two packs that included rice. I did get to walk by the beach (between Teluk Cempedak and Teluk Cempedak 2), did quite a bit of exercise there and I think even the monkeys are getting hungry… lots of stray cats and monkeys. Starbucks closes at 7pm, and I rocked up at 7.10pm (this was just like yesterday, when I arrived at 9.55pm and was told they wouldn’t grant me a coffee!).

While writing this entry, I did manage to get into a long video call with some friends and I guess it was good catching up with people in various states. It also is what prevented me from publishing this entry!

Day 2

I did wake up reasonable early today because I had pre-ordered room service to arrive at 9am. There is a fixed menu at the hotel for various cuisines (RM48/pax, thankfully gratis for me) and I told them I prefer not having to waste, so just give me what I want which is off menu items anyway. Roti telur double telur (yes, I know it is a roti jantan) with some banjir dhal and sambal and a bit of fruit on the side with two teh tariks. They delivered as requested. I did forget to ask for a jar of honey but that is OK, there is always tomorrow.

I spent most of the day vacillating, and wouldn’t consider it productive by any measure. Just chit chats and napping. It did rain today after a long time, so the day seemed fairly dreary.

When I finally did awaken from my nap, I went for a run on the beach. I did it barefoot. I have no idea if this is how it is supposed to be done, or if you are to run nearer the water or further up above, but I did move around between the two quite often. The beach is still pretty dead, but it is expected since no one is allowed to go unless you’re a hotel guest.

The hotel has closed 3/4 of their villages (blocks) and moved everyone to the village I’m staying in (for long stay guests…). I’m thankful I have a pretty large suite, it is a little over 980sqft, and the ample space, while smaller than my home, is still welcome.

Post beach run, I did a workout with MD via Instagram. It was strength/HIIT based, and I burnt a tonne, because he gave us one of his signature 1.5h classes. It was longer than the 80 minute class he normally charges RM50 for (I still think this is undervaluing his service, but he really does care and does it for the love of seeing his students grow!).

Post-workout I decided to head downtown to find some dinner. Everything at the Teluk Cemepdak block of shops was closed, so they’re not even bothered with doing takeaway. Sg. Lembing steakhouse seemed to have cars parked, Vanggey was empty (Crocodile Rock was open, can’t say if there was a crowd, because the shared parking lot was empty), there was a modest queue at Sate Zul, and further down, Lena was closed, Pickers was open for takeaway but looked pretty closed, Tjantek was open surprisingly, and then I thought I’d give Nusantara a try again, this time for food, but their chef had just gone home at about 8pm. Oops. So I drove to LAN burger, initially ordering just one chicken double special; however they looked like they could use the business so I added on a beef double special. They now accept Boost payments so have joined the e-wallet era. One less place to use cash, which is also why I really like Kuantan. On the drive back, Classic Pilot Cafe was also open and I guess I’ll be heading there too during this lockdown.

Came back to the room to finish both burgers in probably under 15 minutes. While watching the first episode of Bridgerton on Netflix. I’m not sure what really captivates, but I will continue on (I still haven’t finished the first episode). I need to figure out how to use the 2 TVs that I have in this room — HDMI cable? Apple TV? Not normally using a TV, all this is clearly more complex than I care to admit.

I soaked longer than expected, ended up a prune, but I’m sure it will give me good rest!

One thought to leave with:

“Learn to enjoy every minute of your life. Be happy now. Don’t wait for something outside of yourself to make you happy in the future.” — Earl Nightingale

,

Sam WatkinsDeveloping CZ, a dialect of C that looks like Python

In my experience, the C programming language is still hard to beat, even 50 years after it was first developed (and I feel the same way about UNIX). When it comes to general-purpose utility, low-level systems programming, performance, and portability (even to tiny embedded systems), I would choose C over most modern or fashionable alternatives. In some cases, it is almost the only choice.

Many developers believe that it is difficult to write secure and reliable software in C, due to its free pointers, the lack of enforced memory integrity, and the lack of automatic memory management; however in my opinion it is possible to overcome these risks with discipline and a more secure system of libraries constructed on top of C and libc. Daniel J. Bernstein and Wietse Venema are two developers who have been able to write highly secure, stable, reliable software in C.

My other favourite language is Python. Although Python has numerous desirable features, my favourite is the light-weight syntax: in Python, block structure is indicated by indentation, and braces and semicolons are not required. Apart from the pleasure and relief of reading and writing such light and clear code, which almost appears to be executable pseudo-code, there are many other benefits. In C or JavaScript, if you omit a trailing brace somewhere in the code, or insert an extra brace somewhere, the compiler may tell you that there is a syntax error at the end of the file. These errors can be annoying to track down, and cannot occur in Python. Python not only looks better, the clear syntax helps to avoid errors.

The obvious disadvantage of Python, and other dynamic interpreted languages, is that most programs run extremely slower than C programs. This limits the scope and generality of Python. No AAA or performance-oriented video game engines are programmed in Python. The language is not suitable for low-level systems programming, such as operating system development, device drivers, filesystems, performance-critical networking servers, or real-time systems.

C is a great all-purpose language, but the code is uglier than Python code. Once upon a time, when I was experimenting with the Plan 9 operating system (which is built on C, but lacks Python), I missed Python’s syntax, so I decided to do something about it and write a little preprocessor for C. This converts from a “Pythonesque” indented syntax to regular C with the braces and semicolons. Having forked a little dialect of my own, I continued from there adding other modules and features (which might have been a mistake, but it has been fun and rewarding).

At first I called this translator Brace, because it added in the braces for me. I now call the language CZ. It sounds like “C-easy”. Ease-of-use for developers (DX) is the primary goal. CZ has all of the features of C, and translates cleanly into C, which is then compiled to machine code as normal (using any C compiler; I didn’t write one); and so CZ has the same features and performance as C, but enjoys a more pleasing syntax.

CZ is now self-hosted, in that the translator is written in the language CZ. I confess that originally I wrote most of it in Perl; I’m proficient at Perl, but I consider it to be a fairly ugly language, and overly complicated.

I intend for CZ’s new syntax to be “optional”, ideally a developer will be able to choose to use the normal C syntax when editing CZ, if they prefer it. For this, I need a tool to convert C back to CZ, which I have not fully implemented yet. I am aware that, in addition to traditionalists, some vision-impaired developers prefer to use braces and semicolons, as screen readers might not clearly indicate indentation. A C to CZ translator would of course also be valuable when porting an existing C program to CZ.

CZ has a number of useful features that are not found in standard C, but I did not go so far as C++, which language has been described as “an octopus made by nailing extra legs onto a dog”. I do not consider C to be a dog, at least not in a negative sense; but I think that C++ is not an improvement over plain C. I am creating CZ because I think that it is possible to improve on C, without losing any of its advantages or making it too complex.

One of the most interesting features I added is a simple syntax for fast, light coroutines. I based this on Simon Tatham’s approach to Coroutines in C, which may seem hacky at first glance, but is very efficient and can work very well in practice. I implemented a very fast web server with very clean code using these coroutines. The cost of switching coroutines with this method is little more than the cost of a function call.

CZ has hygienic macros. The regular cpp (C preprocessor) macros are not hygenic and many people consider them hacky and unsafe to use. My CZ macros are safe, and somewhat more powerful than standard C macros. They can be used to neatly add new program control structures. I have plans to further develop the macro system in interesting ways.

I added automatic prototype and header generation, as I do not like having to repeat myself when copying prototypes to separate header files. I added support for the UNIX #! scripting syntax, and for cached executables, which means that CZ can be used like a scripting language without having to use a separate compile or make command, but the programs are only recompiled when something has been changed.

For CZ, I invented a neat approach to portability without conditional compilation directives. Platform-specific library fragments are automatically included from directories having the name of that platform or platform-category. This can work very well in practice, and helps to avoid the nightmare of conditional compilation, feature detection, and Autotools. Using this method, I was able easily to implement portable interfaces to features such as asynchronous IO multiplexing (aka select / poll).

The CZ library includes flexible error handling wrappers, inspired by W. Richard Stevens’ wrappers in his books on Unix Network Programming. If these wrappers are used, there is no need to check return values for error codes, and this makes the code much safer, as an error cannot accidentally be ignored.

CZ has several major faults, which I intend to correct at some point. Some of the syntax is poorly thought out, and I need to revisit it. I developed a fairly rich library to go with the language, including safer data structures, IO, networking, graphics, and sound. There are many nice features, but my CZ library is more prototype than a finished product, there are major omissions, and some features are misconceived or poorly implemented. The misfeatures should be weeded out for the time-being, or moved to an experimental section of the library.

I think that a good software library should come in two parts, the essential low-level APIs with the minimum necessary functionality, and a rich set of high-level convenience functions built on top of the minimal API. I need to clearly separate these two parts in order to avoid polluting the namespaces with all sorts of nonsense!

CZ is lacking a good modern system of symbol namespaces. I can look to Python for a great example. I need to maintain compatibility with C, and avoid ugly symbol encodings. I think I can come up with something that will alleviate the need to type anything like gtk_window_set_default_size, and yet maintain compatibility with the library in question. I want all the power of C, but it should be easy to use, even for children. It should be as easy as BASIC or Processing, a child should be able to write short graphical demos and the like, without stumbling over tricky syntax or obscure compile errors.

Here is an example of a simple CZ program which plots the Mandelbrot set fractal. I think that the program is fairly clear and easy to understand, although there is still some potential to improve and clarify the code.

#!/usr/local/bin/cz --
use b
use ccomplex

Main:
	num outside = 16, ox = -0.5, oy = 0, r = 1.5
	long i, max_i = 50, rb_i = 30
	space()
	uint32_t *px = pixel()  # CONFIGURE!
	num d = 2*r/h, x0 = ox-d*w_2, y0 = oy+d*h_2
	for(y, 0, h):
		cmplx c = x0 + (y0-d*y)*I
		repeat(w):
			cmplx w = c
			for i=0; i < max_i && cabs(w) < outside; ++i
				w = w*w + c
			*px++ = i < max_i ? rainbow(i*359 / rb_i % 360) : black
			c += d

I wrote a more elaborate variant of this program, which generates images like the one shown below. There are a few tricks used: continuous colouring, rainbow colours, and plotting the logarithm of the iteration count, which makes the plot appear less busy close to the black fractal proper. I sell some T-shirts and other products with these fractal designs online.

An image from the Mandelbrot set, generated by a fairly simple CZ program.

I am interested in graph programming, and have been for three decades since I was a teenager. By graph programming, I mean programming and modelling based on mathematical graphs or diagrams. I avoid the term visual programming, because there is no necessary reason that vision impaired folks could not use a graph programming language; a graph or diagram may be perceived, understood, and manipulated without having to see it.

Mathematics is something that naturally exists, outside time and independent of our universe. We humans discover mathematics, we do not invent or create it. One of my main ideas for graph programming is to represent a mathematical (or software) model in the simplest and most natural way, using relational operators. Elementary mathematics can be reduced to just a few such operators:

+add, subtract, disjoint union, zero
×multiply, divide, cartesian product, one
^power, root, logarithm
sin, cos, sin-1, cos-1, hypot, atan2
δdifferential, integral
a set of minimal relational operators for elementary math

I think that a language and notation based on these few operators (and similar) can be considerably simpler and more expressive than conventional math or programming languages.

CZ is for me a stepping-stone toward this goal of an expressive relational graph language. It is more pleasant for me to develop software tools in CZ than in C or another language.

Thanks for reading. I wrote this article during the process of applying to join Toptal, which appears to be a freelancing portal for top developers; and in response to this article on toptal: After All These Years, the World is Still Powered by C Programming.

My CZ project has been stalled for quite some time. I foolishly became discouraged after receiving some negative feedback. I now know that honest negative feedback should be valued as an opportunity to improve, and I intend to continue the project until it lacks glaring faults, and is useful for other people. If this project or this article interests you, please contact me and let me know. It is much more enjoyable to work on a project when other people are actively interested in it!

,

Glen TurnerCompiling and installing software for the uBITX v6 QRP amateur radio transciever

The uBITX uses an Arduino internally. This article describes how to update its software.

Required hardware

The connector on the back is a Mini-B USB connector, so you'll need a "Mini-B to A" USB cable. This is not the same cable as used with older Android smartphones. The Mini-B connector was used with a lot of cameras a decade ago.

You'll also need a computer. I use a laptop with Fedora Linux installed.

Required software for software development

In Fedora all the required software is installed with sudo dnf install arduino git. Add yourself to the users and lock groups with sudo usermod -a -G users,lock $USER (on Debian-style systems use sudo usermod -a -G dialout,lock $USER). You'll need to log out and log in again for that to have an effect (if you want to see which groups you are already in, then use the id command).

Run arduino as your ordinary non-root user to create the directories used by the Arduino IDE. You can quit the IDE once it starts.

Obtain the uBITX software

$ cd ~/Arduino
$ git clone https://github.com/afarhan/ubitxv6.git ubitx_v6.1_code

Connect the uBITX to your computer

Plug in the USB cable and turn on the radio. Running dmesg will show the Arduino appearing as a "USB serial" device:

usb 1-1: new full-speed USB device number 6 using xhci_hcd
usb 1-1: New USB device found, idVendor=1a86, idProduct=7523, bcdDevice= 2.64
usb 1-1: New USB device strings: Mfr=0, Product=2, SerialNumber=0
usb 1-1: Product: USB Serial
usbcore: registered new interface driver ch341
usbserial: USB Serial support registered for ch341-uart
ch341 1-1:1.0: ch341-uart converter detected
usb 1-1: ch341-uart converter now attached to ttyUSB1

If you want more information about the USB device then use:

$ lsusb -d 1a86:7523
Bus 001 Device 006: ID 1a86:7523 QinHeng Electronics CH340 serial converter


comment count unavailable comments

,

Jan SchmidtRift CV1 – Adventures in Kalman filtering Part 2

In the last post I had started implementing an Unscented Kalman Filter for position and orientation tracking in OpenHMD. Over the Christmas break, I continued that work.

A Quick Recap

When reading below, keep in mind that the goal of the filtering code I’m writing is to combine 2 sources of information for tracking the headset and controllers.

The first piece of information is acceleration and rotation data from the IMU on each device, and the second is observations of the device position and orientation from 1 or more camera sensors.

The IMU motion data drifts quickly (at least for position tracking) and can’t tell which way the device is facing (yaw, but can detect gravity and get pitch/roll).

The camera observations can tell exactly where each device is, but arrive at a much lower rate (52Hz vs 500/1000Hz) and can take a long time to process (hundreds of milliseconds) to analyse to acquire or re-acquire a lock on the tracked device(s).

The goal is to acquire tracking lock, then use the motion data to predict the motion closely enough that we always hit the ‘fast path’ of vision analysis. The key here is closely enough – the more closely the filter can track and predict the motion of devices between camera frames, the better.

Integration in OpenHMD

When I wrote the last post, I had the filter running as a standalone application, processing motion trace data collected by instrumenting a running OpenHMD app and moving my headset and controllers around. That’s a really good way to work, because it lets me run modifications on the same data set and see what changed.

However, the motion traces were captured using the current fusion/prediction code, which frequently loses tracking lock when the devices move – leading to big gaps in the camera observations and more interpolation for the filter.

By integrating the Kalman filter into OpenHMD, the predictions are improved leading to generally much better results. Here’s one trace of me moving the headset around reasonably vigourously with no tracking loss at all.

Headset motion capture trace

If it worked this well all the time, I’d be ecstatic! The predicted position matched the observed position closely enough for every frame for the computer vision to match poses and track perfectly. Unfortunately, this doesn’t happen every time yet, and definitely not with the controllers – although I think the latter largely comes down to the current computer vision having more troubler matching controller poses. They have fewer LEDs to match against compared to the headset, and the LEDs are generally more side-on to a front-facing camera.

Taking a closer look at a portion of that trace, the drift between camera frames when the position is interpolated using the IMU readings is clear.

Headset motion capture – zoomed in view

This is really good. Most of the time, the drift between frames is within 1-2mm. The computer vision can only match the pose of the devices to within a pixel or two – so the observed jitter can also come from the pose extraction, not the filtering.

The worst tracking is again on the Z axis – distance from the camera in this case. Again, that makes sense – with a single camera matching LED blobs, distance is the most uncertain part of the extracted pose.

Losing Track

The trace above is good – the computer vision spots the headset and then the filtering + computer vision track it at all times. That isn’t always the case – the prediction goes wrong, or the computer vision fails to match (it’s definitely still far from perfect). When that happens, it needs to do a full pose search to reacquire the device, and there’s a big gap until the next pose report is available.

That looks more like this

Headset motion capture trace with tracking errors

This trace has 2 kinds of errors – gaps in the observed position timeline during full pose searches and erroneous position reports where the computer vision matched things incorrectly.

Fixing the errors in position reports will require improving the computer vision algorithm and would fix most of the plot above. Outlier rejection is one approach to investigate on that front.

Latency Compensation

There is inherent delay involved in processing of the camera observations. Every 19.2ms, the headset emits a radio signal that triggers each camera to capture a frame. At the same time, the headset and controller IR LEDS light up brightly to create the light constellation being tracked. After the frame is captured, it is delivered over USB over the next 18ms or so and then submitted for vision analysis. In the fast case where we’re already tracking the device the computer vision is complete in a millisecond or so. In the slow case, it’s much longer.

Overall, that means that there’s at least a 20ms offset between when the devices are observed and when the position information is available for use. In the plot above, this delay is ignored and position reports are fed into the filter when they are available. In the worst case, that means the filter is being told where the headset was hundreds of milliseconds earlier.

To compensate for that delay, I implemented a mechanism in the filter where it keeps extra position and orientation entries in the state that can be used to retroactively apply the position observations.

The way that works is to make a prediction of the position and orientation of the device at the moment the camera frame is captured and copy that prediction into the extra state variable. After that, it continues integrating IMU data as it becomes available while keeping the auxilliary state constant.

When a the camera frame analysis is complete, that delayed measurement is matched against the stored position and orientation prediction in the state and the error used to correct the overall filter. The cool thing is that in the intervening time, the filter covariance matrix has been building up the right correction terms to adjust the current position and orientation.

Here’s a good example of the difference:

Before: Position filtering with no latency compensation
After: Latency-compensated position reports

Notice how most of the disconnected segments have now slotted back into position in the timeline. The ones that haven’t can either be attributed to incorrect pose extraction in the compute vision, or to not having enough auxilliary state slots for all the concurrent frames.

At any given moment, there can be a camera frame being analysed, one arriving over USB, and one awaiting “long term” analysis. The filter needs to track an auxilliary state variable for each frame that we expect to get pose information from later, so I implemented a slot allocation system and multiple slots.

The downside is that each slot adds 6 variables (3 position and 3 orientation) to the covariance matrix on top of the 18 base variables. Because the covariance matrix is square, the size grows quadratically with new variables. 5 new slots means 30 new variables – leading to a 48 x 48 covariance matrix instead of 18 x 18. That is a 7-fold increase in the size of the matrix (48 x 48 = 2304 vs 18 x 18 = 324) and unfortunately about a 10x slow-down in the filter run-time.

At that point, even after some optimisation and vectorisation on the matrix operations, the filter can only run about 3x real-time, which is too slow. Using fewer slots is quicker, but allows for fewer outstanding frames. With 3 slots, the slow-down is only about 2x.

There are some other possible approaches to this problem:

  • Running the filtering delayed, only integrating IMU reports once the camera report is available. This has the disadvantage of not reporting the most up-to-date estimate of the user pose, which isn’t great for an interactive VR system.
  • Keeping around IMU reports and rewinding / replaying the filter for late camera observations. This limits the overall increase in filter CPU usage to double (since we at most replay every observation twice), but potentially with large bursts when hundreds of IMU readings need replaying.
  • It might be possible to only keep 2 “full” delayed measurement slots with both position and orientation, and to keep some position-only slots for others. The orientation of the headset tends to drift much more slowly than position does, so when there’s a big gap in the tracking it would be more important to be able to correct the position estimate. Orientation is likely to still be close to correct.
  • Further optimisation in the filter implementation. I was hoping to keep everything dependency-free, so the filter implementation uses my own naive 2D matrix code, which only implements the features needed for the filter. A more sophisticated matrix library might perform better – but it’s hard to say without doing some testing on that front.

Controllers

So far in this post, I’ve only talked about the headset tracking and not mentioned controllers. The controllers are considerably harder to track right now, but most of the blame for that is in the computer vision part. Each controller has fewer LEDs than the headset, fewer are visible at any given moment, and they often aren’t pointing at the camera front-on.

Oculus Camera view of headset and left controller.

This screenshot is a prime example. The controller is the cluster of lights at the top of the image, and the headset is lower left. The computer vision has gotten confused and thinks the controller is the ring of random blue crosses near the headset. It corrected itself a moment later, but those false readings make life very hard for the filtering.

Position tracking of left controller with lots of tracking loss.

Here’s a typical example of the controller tracking right now. There are some very promising portions of good tracking, but they are interspersed with bursts of tracking losses, and wild drifting from the computer vision giving wrong poses – leading to the filter predicting incorrect acceleration and hence cascaded tracking losses. Particularly (again) on the Z axis.

Timing Improvements

One of the problems I was looking at in my last post is variability in the arrival timing of the various USB streams (Headset reports, Controller reports, camera frames). I improved things in OpenHMD on that front, to use timestamps from the devices everywhere (removing USB timing jitter from the inter-sample time).

There are still potential problems in when IMU reports from controllers get updated in the filters vs the camera frames. That can be on the order of 2-4ms jitter. Time will tell how big a problem that will be – after the other bigger tracking problems are resolved.

Sponsorships

All the work that I’m doing implementing this positional tracking is a combination of my free time, hours contributed by my employer Centricular and contributions from people via Github Sponsorships. If you’d like to help me spend more hours on this and fewer on other paying work, I appreciate any contributions immensely!

Next Steps

The next things on my todo list are:

  • Integrate the delayed-observation processing into OpenHMD (at the moment it is only in my standalone simulator).
  • Improve the filter code structure – this is my first kalman filter and there are some implementation decisions I’d like to revisit.
  • Publish the UKF branch for other people to try.
  • Circle back to the computer vision and look at ways to improve the pose extraction and better reject outlying / erroneous poses, especially for the controllers.
  • Think more about how to best handle / schedule analysis of frames from multiple cameras. At the moment each camera operates as a separate entity, capturing frames and analysing them in threads without considering what is happening in other cameras. That means any camera that can’t see a particular device starts doing full pose searches – which might be unnecessary if another camera still has a good view of the device. Coordinating those analyses across cameras could yield better CPU consumption, and let the filter retain fewer delayed observation slots.

,

Colin CharlesCiao, 2020

Another year comes to a close, and this is the 4th year running I’m in Kuala Lumpur — 2017, 2018, 2019, and 2020… Wow. Maybe the biggest difference is that I’ve been in Malaysia for 306 days, thanks to the novel coronavirus. I have never spent this much time in Malaysia, in my entire life… I want to say KL, but I’ve managed to zip my way around to Kuantan (a lot), Penang, and Malacca. I can’t believe I flew back on February 29 2020 from Tokyo, and never got on a plane again! What a grounded globalist I’ve become.

My travel stats are of course, pretty dismal. 39 days out of the country. Apparently I did a total of 13 trips, 92 days of travel (I don’t know if all my local trips are counted frankly), 60,766km, 17 cities, and still 7 countries :) I don’t even want to compare to what it was like in 2019.

I ended that by saying, “I welcome 2020 with arms wide open.”. I’m not so sure how I feel about 2020. There is life beyond travel. COVID and our reaction to it, really worries me.

KL has some pretty good food. Kuantan has some pretty good people. While in KL, I visited a spin studio at least once per day. I did a total of 272 spin classes over 366 days! Not to forget there was 56 days of complete lockdown, and studios didn’t open till about maybe mid-June… Sure I did do some spin in London and Paris too, but the bulk of all this happened while I was here in KL.

I became reasonably friendlier, I became vulnerable, and like every time you do that, you’re chances of happiness and getting hurt probably straddle 50:50. Madonna – The Power of Good-bye can be apt.

This is not to say I didn’t enjoy 2020. Glass half full. I really did. Carpe diem. Simplicity is best. If you can follow KISS principles in engineering, why would you pour your entire thought process out and overwhelm the other party?

Anyway, I still look forward to 2021, with wide open arms, and while I really do think the COVID mess isn’t going away and things are going to be worse for many, I will still be focused on the most positive aspects of 2021. And I’ll work on being my old self again ;-)

I also ended the year with a haircut (number 1/0.5 on the sides) on Monday 28 December 2020. Somewhat of an experiment (does CoQ10 help speed up hair growth?) but also somewhat of a reaction to saying goodbye to December 2020.

,

Glen TurnerBlocking a USB device

udev can be used to block a USB device (or even an entire class of devices, such as USB storage). Add a file /etc/udev/rules.d/99-local-blacklist.rules containing:

SUBSYSTEM=="usb", ATTRS{idVendor}=="0123", ATTRS{idProduct}=="4567", ATTR{authorized}="0"


comment count unavailable comments

,

Hamish TaylorWattlebird feeding

While I hope to update this site again soon, here’s a photo I captured over the weekend in my back yard. The red flowering plant is attracting wattlebirds and honey-eaters. This wattlebird stayed still long enough for me to take this shot. After a little bit of editing, I think it has turned out rather well.

Photo taken with: Canon 7D Mark II & Canon 55-250mm lens.

Edited in Lightroom and Photoshop (to remove a sun glare spot off the eye).

Wattlebird feeding

,

Glen TurnerConverting MPEG-TS to, well, MPEG

Digital TV uses MPEG Transport Stream, which is a container for video designed for lossy transmission, such as radio. To save CPU cycles, Personal Video Records often save the MPEG-TS stream directly to disk. The more usual MPEG is technically MPEG Program Stream, which is designed for lossless transmission, such as storage on a disk.

Since these are a container formats, it should be possible to losslessly and quickly re-code from MPEG-TS to MPEG-PS.

ffmpeg -ss "${STARTTIME}" -to "${DURATION}" -i "${FILENAME}" -ignore_unknown -map 0 -map -0:2 -c copy "${FILENAME}.mpeg"


comment count unavailable comments

,

Chris NeugebauerTalk Notes: Practicality Beats Purity: The Zen Of Python’s Escape Hatch?

I gave the talk Practicality Beats Purity: The Zen of Python’s Escape Hatch as part of PyConline AU 2020, the very online replacement for PyCon AU this year. In that talk, I included a few interesting links code samples which you may be interested in:

@apply

def apply(transform):

    def __decorator__(using_this):
        return transform(using_this)

    return __decorator__


numbers = [1, 2, 3, 4, 5]

@apply(lambda f: list(map(f, numbers)))
def squares(i):
  return i * i

print(list(squares))

# prints: [1, 4, 9, 16, 25]

Init.java

public class Init {
  public static void main(String[] args) {
    System.out.println("Hello, World!")
  }
}

@switch and @case

__NOT_A_MATCHER__ = object()
__MATCHER_SORT_KEY__ = 0

def switch(cls):

    inst = cls()
    methods = []

    for attr in dir(inst):
        method = getattr(inst, attr)
        matcher = getattr(method, "__matcher__", __NOT_A_MATCHER__)

        if matcher == __NOT_A_MATCHER__:
            continue

        methods.append(method)

    methods.sort(key = lambda i: i.__matcher_sort_key__)

    for method in methods:
        matches = method.__matcher__()
        if matches:
            return method()

    raise ValueError(f"No matcher matches value {test_value}")

def case(matcher):

    def __decorator__(f):
        global __MATCHER_SORT_KEY__

        f.__matcher__ = matcher
        f.__matcher_sort_key__ = __MATCHER_SORT_KEY__
        __MATCHER_SORT_KEY__ += 1
        return f

    return __decorator__



if __name__ == "__main__":
    for i in range(100):

        @switch
        class FizzBuzz:

            @case(lambda: i % 15 == 0)
            def fizzbuzz(self):
                return "fizzbuzz"

            @case(lambda: i % 3 == 0)
            def fizz(self):
                return "fizz"

            @case(lambda: i % 5 == 0)
            def buzz(self):
                return "buzz"

            @case(lambda: True)
            def default(self):
                return "-"

        print(f"{i} {FizzBuzz}")

,

Craig SandersFuck Grey Text

fuck grey text on white backgrounds
fuck grey text on black backgrounds
fuck thin, spindly fonts
fuck 10px text
fuck any size of anything in px
fuck font-weight 300
fuck unreadable web pages
fuck themes that implement this unreadable idiocy
fuck sites that don’t work without javascript
fuck reactjs and everything like it

thank fuck for Stylus. and uBlock Origin. and uMatrix.

Fuck Grey Text is a post from: Errata

,

Hamish TaylorBlog: A new beginning

Earlier today I launched this site. It is the result of a lot of work over the past few weeks. It began as an idea to publicise some of my photos, and morphed into the site you see now, including a store and blog that I’ve named “Photekgraddft”.

In the weirdly named blog, I want to talk about photography, the stories behind some of my more interesting shots, the gear and software I use, my technology career, my recent ADHD diagnosis and many other things.

This scares me quite a lot. I’ve never really put myself out onto the internet before. If you Google me, you’re not going to find anything much. Google Images has no photos of me. I’ve always liked it that way. Until now.

ADHD’ers are sometimes known for “oversharing”, one of the side-effects of the inability to regulate emotions well. I’ve always been the opposite, hiding, because I knew I was different, but didn’t understand why.

The combination of the COVID-19 pandemic and my recent ADHD diagnosis have given me a different perspective. I now know why I hid. And now I want to engage, and be engaged, in the world.

If I can be a force for positive change, around people’s knowledge and opinion of ADHD, then I will.

If talking about Business Analysis (my day job), and sharing my ideas for optimising organisations helps anyone at all, then I will.

If I can show my photos and brighten someone’s day by allowing them to enjoy a sunset, or a flying bird, then I will.

And if anyone buys any of my photos, then I will be shocked!

So welcome to my little vanity project. I hope it can be something positive, for me, if for noone else in this new, odd world in which we now find ourselves living together.

,

,

,

Jonathan Adamczewskif32, u32, and const

Some time ago, I wrote “floats, bits, and constant expressions” about converting floating point number into its representative ones and zeros as a C++ constant expression – constructing the IEEE 754 representation without being able to examine the bits directly.

I’ve been playing around with Rust recently, and rewrote that conversion code as a bit of a learning exercise for myself, with a thoroughly contrived set of constraints: using integer and single-precision floating point math, at compile time, without unsafe blocks, while using as few unstable features as possible.

I’ve included the listing below, for your bemusement and/or head-shaking, and you can play with the code in the Rust Playground and rust.godbolt.org

// Jonathan Adamczewski 2020-05-12
//
// Constructing the bit-representation of an IEEE 754 single precision floating 
// point number, using integer and single-precision floating point math, at 
// compile time, in rust, without unsafe blocks, while using as few unstable 
// features as I can.
//
// or "What if this silly C++ thing https://brnz.org/hbr/?p=1518 but in Rust?"


// Q. Why? What is this good for?
// A. To the best of my knowledge, this code serves no useful purpose. 
//    But I did learn a thing or two while writing it :)


// This is needed to be able to perform floating point operations in a const 
// function:
#![feature(const_fn)]


// bits_transmute(): Returns the bits representing a floating point value, by
//                   way of std::mem::transmute()
//
// For completeness (and validation), and to make it clear the fundamentally 
// unnecessary nature of the exercise :D - here's a short, straightforward, 
// library-based version. But it needs the const_transmute flag and an unsafe 
// block.
#![feature(const_transmute)]
const fn bits_transmute(f: f32) -> u32 {
  unsafe { std::mem::transmute::<f32, u32>(f) }
}



// get_if_u32(predicate:bool, if_true: u32, if_false: u32):
//   Returns if_true if predicate is true, else if_false
//
// If and match are not able to be used in const functions (at least, not 
// without #![feature(const_if_match)] - so here's a branch-free select function
// for u32s
const fn get_if_u32(predicate: bool, if_true: u32, if_false: u32) -> u32 {
  let pred_mask = (-1 * (predicate as i32)) as u32;
  let true_val = if_true & pred_mask;
  let false_val = if_false & !pred_mask;
  true_val | false_val
}

// get_if_f32(predicate, if_true, if_false):
//   Returns if_true if predicate is true, else if_false
//
// A branch-free select function for f32s.
// 
// If either is_true or is_false is NaN or an infinity, the result will be NaN,
// which is not ideal. I don't know of a better way to implement this function
// within the arbitrary limitations of this silly little side quest.
const fn get_if_f32(predicate: bool, if_true: f32, if_false: f32) -> f32 {
  // can't convert bool to f32 - but can convert bool to i32 to f32
  let pred_sel = (predicate as i32) as f32;
  let pred_not_sel = ((!predicate) as i32) as f32;
  let true_val = if_true * pred_sel;
  let false_val = if_false * pred_not_sel;
  true_val + false_val
}


// bits(): Returns the bits representing a floating point value.
const fn bits(f: f32) -> u32 {
  // the result value, initialized to a NaN value that will otherwise not be
  // produced by this function.
  let mut r = 0xffff_ffff;

  // These floation point operations (and others) cause the following error:
  //     only int, `bool` and `char` operations are stable in const fn
  // hence #![feature(const_fn)] at the top of the file
  
  // Identify special cases
  let is_zero    = f == 0_f32;
  let is_inf     = f == f32::INFINITY;
  let is_neg_inf = f == f32::NEG_INFINITY;
  let is_nan     = f != f;

  // Writing this as !(is_zero || is_inf || ...) cause the following error:
  //     Loops and conditional expressions are not stable in const fn
  // so instead write this as type coversions, and bitwise operations
  //
  // "normalish" here means that f is a normal or subnormal value
  let is_normalish = 0 == ((is_zero as u32) | (is_inf as u32) | 
                        (is_neg_inf as u32) | (is_nan as u32));

  // set the result value for each of the special cases
  r = get_if_u32(is_zero,    0,           r); // if (iz_zero)    { r = 0; }
  r = get_if_u32(is_inf,     0x7f80_0000, r); // if (is_inf)     { r = 0x7f80_0000; }
  r = get_if_u32(is_neg_inf, 0xff80_0000, r); // if (is_neg_inf) { r = 0xff80_0000; }
  r = get_if_u32(is_nan,     0x7fc0_0000, r); // if (is_nan)     { r = 0x7fc0_0000; }
 
  // It was tempting at this point to try setting f to a "normalish" placeholder 
  // value so that special cases do not have to be handled in the code that 
  // follows, like so:
  // f = get_if_f32(is_normal, f, 1_f32);
  //
  // Unfortunately, get_if_f32() returns NaN if either input is NaN or infinite.
  // Instead of switching the value, we work around the non-normalish cases 
  // later.
  //
  // (This whole function is branch-free, so all of it is executed regardless of 
  // the input value)

  // extract the sign bit
  let sign_bit  = get_if_u32(f < 0_f32,  1, 0);

  // compute the absolute value of f
  let mut abs_f = get_if_f32(f < 0_f32, -f, f);

  
  // This part is a little complicated. The algorithm is functionally the same 
  // as the C++ version linked from the top of the file.
  // 
  // Because of the various contrived constraints on thie problem, we compute 
  // the exponent and significand, rather than extract the bits directly.
  //
  // The idea is this:
  // Every finite single precision float point number can be represented as a
  // series of (at most) 24 significant digits as a 128.149 fixed point number 
  // (128: 126 exponent values >= 0, plus one for the implicit leading 1, plus 
  // one more so that the decimal point falls on a power-of-two boundary :)
  // 149: 126 negative exponent values, plus 23 for the bits of precision in the 
  // significand.)
  //
  // If we are able to scale the number such that all of the precision bits fall 
  // in the upper-most 64 bits of that fixed-point representation (while 
  // tracking our effective manipulation of the exponent), we can then 
  // predictably and simply scale that computed value back to a range than can 
  // be converted safely to a u64, count the leading zeros to determine the 
  // exact exponent, and then shift the result into position for the final u32 
  // representation.
  
  // Start with the largest possible exponent - subsequent steps will reduce 
  // this number as appropriate
  let mut exponent: u32 = 254;
  {
    // Hex float literals are really nice. I miss them.

    // The threshold is 2^87 (think: 64+23 bits) to ensure that the number will 
    // be large enough that, when scaled down by 2^64, all the precision will 
    // fit nicely in a u64
    const THRESHOLD: f32 = 154742504910672534362390528_f32; // 0x1p87f == 2^87

    // The scaling factor is 2^41 (think: 64-23 bits) to ensure that a number 
    // between 2^87 and 2^64 will not overflow in a single scaling step.
    const SCALE_UP: f32 = 2199023255552_f32; // 0x1p41f == 2^41

    // Because loops are not available (no #![feature(const_loops)], and 'if' is
    // not available (no #![feature(const_if_match)]), perform repeated branch-
    // free conditional multiplication of abs_f.

    // use a macro, because why not :D It's the most compact, simplest option I 
    // could find.
    macro_rules! maybe_scale {
      () => {{
        // care is needed: if abs_f is above the threshold, multiplying by 2^41 
        // will cause it to overflow (INFINITY) which will cause get_if_f32() to
        // return NaN, which will destroy the value in abs_f. So compute a safe 
        // scaling factor for each iteration.
        //
        // Roughly equivalent to :
        // if (abs_f < THRESHOLD) {
        //   exponent -= 41;
        //   abs_f += SCALE_UP;
        // }
        let scale = get_if_f32(abs_f < THRESHOLD, SCALE_UP,      1_f32);    
        exponent  = get_if_u32(abs_f < THRESHOLD, exponent - 41, exponent); 
        abs_f     = get_if_f32(abs_f < THRESHOLD, abs_f * scale, abs_f);
      }}
    }
    // 41 bits per iteration means up to 246 bits shifted.
    // Even the smallest subnormal value will end up in the desired range.
    maybe_scale!();  maybe_scale!();  maybe_scale!();
    maybe_scale!();  maybe_scale!();  maybe_scale!();
  }

  // Now that we know that abs_f is in the desired range (2^87 <= abs_f < 2^128)
  // scale it down to be in the range (2^23 <= _ < 2^64), and convert without 
  // loss of precision to u64.
  const INV_2_64: f32 = 5.42101086242752217003726400434970855712890625e-20_f32; // 0x1p-64f == 2^64
  let a = (abs_f * INV_2_64) as u64;

  // Count the leading zeros.
  // (C++ doesn't provide a compile-time constant function for this. It's nice 
  // that rust does :)
  let mut lz = a.leading_zeros();

  // if the number isn't normalish, lz is meaningless: we stomp it with 
  // something that will not cause problems in the computation that follows - 
  // the result of which is meaningless, and will be ignored in the end for 
  // non-normalish values.
  lz = get_if_u32(!is_normalish, 0, lz); // if (!is_normalish) { lz = 0; }

  {
    // This step accounts for subnormal numbers, where there are more leading 
    // zeros than can be accounted for in a valid exponent value, and leading 
    // zeros that must remain in the final significand.
    //
    // If lz < exponent, reduce exponent to its final correct value - lz will be
    // used to remove all of the leading zeros.
    //
    // Otherwise, clamp exponent to zero, and adjust lz to ensure that the 
    // correct number of bits will remain (after multiplying by 2^41 six times - 
    // 2^246 - there are 7 leading zeros ahead of the original subnormal's
    // computed significand of 0.sss...)
    // 
    // The following is roughly equivalent to:
    // if (lz < exponent) {
    //   exponent = exponent - lz;
    // } else {
    //   exponent = 0;
    //   lz = 7;
    // }

    // we're about to mess with lz and exponent - compute and store the relative 
    // value of the two
    let lz_is_less_than_exponent = lz < exponent;

    lz       = get_if_u32(!lz_is_less_than_exponent, 7,             lz);
    exponent = get_if_u32( lz_is_less_than_exponent, exponent - lz, 0);
  }

  // compute the final significand.
  // + 1 shifts away a leading 1-bit for normal, and 0-bit for subnormal values
  // Shifts are done in u64 (that leading bit is shifted into the void), then
  // the resulting bits are shifted back to their final resting place.
  let significand = ((a << (lz + 1)) >> (64 - 23)) as u32;

  // combine the bits
  let computed_bits = (sign_bit << 31) | (exponent << 23) | significand;

  // return the normalish result, or the non-normalish result, as appopriate
  get_if_u32(is_normalish, computed_bits, r)
}


// Compile-time validation - able to be examined in rust.godbolt.org output
pub static BITS_BIGNUM: u32 = bits(std::f32::MAX);
pub static TBITS_BIGNUM: u32 = bits_transmute(std::f32::MAX);
pub static BITS_LOWER_THAN_MIN: u32 = bits(7.0064923217e-46_f32);
pub static TBITS_LOWER_THAN_MIN: u32 = bits_transmute(7.0064923217e-46_f32);
pub static BITS_ZERO: u32 = bits(0.0f32);
pub static TBITS_ZERO: u32 = bits_transmute(0.0f32);
pub static BITS_ONE: u32 = bits(1.0f32);
pub static TBITS_ONE: u32 = bits_transmute(1.0f32);
pub static BITS_NEG_ONE: u32 = bits(-1.0f32);
pub static TBITS_NEG_ONE: u32 = bits_transmute(-1.0f32);
pub static BITS_INF: u32 = bits(std::f32::INFINITY);
pub static TBITS_INF: u32 = bits_transmute(std::f32::INFINITY);
pub static BITS_NEG_INF: u32 = bits(std::f32::NEG_INFINITY);
pub static TBITS_NEG_INF: u32 = bits_transmute(std::f32::NEG_INFINITY);
pub static BITS_NAN: u32 = bits(std::f32::NAN);
pub static TBITS_NAN: u32 = bits_transmute(std::f32::NAN);
pub static BITS_COMPUTED_NAN: u32 = bits(std::f32::INFINITY/std::f32::INFINITY);
pub static TBITS_COMPUTED_NAN: u32 = bits_transmute(std::f32::INFINITY/std::f32::INFINITY);


// Run-time validation of many more values
fn main() {
  let end: usize = 0xffff_ffff;
  let count = 9_876_543; // number of values to test
  let step = end / count;
  for u in (0..=end).step_by(step) {
      let v = u as u32;
      
      // reference
      let f = unsafe { std::mem::transmute::<u32, f32>(v) };
      
      // compute
      let c = bits(f);

      // validation
      if c != v && 
         !(f.is_nan() && c == 0x7fc0_0000) && // nans
         !(v == 0x8000_0000 && c == 0) { // negative 0
          println!("{:x?} {:x?}", v, c); 
      }
  }
}

,

Chris NeugebauerReflecting on 10 years of not having to update WordPress

Over the weekend, the boredom of COVID-19 isolation motivated me to move my personal website from WordPress on a self-managed 10-year-old virtual private server to a generated static site on a static site hosting platform with a content delivery network.

This decision was overdue. WordPress never fit my brain particularly well, and it was definitely getting to a point where I wasn’t updating my website at all (my last post was two weeks before I moved from Hobart; I’ve been living in Petaluma for more than three years now).

Settling on which website framework wasn’t a terribly difficult choice (I chose Jekyll, everyone else seems to be using it), and I’ve had friends who’ve had success moving their blogs over. The difficulty I ended up facing was that the standard exporter that everyone to move from WordPress to Jekyll uses does not expect Debian’s package layout.

Backing up a bit: I made a choice, 10 years ago, to deploy WordPress on a machine that I ran myself, using the Debian system wordpress package, a simple aptitude install wordpress away. That decision was not particularly consequential then, but it chewed up 3 hours of my time on Saturday.

Why? The exporter plugin assumes that it will be able to find all of the standard WordPress files in the usual WordPress places, and when it didn’t find that, it broke in unexpected ways. And why couldn’t it find it?

Debian makes packaging choices that prioritise all the software on a system living side-by-side with minimal difficulty. It sets strict permissions. It separates application code from configuration from user data (which in the case of WordPress, includes plugins), in a way that is consistent between applications. This choice makes it easy for Debian admins to understand how to find bits of an application. It also minimises the chance of one PHP application from clobbering another.

10 years later, the install that I had set up was still working, having survived 3-4 Debian versions, and so 3-4 new WordPress versions. I don’t recall the last time I had to think about keeping my WordPress instance secure and updated. That’s quite a good run. I’ve had a working website despite not caring about keeping it updated for at least three years.

The same decisions that meant I spent 3 hours on Saturday doing a simple WordPress export saved me a bunch of time that I didn’t incrementally spend over the course a decade. Am I even? I have no idea.

Anyway, the least I can do is provide some help to people who might run into this same problem, so here’s a 5-step howto.

How to migrate a Debian WordPress site to Jekyll

Should you find the Jekyll exporter not working on your Debian WordPress install:

  1. Use the standard WordPress export to export an XML feel of your site.
  2. Spin up a new instance of WordPress (using WordPress.com, or on a new Virtual Private Server, whatever, really).
  3. Import the exported XML feed.
  4. Install the Jekyll exporter plugin.
  5. Follow the documentation and receive a Jekyll export of your site.

Basically, the plugin works with a stock WordPress install. If you don’t have one of those, it’s easy to move it over.

,

Andrew RuthvenInstall Fedora CoreOS using FAI

I've spent the last couple of days trying to deploy Fedora CoreOS to some physical hardware/bare metal for a colleague using the official PXE installer from Fedora CoreOS. It wasn't very pleasant, and just wouldn't work reliably.

Maybe my expectations were to high, in that I thought I could use Ignition to prepare more of the system for me, as my colleague has been able to bare metal installs correctly. I just tried to use Ignition as documented.

A few interesting aspects I encountered:

  1. The PXE installer for it has a 618MB initrd file. This takes quite a while to transfer via tftp!
  2. It can't build software RAID for the main install device (and the developers have no intention of adding this), and it seems very finicky to build other RAID sets for other partitions.
  3. And, well, I just kept having problems where the built systems would hang during boot for no obvious reason.
  4. The time to do an installation was incredibly long.
  5. The initrd image is really just running coreos-installer against the nominated device.

During the night I got feed up with that process and wrote a Fully Automatic Installer (FAI) profile that'd install CoreOS instead. I can now use setup-storage from FAI using it's standard disk_config files. This allows me to build complicated disk configurations with software RAID and LVM easily.

A big bonus is that a rebuild is a lot faster, timed from typing reboot to a fresh login prompt is 10 minutes - and this is on physical hardware so includes BIOS POST and RAID controller set up, twice each.

I thought this might be of interest to other people, so the FAI profile I developed for this is located here: https://github.com/catalyst-cloud/fai-profile-fedora-coreos

FAI was initially developed to deploy Debian systems, it has since been extended to be able to install a number of other operating systems, however I think this is a good example of how easy it is to deploy non-Debian derived operating systems using FAI without having to modify FAI itself.

,

Robert CollinsStrength training from home

For the last year I’ve been incrementally moving away from lifting static weights and towards body weight based exercises, or callisthenics. I’ve been doing this for a number of reasons, including better avoidance of injury (if I collapse, the entire stack is dynamic, if a bar held above my head drops on me, most of the weight is just dead weight – ouch), accessibility during travel – most hotel gyms are very poor, and functional relevance – I literally never need to put 100 kg on my back, but I do climb stairs, for instance.

Covid-19 shutting down the gym where I train is a mild inconvenience for me as a result, because even though I don’t do it, I am able to do nearly all my workouts entirely from home. And I thought a post about this approach might be of interest to other folk newly separated from their training facilities.

I’ve gotten most of my information from a few different youtube channels:

There are many more channels out there, and I encourage you to go and look and read and find out what works for you. Those 5 are my greatest hits, if you will. I’ve bought the FitnessFAQs exercise programs to help me with my my training, and they are indeed very effective.

While you don’t need a gymnasium, you do need some equipment, particularly if you can’t go and use a local park. Exactly what you need will depend on what you choose to do – for instance, doing dips on the edge of a chair can avoid needing any equipment, but doing them with some portable parallel bars can be much easier. Similarly, doing pull ups on the edge of a door frame is doable, but doing them with a pull-up bar is much nicer on your fingers.

Depending on your existing strength you may not need bands, but I certainly did. Buying rings is optional – I love them, but they aren’t needed to have a good solid workout.

I bought parallettes for working on the planche.undefined Parallel bars for dips and rows.undefined A pull-up bar for pull-ups and chin-ups, though with the rings you can add flys, rows, face-pulls, unstable push-ups and more. The rings. And a set of 3 bands that combine for 7 different support amounts.undefinedundefined

In terms of routine, I do a upper/lower split, with 3 days on upper body, one day off, one day on lower, and the weekends off entirely. I was doing 2 days on lower body, but found I was over-training with Aikido later that same day.

On upper body days I’ll do (roughly) chin ups or pull ups, push ups, rows, dips, hollow body and arch body holds, handstands and some grip work. Today, as I write this on Sunday evening, 2 days after my last training day on Friday, I can still feel my lats and biceps from training Friday afternoon. Zero issue keeping the intensity up.

For lower body, I’ll do pistol squats, nordic drops, quad extensions, wall sits, single leg calf raises, bent leg calf raises. Again, zero issues hitting enough intensity to achieve growth / strength increases. The only issue at home is having a stable enough step to get a good heel drop for the calf raises.

If you haven’t done bodyweight training at all before, when starting, don’t assume it will be easy – even if you’re a gym junkie, our bodies are surprisingly heavy, and there’s a lot of resistance just moving them around.

Good luck, train well!

Brendan ScottCovid 19 Numbers – lag

Recording some thoughts about Covid 19 numbers.

Today’s figures

The Government says:

“As at 6.30am on 22 March 2020, there have been 1,098 confirmed cases of COVID-19 in Australia”.

The reference is https://www.health.gov.au/news/health-alerts/novel-coronavirus-2019-ncov-health-alert/coronavirus-covid-19-current-situation-and-case-numbers. However, that page is updated daily (ish), so don’t expect it to be the same if you check the reference.

Estimating Lag

If a person tests positive to the virus today, that means they were infected at some time in the past. So, what is the lag between infection and a positive test result?

Incubation Lag – about 5 days

When you are infected you don’t show symptoms immediately. Rather, there’s an incubation period before symptoms become apparent.  The time between being infected and developing symptoms varies from person to person, but most of the time a person shows symptoms after about 5 days (I recall seeing somewhere that 1 in a 1000 cases will develop symptoms after 14 days).

Presentation Lag – about 2 days

I think it’s fair to also assume that people are not presenting at testing immediately they become ill. It is probably taking them a couple of days from developing symptoms to actually get to the doctor – I read a story somewhere (have since lost the reference) about a young man who went to a party, then felt bad for days but didn’t go for a test until someone else from the party had returned a positive test.  Let’s assume there’s a mix of worried well and stoic types and call it 2 days from becoming symptomatic to seeking a test.

Referral Lag – about a day

Assuming that a GP is available straight away and recommends a test immediately, logistically there will still be most of a day taken up between deciding to see a doctor and having a test carried out.

Testing lag – about 2 days

The graph of infections “epi graph” today looks like this:

200322_new-and-cumulative-covid-19-cases-in-australia-by-notification-date_1

One thing you notice about the graph is that the new cases bars seem to increase for a couple of days, then decrease – so about 100 new cases in the last 24 hours, but almost 200 in the 24 hours before that. From the graph, the last 3 “dips” have been today (Sunday), last Thursday and last Sunday.  This seems to be happening every 3 to 4 days. I initially thought that the dips might mean fewer (or more) people presenting over weekends, but the period is inconsistent with that. I suspect, instead, that this actually means that testing is being batched.

That would mean that neither the peaks nor troughs is representative of infection surges/retreats, but is simply reflecting when tests are being processed. This seems to be a 4 day cycle, so, on average it seems that it would be about 2 days between having the test conducted and receiving a result. So a confirmed case count published today is actually showing confirmed cases as at about 2 days earlier.

Total lag

From the date someone is infected to the time that they receive a positive confirmation is about:

lag = time for symptoms to show+time to seek a test+referral time + time for the test to return a result

So, the published figures on confirmed infections are probably lagging actual infections in the community by about 10 days (5+2+1+2).

If there’s about a 10 day lag between infection and confirmation, then what a figure published today says is that about a week and a half ago there were about this many cases in the community.  So, the 22 March figure of 1098 infections is actually really a 12 March figure.

What the lag means for Physical (ie Social) Distancing

The main thing that the lag means is that if we were able to wave a magic wand today and stop all further infections, we would continue to record new infections for about 10 days (and the tail for longer). In practical terms, implementing physical distancing measures will not show any effect on new cases for about a week and a half. That’s because today there are infected people who are yet to be tested.

The silver lining to that is that the physical distancing measures that have been gaining prominence since 15 March should start to show up in the daily case numbers from the middle of the coming week, possibly offset by overseas entrants rushing to make the 20 March entry deadline.

Estimating Actual Infections as at Today

How many people are infected, but unconfirmed as at today? To estimate actual infections you’d need to have some idea of the rate at which infections are increasing. For example, if infections increased by 10% per day for 10 days, then you’d multiply the most recent figure by 1.1 raised to the power of 10 (ie about 2.5).  Unfortunately, the daily rate of increase (see table on the wiki page) has varied a fair bit (from 20% to 27%) over the most recent 10 days of data (that is, over the 10 days prior to 12 March, since the 22 March figures roughly correspond to 12 March infections) and there’s no guarantee that since that time the daily increase in infections will have remained stable, particularly in light of the implementation of physical distancing measures. At 23.5% per day, the factor is about 8.

There aren’t any reliable figures we can use to estimate the rate of infection during the current lag period (ie from 12 March to 22 March). This is because the vast majority of cases have not been from unexplained community transmission. Most of the cases are from people who have been overseas in the previous fortnight and they’re the cohort that has been most significantly impacted by recent physical distancing measures. From 15 March, they have been required to self isolate and from 20 March most of their entry into the country has stopped.  So I’d expect a surge in numbers up to about 30 March – ie reflecting infections in the cohort of people rushing to get into the country before the borders closed followed by a flattening. With the lag factor above, you’ll need to wait until 1 April or thereabouts to know for sure.

Note:

This post is just about accounting for the time lag between becoming infected and receiving a positive test result. It assumes, for example, that everyone who is infected seeks a test, and that everyone who is infected and seeks a test is, in fact, tested. As at today, neither of these things is true.

,

Ian BrownKubernetes Secrets Security

I was always of the belief that secrets within Kubernetes (k8s) are secure. How wrong I was! After a recent meetup featuring a Google security expert, I discovered that the secrets I have in our k8s cluster are no more secure than writing them down on paper and leaving it on a park bench. Google Cloud Platform is a fantastic offering, and yes, I am biased. I have used every other cloud platform from every major vendor over the last 10 odd years of my career.

,

Clinton Roylca2020 ReWatch 2020-02-02

As I was an organiser of the conference this year, I didn’t get to see many talks, fortunately many of the talks were recorded, so i get to watch the conference well after the fact.

Conference Opening

That white balance on the lectern slides is indeed bad, I really should get around to adding this as a suggestion on the logos documentation. (With some help, I put up all the lectern covers, it was therapeutic and rush free).

I actually think there was a lot of information in this introduction. Perhaps too much?

OpenZFS and Linux

A nice update on where zfs is these days.

Dev/Ops relationships, status: It’s Complicated

A bit of  a war story about production systems, leading to a moment of empathy.

Samba 2020: Why are we still in the 1980s for authentication?

There are a lot of old security standards that are showing there age, there are a lot of modern security standards, but which to choose?

Tyranny of the Clock

A very interesting problem solving adventure, with a few nuggets of interesting information about tools and techniques.

Configuration Is (riskier than?) Code

Because configuration files are parsed by a program, and the program changes how it runs depending on the contents of that configuration file, every program that parses configuration files is basically an interpreter, and thus every configuration file is basically a program. So, configuation is code, and we should be treating configuration like we do code, e.g. revision control, commenting, testing, review.

Easy Geo-Redundant Handover + Failover with MARS + systemd

Using a local process organiser to handle a cluster, interesting, not something I’d really promote. Not the best video cutting in this video, lots of time with the speaker pointing to his slides offscreen.

 

,

Robert Collins2019 in the rearview

2019 was a very busy year for us. I hadn’t realised how busy it was until I sat down to write this post. There’s also some moderately heavy stuff in here – if you have topics that trigger you, perhaps make sure you have spoons before reading.

We had all the usual stuff. Movies – my top two were Alita and Abominable though the Laundromat and Ford v Ferrari were both excellent and moving pieces. I introduced Cynthia to Teppanyaki and she fell in love with having egg roll thrown at her face hole.

When Cynthia started school we dropped gymnastics due to the time overload – we wanted some downtime for her to process after school, and with violin having started that year she was just looking so tired after a full day of school we felt it was best not to have anything on. Then last year we added in a specific learning tutor to help with the things that she approaches differently to the other kids in her class, giving 2 days a week of extra curricular activity after we moved swimming to the weekends.

At the end of last year she was finally chipper and with it most days after school, and she had been begging to get into more stuff, so we all got together and negotiated drama class and Aikido.

The drama school we picked, HSPA, is pretty amazing. Cynthia adored her first teacher there, and while upset at a change when they rearranged classes slightly, is again fully engaged and thrilled with her time there. Part of the class is putting on a full scale production – they did a version of the Happy Prince near the end of term 3 – and every student gets a part, with the ability for the older students to audition for more parts. On the other hand she tells me tonight that she wants to quit. So shrug, who knows :).

I last did martial arts when I took Aikido with sensei Darren Friend at Aikido Yoshinkai NSW back in Sydney, in the late 2000’s. And there was quite a bit less of me then. Cynthia had been begging to take a martial art for about 4 years, and we’d said that when she was old enough, we’d sign her up, so this year we both signed up for Aikido at the Rangiora Aikido Dojo. The Rangiora dojo is part of the NZ organisation Aikido Shinryukan which is part of the larger Aikikai style, which is quite different, yet the same, as the Yoshinkai Aikido that I had been learning. There have been quite a few moments where I have had to go back to something core – such as my stance – and unlearn it, to learn the Aikikai technique. Cynthia has found the group learning dynamic a bit challenging – she finds the explanations – needed when there are twenty kids of a range of ages and a range of experience – from new intakes each term through to ones that have been doing it for 5 or so years – get boring, and I can see her just switch off. Then she misses the actual new bit of information she didn’t have previously :(. Which then frustrates her. But she absolutely loves doing it, and she’s made a couple of friends there (everyone is positive and friendly, but there are some girls that like to play with her after the kids lesson). I have gotten over the body disconnect and awkwardness and things are starting to flow, I’m starting to be able to reason about things without just freezing in overload all the time, so that’s not bad after a year. However, the extra weight is making my forward rolls super super awkward. I can backward roll easily, with moderately good form; forward rolls though my upper body strength is far from what’s needed to support my weight through the start of the roll – my arm just collapses – so I’m in a sort of limbo – if I get the moment just right I can just start the contact on the shoulder; but if I get the moment slightly wrong, it hurts quite badly. And since I don’t want large scale injuries, doing the higher rolls is very unnerving for me. I suspect its 90% psychological, but am not sure how to get from where I am to having confidence in my technique, other than rinse-and-repeat. My hip isn’t affecting training much, and sensei Chris seems to genuinely like training with Cynthia and I, which is very nice: we feel welcomed and included in the community.

Speaking of my hip – earlier this year something ripped cartilage in my right hip – ended up having to have an MRI scan – and those machines sound exactly like a dot matrix printer – to diagnose it. Interestingly, having the MRI improved my symptoms, but we are sadly in hurry-up-and-wait mode. Before the MRI, I’d wake up at night with some soreness, and my right knee bent, foot on the bed, then sleepily let my leg collapse sideways to the right – and suddenly be awake in screaming agony as the joint opened up with every nerve at its disposal. When the MRI was done, they pumped the joint full of local anaesthetic for two purposes – one is to get a clean read on the joint, and the second is so that they can distinguish between referred surrounding pain, vs pain from the joint itself. It is to be expected with a joint issue that the local will make things feel better (duh), for up to a day or so while the local dissipates. The expression on the specialists face when I told him that I had had a permanent improvement trackable to the MRI date was priceless. Now, when I wake up with joint pain, and my leg sleepily falls back to the side, its only mildly uncomfortable, and I readjust without being brought to screaming awakeness. Similarly, early in Aikido training many activities would trigger pain, and now there’s only a couple of things that do. In another 12 or so months if the joint hasn’t fully healed, I’ll need to investigate options such as stem cells (which the specialist was negative about) or steroids (which he was more negative about) or surgery (which he was even more negative about). My theory about the improvement is that the cartilage that was ripped was sitting badly and the inflation for the MRI allowed it to settle back into the appropriate place (and perhaps start healing better). I’m told that reducing inflammation systematically is a good option. Turmeric time.

Sadly Cynthia has had some issues at school – she doesn’t fit the average mould and while wide spread bullying doesn’t seem to be a thing, there is enough of it, and she receives enough of it that its impacted her happiness more than a little – this blows up in school and at home as well. We’ve been trying a few things to improve this – helping her understand why folk behave badly, what to do in the moment (e.g. this video), but also that anything that goes beyond speech is assault and she needs to report that to us or teachers no matter what.

We’ve also had some remarkably awful interactions with another family at the school. We thought we had a friendly relationship, but I managed to trigger a complete meltdown of the relationship – not by doing anything objectively wrong, but because we had (unknown to me) different folkways, and some perfectly routine and normal behaviour turned out to be stressful and upsetting to them, and then they didn’t discuss it with us at all until it had brewed up in their heads into a big mess… and its still not resolved (and may not ever be: they are avoiding us both).

I weighed in at 110kg this morning. Jan the 4th 2019 I was 130.7kg. Feb 1 2018 I was 115.2kg. This year I peaked at 135.4kg, and got down to 108.7kg before Christmas food set in. That’s pretty happy making all things considered. Last year I was diagnosed with Coitus headaches and though I didn’t know it the medicine I was put on has a known side effect of weight gain. And it did – I had put it down to ongoing failure to manage my diet properly, but once my weight loss doctor gave me an alternative prescription for the headaches, I was able to start losing weight immediately. Sadly, though the weight gain through 2018 was effortless, losing the weight through 2019 was not. Doable, but not effortless. I saw a neurologist for the headaches when they recurred in 2019, and got a much more informative readout on them, how to treat and so on – basically the headaches can be thought of as an instability in the system, and the medicines goal is to stabilise things, and once stable for a decent period, we can attempt to remove the crutch. Often that’s successful, sometimes not, sometimes its successful on a second or third time. Sometimes you’re stuck with it forever. I’ve been eating a keto / LCHF diet – not super strict keto, though Jonie would like me to be on that, I don’t have the will power most of the time – there’s a local truck stop that sells killer hotdogs. And I simply adore them.

I started this year working for one of the largest companies on the planet – VMware. I left there in February and wrote a separate post about that. I followed that job with nearly the polar opposite – a startup working on a blockchain content distribution system. I wrote about that too. Changing jobs is hard in lots of ways – for instance I usually make friendships at my jobs, and those suffer some when you disappear to a new context – not everyone makes connections with you outside of the job context. Then there’s the somewhat non-rational emotional impact of not being in paid employment. The puritans have a lot to answer for. I’m there again, looking for work (and hey, if you’re going to be at Linux.conf.au (Gold Coast Australia January 13-17) I’ll be giving a presentation about some of the interesting things I got up to in the last job interregnum I had.

My feet have been giving me trouble for a couple of years now. My podiatrist is reasonably happy with my progress – and I can certainly walk further than I could – I even did some running earlier in the year, until I got shin splints. However, I seem to have hyper sensitive soles, so she can’t correct my pro-nation until we fix that, which at least for now means a 5 minute session where I touch my feet, someone else does, then something smooth then something rough – called “sensory massage”.

In 2017 and 2018 I injured myself at the gym, and in 2019 I wanted to avoid that, so I sought out ways to reduce injury. Moving away from machines was a big part of that; more focus on technique another part. But perhaps the largest part was moving from lifting dead weight to focusing on body weight exercises – callisthenics. This shifts from a dead weight to control when things go wrong, to an active weight, which can help deal with whatever has happened. So far at least, this has been pretty successful – although I’ve had minor issues – I managed to inflame the fatty pad the olecranon displaces when your elbow locks out – I’m nearly entirely transitioned to a weights-free program – hand stands, pistol squats, push ups, dead hangs and so on. My upper body strength needs to come along some before we can really go places though… and we’re probably going to max out the hamstring curl machine (at least for regular two-leg curls) before my core is strong enough to do a Nordic drop.

Lynne has been worried about injuring herself with weight lifting at the gym for some time now, but recently saw my physio – Ben Cameron at Pegasus PhysioSouth – who is excellent, and he suggested that she could have less chronic back pain if she took weights back up again. She’s recently told me that I’m allowed one ‘told you so’ about this, since she found herself in a spot where previously she would have put herself in a poor lifting position, but the weight training gave her a better option and she intuitively used it, avoiding pain. So that’s a good thing – complicated because of her bodies complicated history, but an excellent trainer and physio team are making progress.

Earlier this year she had a hell of a fright, with a regular eye checkup getting referred into a ‘you are going blind; maybe tomorrow, maybe within 10 years’ nightmare scenario. Fortunately a second opinion got a specialist who probably knows the same amount but was willing to communicate it with actual words… Lynne has a condition which diabetes (type I or II) can affect, and she has a vein that can alter state somewhat arbitrarily but will probably only degrade slowly, particularly if Lynne’s diet is managed as she has been doing.

Diet wise, Lynne also has been losing some weight but this is complicated by her chronic idiopathic pancreatitis. That’s code for ‘it keeps happening and we don’t know why’ pancreatitis. We’ve consulted a specialist in the North Island who comes highly recommended by Lynne’s GP, who said that rapid weight loss is a little known but possible cause of pancreatitis – and that fits the timelines involved. So Lynne needs to lose weight to manage the onset of type II diabetes. But not to fast, to avoid pancreatitis, which will hasten the onset of type II diabetes. Aiee. Slow but steady – she’s working with the same doctor I am for that, and a similar diet, though lower on the fats as she has no gall… bladder.

In April our kitchen waste pipe started chronically blocking, and investigation with a drain robot revealed a slump in the pipe. Ground penetrating radar reveal an anomaly under the garage… and this escalated. We’re going to have to move out of the house for a week while half the house’s carpets are lifted, grout is pumped into the foundations to tighten it all back up again – and hopefully they don’t over pump it – and then it all gets replaced. Oh, and it looks like the drive will be replaced again, to fix the slumped pipe permanently. It will be lovely when done but right now we’re facing a wall of disruption and argh.

Around September I think, we managed to have a gas poisoning scare – our gas hob was left on and triggered a fireball which fortunately only scared Lynne rather than flambéing her face. We did however not know how much exposure we’d had to the LPG, nor to partially combusted gas – which produces toxic CO as a by-product, so there was a trip into the hospital for observation with Cynthia, with Lynne opting out. Lynne and Cynthia had had plenty of the basic symptoms – headaches, dizziness and so on at the the time, but after waiting for 2 hours in the ER queue that had faded. Le sigh. The hospital, bless their cotton socks don’t have the necessary equipment to diagnose CO poisoning without a pretty invasive blood test, but still took Cynthia’s vitals using methods (manual observation and a infra-red reader) that are confounded by the carboxyhemoglobin that forms from the CO that has been inhaled. Pretty unimpressed – our GP was livid. (This is one recommended protocol). Oh, and our gas hob when we got checked out – as we were not sure if we had left it on, or it had been misbehaving, turned out to have never been safe, got decertified and the pipe cut at the regulator. So we’re cooking on a portable induction hob for now.

When we moved to Rangiora I was travelling a lot more, Christchurch itself had poorer air quality than Rangiora, and our financial base was a lot smaller. Now, Rangiora’s population has gone up nearly double (13k to 19k conservatively – and that’s ignoring the surrounds that use Rangiora as a base), we have more to work with, the air situation in Christchurch has improved massively, and even a busy years travel is less than I was doing before Cynthia came along. We’re looking at moving – we’re not sure where yet; maybe more country, maybe more city.

One lovely bright spot over the last few years has been reconnecting with friends from school, largely on Facebook – some of whom I had forgotten that I knew back at school – I had a little clique but was not very aware of the wider school population in hindsight (this was more than a little embarrassing to me, as I didn’t want to blurt out “who are you?!”) – and others whom I had not :). Some of these reconnections are just light touch person-X exists and cares somewhat – and that’s cool. One in particular has grown into a deeper friendship than we had back as schoolkids, and I am happy and grateful that that has happened.

Our cats are fat and happy. Well mostly. Baggy is fat and stressed and spraying his displeasure everywhere whenever the stress gets too much :(. Cynthia calls him Mr Widdlepants. The rest of the time he cuddles and purrs and is generally happy with life. Dibbler and Kitten-of-the-wild are relatively fine with everything.

Cynthia’s violin is coming along well. She did a small performance for her classroom (with her teacher) and wowed them. I’ve been inspired to start practising trumpet again. After 27 years of decay my skills are decidedly rusty, but they are coming along. Finding arrangements for violin + trumpet is a bit challenging, and my sight-reading-with-transposition struggles to cope, but we make do. Lynne is muttering about getting a clarinet or drum-kit and joining in.

So, 2019. Whew. I hope yours was less stressful and had as many or more bright points than ours. Onwards to 2020.

,

BlueHackersBlueHackers crowd-funding free psychology services at LCA and other conferences

BlueHackers has in the past arranged for a free counsellor/psychologist at several conferences (LCA, OSDC). Given the popularity and great reception of this service, we want to make this a regular thing and try to get this service available at every conference possible – well, at least Australian open source and related events.

Right now we’re trying to arrange for the service to be available at LCA2020 at the Gold Coast, we have excellent local psychologists already, and the LCA organisers are working on some of the logistical aspects.

Meanwhile, we need to get the funds organised. Fortunately this has never been a problem with BlueHackers, people know this is important stuff. We can make a real difference.

Unfortunately BlueHackers hasn’t yet completed its transition from OSDClub project to Linux Australia subcommittee, so this fundraiser is running in my personal name. Well, you know who I (Arjen) am, so I hope you’re ok all with that.

We have a little over a week until LCA2020 starts, let’s make this happen! Thanks. You can donate via MyCause.

The post BlueHackers crowd-funding free psychology services at LCA and other conferences first appeared on BlueHackers.org.

,

Robert CollinsA Cachecash retrospective

In June 2019 I started a new role as a software engineer at a startup called Cachecash. Today is probably the last day of payroll there, and as is my usual practice, I’m going to reflect back on my time there. Less commonly, I’m going to do so in public, as we’re about to open the code (yay), and its not a mega-corporation with everything shuttered up (also yay).

Framing

This is intended to be a blameless reflection on what has transpired. Blameless doesn’t mean inaccurate; but it means placing the focus on the process and system, not on the particular actor that happened to be wearing the hat at the time a particular event happened. Sometimes the system is defined by the actors, and in that case – well, I’ll let you draw your own conclusions if you encounter that case.

A retrospective that we can’t learn from is useless. Worse than useless, because it takes time to write and time to read and that time is lost to us forever. So if a thing is a particular way, it is going to get said. Not to be mean, but because false niceness will waste everyone’s time. Mine and my ex-colleagues whose time I respect. And yours, if you are still reading this.

What was Cachecash

Cachecash was a startup – still is in a very technical sense, corporation law being what it is. But it is still a couple of code bases – and a nascent open source project (which will hopefully continue) – built to operationalise and productise this research paper that the Cachecash founders wrote.

What it isn’t anymore is a company investing significant amounts of time and money in the form of engineering in making code, to make those code bases better.

Cachecash was also a team of people. That obviously changed over time, but at the time I write this it is:

  • Ghada
  • Justin
  • Kevin
  • Marcus
  • Petar
  • Robert
  • Scott

And we’re all pretty fantastic, if you ask me :).

Technical overview

The CAPNet paper that I linked above doesn’t describe a product. What it describes is a system that permits paying caches (think squid/varnish etc) for transmitting content to clients, while also detecting attempts by such caches to claim payment when they haven’t transmitted, or attempting to collude with a client to pretend to overtransmit and get paid that way. A classic incentives-aligned scheme.

Note that there is no blockchain involved at this layer.

The blockchain was added into this core system as a way to build a federated marketplace – the idea was that the blockchain provided a suitable substrate for negotiating the purchase and sale of contracts that would be audited using the CAPNet accounting system, the payments could be micropayments back onto the blockchain, and so on – we’d avoid the regular financial system, and we wouldn’t be building a fragile central system that would prevent other companies also participating.

Miners would mine coins, publishers would buy coins then place them in escrow as a promise to pay caches to deliver content to clients, and a client would deliver proof of delivery back to the cache which would then claim payment from the publisher.

Technical Challenges

There were a few things that turned up as significant issues. In no particular order:

The protocol

The protocol itself adds additional round trips to multiple peers – in its ‘normal’ configuration the client ends up running (web- for browers) GRPC connections to 5 endpoints (with all the normal windowing concerns, but potentially over QUIC), and then gets chunks of content in batches (concurrently) from 4 of the peers, runs a small crypto brute force operation on the combined result, and then moves onto the next group of content. This should be sounding suspiciously like TCP – it is basically a window management problem, and it has exactly the same performance management problems – fast start, maximum window size, how far to reduce it when problems are suffered. But accentuated: those 4 cache peers can all suffer their own independent noise problems, or be hostile. But also, they can also suffer correlated problems: they might all be in the same datacentre, or be all run by a hostile actor, or the client might be on a hostile WiFi link, or the client’s OS/browser might be hostile. Lets just say that there is a long, rich road for optimising this new protocol to make it fast, robust, reliable. Much as we have taken many years to make HTTP into QUIC, drawing upon techniques like forward error correction rather than retries – similar techniques will need to be applied to give this protocol similar performance characteristics. And evolving the protocol while maintaining the security properties is a complicated task, with three actors involved, who may collude in various ways.

An early performance analysis I did on the go code implementation showed that the brute forcing work was a bottleneck because while the time (once optimise) per second was entirely modest for any small amount of data, the delay added per window element acts as a brake on performance for high capacity low latency links. For a 1Gbps 25ms RTT link I estimated a need for 8 cores doing crypto brute forcing on the client.

JS

Cachecash is essentially implementing a new network protocol. There are some great hooks these days in browsers, and one can hook in and provide streams to things like video players to let them get one segment of video. However, for downloading an entire file – for instance, if one is downloading a full video, it is not so easy. This bug, open for 2 years now, is the standards based way to do it. Even so non-standards based way to do it involves buffering the entire content in memory, oh and reflecting everything through a static github service worker. (You of course host such a static page yourself, but then the whole idea of this federated distributed system breaks down a little).

Our initial JS implementation was getting under 512KBps with all-local servers – part of that was the bandwidth delay product issue mentioned above. Moving to getting chunks of content from each cache concurrently using futures improved that up to 512KBps, but thats still shocking for a system we want to be able to compete with the likes of Youtube, Cloudflare and Akamai.

One of the hot spots turned out to be calculating SHA-256 values – the CAPNet algorithm calculates thousands (it’s tunable, but 8k in the set I was analysing) of independent SHA’s per chunk of received data. This is a problem – in browser SHA routines, even the recent native hosted ones – are slow per SHA. They are not slow per byte. Most folk want to make a small number of SHA calculations. Maybe thousands in total. Not tens of thousands per MB of data received….. So we wrote an implementation of the core crypto routines in Rust WASM, which took our performance locally up to 2MBps in Firefox and 6MBps in Chromium.

It is also possible we’d show up as crypto-JS at that point and be blacklisted as malware!

Blockchain

Having chosen to involve a block chain in the stack we had to deal with that complexity. We chose to take bitcoin’s good bits and run with those rather than either running a sidechain, trying to fit new transaction types into bitcoin itself, or trying to shoehorn our particular model into e.g. Ethereum. This turned out to be a fairly large amount of work : not the core chain itself – cloning the parts of bitcoin that we wanted was very quick. But then layering on the changes that we needed, to start dealing with escrows and negotiating parameters between components and so forth. And some of the operational challenges below turned up here as well even just in developer test setups (in particular endpoint discovery).

Operational Challenges

The operational model was pretty interesting. The basic idea was that eventually there would be this big distributed system, a bit-coin like set of miners etc, and we’d be one actor in that ecosystem running some subset of the components, but that until then we’d be running:

  • A centralised ledger
  • Centralised random number generation for the micropayment system
  • Centralised deployment and operations for the cache fleet
  • Software update / vetting for the publisher fleet
  • Software update / publishing for the JS library
  • Some number of seed caches
  • Demo publishers to show things worked
  • Metrics, traces, chain explorer, centralised logging

We had most of this live and running in some fashion for most of the time I was there – we evolved it and improved it a number of times as we iterated on things. Where appropriate we chose open source components like Jaeger, Prometheus and Elasticsearch. We also added policy layers on top of them to provide rate limiting and anti-spoofing facilities. We deployed stuff in AWS, with EKS, and there were glitches and things to workaround but generally only a tiny amount of time went into that part of it. I think I spent a day on actual operations a month, or thereabouts.

Other parties were then expected to bring along additional caches to expand the network, additional publishers to expand the content accessible via the network, and clients to use the network.

Ensuring a process run by a third party is network reachable by a browser over HTTPS is a surprisingly non-simple problem. We partly simplified it by mandating that they run a docker container that we supplied, but there’s still the chance that they are running behind a firewall with asymmetric ingress. And after that we still need a domain name for their endpoint. You can give every cache a CNAME in a dedicated subdomain – say using their public key as the subdomain, so that only that cache can issue requests to update their endpoint information in DNS. It is all solvable, but doing it so that the amount of customer interaction and handholding is reduced to the bare minimum is important: a user with a fleet of 1000 machines doesn’t want to talk to us 1000 times, and we don’t want to talk to them either. But this was a another bit of this-isn’t-really-distributed-is-it grit in the distributed-ointment.

Adoption Challenges

ISPs with large fleets of machines are in principle happy to sell capacity on them in return for money – yay. But we have no revenue stream at the moment, so they aren’t really incentivised to put effort in, it becomes a matter of principle, not a fiscal “this is 10x better for my business” imperative. And right now, its 10x slower than HTTP. Or more.

Content owners with large amounts of content being delivered without a CDN would like a radically cheaper CDN. Except – we’re not actually radically cheaper on a cost structure basis. Current CDN’s are expensive for their expensive 2nd and third generation products because no-one offers what they offer – seamless in-request edge computing. But that ISP that is contributing a cache to the fleet is going to want the cache paid for, and thats the same cost structure as existing CDNs – who often have a free entry tier. We might have been able to make our network cheaper eventually, but I’m just not sure about the radically cheaper bit.

Content owners who would like a CDN marketplace where the CDN caches are competing with each other – driving costs down – rather than than the CDN operators competing – would absolutely love us. But I rather suspect that those owners want more sophisticated offerings. To be clear, I wasn’t on the customer development team, and didn’t get much in the way of customer development briefings. But things like edge computing workers, where completely custom code can run in the CDN network, adjacent to ones user, are much more powerful offerings than simple static content shipping offerings, and offered by all major CDN’s. These are trusted services – the CAPNet paper doesn’t solve the problem of running edge code and providing proof that it was run. Enarx might go some, or even a long way way to running such code in an untrusted context, but providing a proof that it was run – so that running it can become a mining or mining-like operation is a whole other question. Without such an answer, an edge computing network starts to depend on trusting the caches behaviour a lot more all over again – the network has no proof of execution to depend on.

Rapid adjustment – load spikes – is another possible use case, but the use of the blockchain to negotiate escrows actually seemed to work against our ability to offer that. Akami define load spike in a time frame faster than many block chains can decide that a transaction has actually been accepted. Offchain transactions are of course a known thing in the block chain space but again that becomes additional engineering.

Our use of a new network protocol – for all that it was layered on standard web technology – made it harder for potential content owners to adopt our technology. Rather than “we have 200 local proxies that will deliver content to your users, just generate a url of the form X.Y.Z”, our solution is “we do not trust the 200 local proxies that we have, so you need to run complicated JS in your browser/phone app etc” to verify that the proxies are actually doing their job. This is better in some ways – precisely because we don’t trust those proxies, but it also increases both the runtime cost of using the service, the integration cost adopting the service, and complexity of debugging issues receiving content via the service.

What did we learn?

It is said that “A startup is an organization formed to search for a repeatable and scalable business model.” What did we uncover in our search? What can we take away going forward?

In principle we have a classic two sided market – people with excess capacity close to users want to sell it, and people with excess demand for their content want to buy delivery capacity.

The baseline market is saturated. The market as a whole is on its third or perhaps fourth (depending on how you define things) major iteration of functionality.

Content delivery purchasers are ok with trusting their suppliers : any supply chain fraud happening in this space at the moment is so small no-one is talking about it that I heard about.

Some of the things we were doing don’t seem to have been important to the customers we talked to – I don’t have a great read on this, but in particular, the blockchain aspect seems to have been more important to our long term vision than to the 2-sided market place that we perceived. It would be fascinating to me to validate that somehow – would cache capacity suppliers be willing to trust us enough to sell capacity to us with just the auditing mechanism, without the blockchain? Would content providers be happy buying credit from us rather than from a neutral exchange?

What did I learn?

I think in hindsight my startup muscles were atrophied – it had been some years since Canonical and it took a few months to start really thinking lean-startup again on a personal basis. That’s ok, because I was hired to build systems. But its not great, because I can do better. So number one: think lean-startup and really step up to help with learning and validation.

I levelled up my Go lang skills. That was really nice – Kevin has deep knowledge there, and though I’ve written Go before I didn’t have a good appreciation for style or aesthetics, or why. I do now. Where before I’d say ‘I’m happy to dive in but its not a language I feel I really know’, I am now happy to say that I know Go. More to learn – there always is – but in a good place.

I did a similar thing to my JS skills, but not to the same degree. Having climbed fairly deeply into the JS client – which is written in Typescript, converted its bundling system to webpack to work better with Rust-WASM, and so on. Its still not my go-to place, but I’m much more comfortable there now.

And of course playing with Rust-WASM was pure delight. Markus and I are both Rust afficionados, and having a genuine reason to write some Rust code for work was just delightful. Finding this bug was just a bonus :).

It was also really really nice being back in a truely individual contributor role for a while. I really enjoyed being able to just fix bugs and get on with things while I got my bearings. I’ve ended up doing a bit more leadership – refining of requirements, translating between idea-and-specification and the like recently, but still about 80% of time has been able to be sit-down-and-code, and that really is a pleasant holiday.

What am I going to change?

I’m certainly going to get a new job :). If you’re hiring, hit me up. (If you don’t have my details already, linkedin is probably best).

I’m think there the core thing I need to do is more alignment of the day to day work I’m doing with needs of customer development : I don’t want to take on or take over the customer development role – that will often be done best in person with a customer for startups, and I’m happy remote – but the more I can connect what I’m trying to achieve with what will get the customers to pay us, the more successful any business I’m working in will be. This may be a case for non-vanity metrics, or talking more with the customer-development team, or – well, I don’t know exactly what it will look like until I see the context I end up in, but I think more connection will be important.

And I think the second major thing is to find a better balance between individual contribution and leadership. I love individual contribution, it is perhaps the least stressful and most Zen place to be. But it is also the least effective unless the project has exactly one team member. My most impactful and successful roles have been leadership roles, but the pure leadership role with no individual contribution slowly killed me inside. Pure individual contribution has been like I imagine crack to be, and perhaps just as toxic in the long term.

,

Robert CollinsRust and distributions

Daniel wrote a lovely blog post about Rust’s ability to be included in distributions, both as a language that you can get via the distribution, and as the language that components of the distribution are being written in.

I think this is a great goal to raise and I have just a few thoughts and quibbles. First I want to acknowledge and agree with him on the Rust community, its so very nice, and he is doing a great thing as rustup lead; I wish I had more time to put in, I have more things I want to contribute to rustup. I’ll try to get back to the meetings soon.

On trust

I completely agree about the need for the crates index improvement : without those we cannot have a mirror network, and thats a significant issue for offline users and slow-region users.

On curlsh though

It isn’t the worst possible thing, for all that its “untrusted bootstrapping”, the actual thing downloaded is https secured etc, and so is the rustup binary itself. Put another way, I think the horror is more perceptual than analyzed risk. Someone that trusts Verisign etc enough to download the Debian installer enough over it, has exactly the same risk as someone trusting Verisign enough to download rustup at that point in time.

Cross signing curlsh that with per-distro keys or something seems pretty ridiculous to me, since the root of trust is still that first download; unless you’re wandering up to someone who has bootstrapped their compiler by hand (to avoid reflections-on-trust attacks), to get an installer, to build a system, to then do reproducible builds, to check that other systems are actually safe… aieeee.

I think its easier to package the curl|sh shell script in Debian itself perhaps? apt install get-rustup; then if / when rustup becomes packaged the user instructions don’t change but the root of trust would, as get-rustup would be updated to not download rustup, but to trigger a different package install, and so forth.

I don’t think its desirable though, to have distribution forks of the contents that rustup manages – Debian+Redhat+Suse+… builds of nightly rust with all the things failing or not, and so on – I don’t see who that would help. And if we don’t have that then the root of trust would still not be shifted under the GPG keychain – it would still be the HTTPS infrastructure for downloading rust toolchains + the integrity of the rustup toolchain builds themselves. Making rustup, which currently shares that trust, have a different trust root, seems pointless.

On duplication of dependencies

I think Debian needs to become more inclusive here, not Rustup. Debian has spent; pauses, counts, yes, DECADES, rejecting multiple entire ecosystems because of a prejuidiced view about what the Right Way to manage dependencies is. And they are not right in a universal sense. They were right in an engineering sense: given constraints (builds are expensive, bandwidth is expensive, disk is expensive), they are right. But those are not universal constraints, and seeking to impose those constraints on Java and Node – its been an unmitigated disaster. It hasn’t made those upstreams better, or more secure, or systematically fixed problems for users. I have another post on this so rather than repeating I’m going to stop here :).

I think Rust has – like those languages – made the crucial, maintainer and engineering efficiency important choice to embrace enabling incremental change across libraries, with the consequence that dependencies don’t shift atomically, and sure, this is basically incompatible with Debian packaging world view which says that point and patch releases of libraries are not distinct packages, and thus the shared libs for these things all coexist in the same file on disk. Boom! Crash!

I assert that it is entirely possible to come up with a reasonable design for managing a respository of software that doesn’t make this conflation, would allow actual point and patch releases of exist as they are for the languages that have this characteristic, and be amenable to automation, auditing and reporting for security issues. E.g. Modernise Debian to cope with this fundamentally different language design decision… which would make Java and Node and Rust work so very much better.

Alternatively, if Debian doesn’t want to make it possible to natively support languages that have made this choice, Debian could:

  • ship static-but-for-system-libs builds
  • not include things written in rust
  • ask things written in rust to converge their dependencies again and again and again (and only update them when the transitive dependencies across the entire distro have converged)

I have a horrible suspicion about which Debian will choose to do :(. The blinkers / echo chamber are so very strong in that community.

For Windows

We got to parity with Linux for IO for non-McAfee users, but I guess there are a lot of them out there; we probably need to keep pushing on tweaking it until it work better for them too; perhaps autodetect McAfee and switch to minimal? I agree that making Windows users – like I am these days – feel tier one, would be nice :). Maybe a survey of user experience would be a good starting point.

Shared libraries

Perhaps generating versioned symbols automatically and building many versions of the crate and then munging them together? But I’d also like to point here again that the whole focus on shared libraries is a bit of a distribution blind spot, and looking at the vast amount of distribution of software occuring in app stores and their model, suggests different ways of dealing with these things. See also the fairly specific suggestion I make about the packaging system in Debian that is the root of the problem in my entirely humble view.

Bonus

John Goerzen posted an entirely different thing recently, but in it he discusses programs that don’t properly honour terminfo. Sadly I happen to know that large chunks of the Rust ecosystem assume that everything is ANSI these days, and it certainly sounds like, at least for John, that isn’t true. So thats another way in which Rust could be more inclusive – use these things that have been built, rather than being modern and new age and reinventing the 95% match.

,

Simon HormsUpstream Linux Kernel Development

It's All About the Long Term

by Simon Horman

Introduction

My thoughts on the importance of upstream Linux kernel development. An exploration of the motivation for organisations to invest in development for the upstream Linux kernel.

  • Why develop for Linux and why for upstream?
  • Upstreaming models: Upstream first or upstream last
  • Differentiating products and technology
  • Standardisation through collaboration

Why Linux?

Before discussing the merits of working on upstreaming code to the upstream Linux kernel it is worth examining the motivation for developing for Linux at all. There are surely many reasons, lets dig into a few.

Stand on the shoulders of giants

Modern computers are complex high-performance machines. The development effort required to fully utilise a system grows with complexity of its hardware. I would argue that we are long past the point where it is impractical to create general purpose kernels to utilise modern hardware. Rather, a far more practical approach is to build on the work of others. And Linux is a premier platform for such work.

Linux is pervasive

Linux has become pervasive in various parts of industry including cloud computing and enterprise ITC. In order to access such markets solutions are often required to work on Linux. This includes facilities, such as hardware support, provided by the kernel. To meet such customer needs solutions need to be developed for Linux.

Attract developer talent

By working on the Linux kernel, which is an Open Source project, developers are able develop portable skills. Each contribution is peer reviewed and accepted on its merit. And each contribution acts as part of an evolving online CV for the developer. This is clearly and attractive mechanism for developers to display their skills and thus Linux kernel development attracts talent.

Why Upstream?

When working on the Linux kernel the resulting code may be kept out-of-tree or submitted for inclusion in upstream. An attraction of working out-of-tree is that it has a somewhat lower barrier to entry. The process of upstreaming can be time consuming and approaches to problems that work out-of-tree may not be accepted into upstream for a variety of reasons. So there can be a strong temptation to keep code out-of-tree. However, I argue against this approach.

Working with the upstream Linux kernel is, in my opinion, all about planning for the long term. It involves up-front work to enhance the kernel in conjunction with the upstream development community. And doing so is a step towards long term maintainability of code and a sustainable development model.

Technical Debt

If code is worked on out-of-tree for a sustained period of time then the volume and complexity of the code is bound to grow. And each time a part of the upstream kernel on which this code is dependent on changes the out-of-tree code needs to either be refactored or risk obsolescence. The more time passes the more code there is to refactor, as there is more code the chance of a dependency changing increases, and as the distance to the original kernel version increases the complexity of such refactoring is likely to also increase. In short, technical debt mounts over time requiring ever increasing effort to maintain.

Focus

Another benefit of working with upstream is that in large organisations it can provide focus. Rather than different teams developing different solutions to similar problems for each customer for each hardware generation a single solution can be developed upstream. As customer needs change over time they can be addressed via incremental changes upstream rather than perpetuating an explosion of out-of-tree code. And if the focus is on upstream then discussion around which in-house solution to reuse becomes moot.

Avoiding Fragmentation

Working with upstream also helps to reduce fragmentation in solutions. Users want a consistent experience when configuring hardware from different vendors, running kernels from different distributions and so on. Adding special sauce at this layer only diminishes the user experience in the long term.

Upstreaming Models

When working on the Debian kernel team, many years ago now, I instigated a policy of only accepting changes from the upstream kernel. Prior to this policy Debian had often seen itself as acting as a testing ground for upstream-bound, or in some cases not upstream-bound, features. A key problem with this approach was that once a feature became available in the Debian kernel it was bound to be used. And once it became used it had to be maintained. And if it wasn't upstream then that maintenance would typically fall to the Debian Kernel team whose bandwidth was already consumed packaging the upstream kernel.

Upstream first is the idea that code is developed for the upstream kernel. That it is in the upstream kernel that it is made available. And that consumers of the code do so by consuming the upstream kernel. It seeks to place the focus of kernel development where I believe it belongs, in upstream.

The converse of upstream fist is upstream last. In this model code is developed out of tree. And when the time is right it is contributed to upstream. A key attraction of this approach is that it can, in the short term at least, lead to higher velocity of feature development. It may also provide a way to develop ideas that do not seem appropriate for upstream yet. However, it leads to a number of problems.

For one, code that is not developed for upstream is, in my experience, often not suitable for inclusion in upstream. So in this model a typical upstreaming effort involves refactoring or more often than not rewriting the code with the out-of-tree version acting as a reference implementation. Clearly duplicated effort. And there are the problems outlined previously with maintaining code-out-of tree. So while upstream-last can be useful it does come at some cost.

Differentiating Products and Technology

An important distinction that can bee made working on the Linux kernel is that between product and technology. On the one hand technology can be raw, incomplete and often only of use as part of a larger whole. On the other hand products are polished, ideally complete, systems that can readily by utilised by users.

When developing kernel code innovation is occurring, technology is being developed. By participating in upstream development this distinction becomes clearer. The collaboration, often between competitors, in upstream kernel development leads to the technology that can be sustained for the long term. Meeting shorter term customer needs by delivering products becomes a distinctly different activity.

Standardisation Through Collaboration

The process of standardisation takes many guises. One obvious form is through standards bodies. In this model the standards body formulates a standard and possibly a reference implementation and then it is up to adopters to implement the standard. The Linux kernel implements many such standards but it also serves as a mechanism for a very different form of standardisation: the standard emerges from the implementation. Given the pervasive nature of Linux its implementation of a feature can be come a standard. Thus by collaborating on upstream kernel development one effectively participates in a standardisation process.

Conclusion

This discussion has covered some motivations for developing for the upstream kernel and the upstream first and last models for upstream development. If I could stress one point is is that upstream development is all about building a string base for long term maintainability. A solid foundation on which to build products that address customer need.

Credits

The ideas presented above reflect those developed in collaboration with other team members and to a greater or lesser extent put into practice while working with those teams in the past and present. These teams include:

  • Netronome
  • Hisao Munakata, Magnus Damm, Paul Mundt and others at Renesas Electronics
  • Debian Kernel Team