Planet Linux Australia


David Rowe: Subset Vector Quantiser

I’ve returned to speech coding after a year of playing with VHF and HF modems. This is somewhat daunting for me, as speech coding is R&D, which tends to be very open ended; it’s possible to work for months with no clear outcomes. In contrast, the modem work is straightforward engineering, and I get the positive feedback of “having stuff work” on a regular basis.

So I’m trying to time box the speech coding projects to a few days work each. This is quite a personal challenge, as there are just so many variables and paths to follow. It’s so easy to go off on a tangent and watch the months pass!

In this particular project I’m looking at the Codec 2 700C Vector Quantiser (VQ), and exploring ways to make it more inherently robust to high pass and low pass filtering at the edges of the spectrum. The broad goal is to improve speech quality at a given bit rate, or support a lower bit rate at the same speech quality. I’m targeting bit rates in the 600 bit/s range, and the lower end of the quality range (communications quality speech).

Subset Vector Quantiser

It’s well known that the most important speech information is between 300 and 3000 Hz. Energy outside that range makes the speech sound nicer, but doesn’t help intelligibility much. Analog modes such as SSB exploit this by band limiting the speech so that the transmitter power is used just to punch through information in the narrow bandwidth that matters.

With digital speech the RF bandwidth is not directly linked to the bandwidth of the decoded speech. For example FreeDV 700D uses around 1100 Hz of RF bandwidth, but the decoded speech has energy covering most of the 0 to 4000 Hz range.

Codec 2 encodes and transmits the speech spectrum on a regular basis. As well as encoding parts of the spectrum necessary for intelligibility, it also has to encode other features, such as the high pass and low pass roll-off of the microphone and analog filters in the sound card. These features don’t carry any intelligibility, but bits are consumed to encode them. When you have just 28 bits/frame – every bit matters!

It turns out that some of the wilder variations in the speech spectrum from different sources of speech are in the 0-300 Hz and 3000-4000 Hz regions, for example different high pass or low pass filtering for a particular microphone or sound card. This can upset the Codec 2 quantiser, e.g. it might expend bits modelling a particular low pass filter response – lowering the quality of the perceptually important 300-3000 Hz information.

So I’ve prototyped a Vector Quantiser (VQ) that just uses the information in the 300-3000 Hz range to quantise the speech spectrum. However the VQ is trained on the full range, so it uses that full range to synthesise the speech, hopefully recovering some of the extended spectrum. There is also a limiter before the VQ to reduce the dynamic range of the frames.
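
To make the idea concrete, here is a rough sketch of a subset search in Python/numpy. It is not the actual Codec 2 code (the subset bin range, the simple limiter, and the random stand-in codebook are all placeholders), but it shows the key point: the error is measured only over the bins covering roughly 300-3000 Hz, while the winning codebook entry is full band.

# Illustrative sketch of a subset VQ search (not the Codec 2 implementation).
# The subset bin range and the limiter ceiling are assumptions for this example.
import numpy as np

K = 20                        # mel-spaced magnitude samples per frame
SUBSET = slice(2, 16)         # hypothetical bins covering roughly 300-3000 Hz
LIMIT_DB = 30.0               # hypothetical limiter ceiling above the frame mean

def limit(frame_db, limit_db=LIMIT_DB):
    """Crude limiter: cap samples that poke too far above the frame mean."""
    return np.minimum(frame_db, frame_db.mean() + limit_db)

def subset_vq(frame_db, codebook):
    """Search using only the subset bins, synthesise with the full vector."""
    frame_db = limit(frame_db)
    err = np.sum((codebook[:, SUBSET] - frame_db[SUBSET]) ** 2, axis=1)
    index = int(np.argmin(err))
    return index, codebook[index]       # full-band entry recovers the band edges

# 12 bits/frame -> a 4096-entry codebook (random stand-in for a trained one)
codebook = np.random.randn(2 ** 12, K)
index, synth_vector = subset_vq(np.random.randn(K), codebook)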

Here are some samples processed with the newamp1 two stage VQ, and the new subset VQ. Like the newamp1 algorithm, it works on vectors of K=20 mel-spaced magnitude samples. The speech codec is only partially quantised (10ms frames, original phase, unquantised pitch), so it would sound worse in a real world fully quantised codec. The “newamp1” algorithm is used for Codec 2 700C [1] (and employed in FreeDV 700C/D/E), and uses 22 bits/frame (550 bits/s at a 40ms frame rate). The subset VQ uses just 12 bits/frame (300 bits/s at a 40ms frame rate).

Filename newamp1 subset
big dog Listen Listen
cap Listen Listen
fish Listen Listen
hts2a Listen Listen

There are some samples where newamp1 sounds louder – this could be due to the gain limiter stage in the subset algorithm constraining the dynamic range. There also seems to be more high frequency response with newamp1, indicating subset is not recovering the high frequency speech energy as I had hoped. Both are quite intelligible, and acceptable for communications quality speech.

This table presents the mean square spectral distortion in dB*dB:

Filename newamp1 subset
big dog 6.05 8.56
cap 9.03 8.10
fish 10.81 7.31
hts2a 10.51 8.62

Both the samples and objective results show the subset VQ is holding up OK next to the reference newamp1, despite the low bit rate. I’ve found that around 9 dB*dB spectral distortion gives acceptable results for my (low communications quality) use case.
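
For reference, my reading of the dB*dB figure is the mean squared error between the target and quantised log magnitude vectors. Here is a minimal sketch of that computation (it may differ in detail from the script actually used to produce the table above):

import numpy as np

def mean_sq_spectral_distortion(target_db, quantised_db):
    """Mean squared difference between target and quantised spectra, in dB*dB.
    Both inputs are (num_frames, K) arrays of magnitude samples in dB."""
    return float(np.mean((np.asarray(target_db) - np.asarray(quantised_db)) ** 2))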

I tried adding an artificial low pass filter above 3400 Hz to a couple of the input samples, to simulate what might happen from different microphones.

It seems to work OK with the low pass filtered input samples; they sound pretty similar to the samples using the original source (one of the goals of this work):

Filename subset subset low pass
cap Listen Listen
fish Listen Listen

Conclusions and Further work

I’m surprised that a single stage VQ is working quite well at just 12 bits/frame. It’s also encoding both the average frame energy and the spectral shape (I use a separate scalar quantiser for frame energy in newamp1). This is quite optimal from a VQ theory point of view, but sometimes not possible due to practical concerns such as the storage/CPU requirements for a single stage VQ.
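
A quick back-of-the-envelope shows why: storage and search cost grow exponentially with the number of bits in a single stage, but only linearly with the number of stages (illustrative Python, assuming K=20):

K = 20                           # samples per vector
single_12 = 2 ** 12              # 4096 codewords: easy to store and search
two_stage_24 = 2 * 2 ** 12       # 8192 codewords for 12+12 bits over two stages
single_24 = 2 ** 24              # ~16.8 million codewords for 24 bits in one stage
for name, entries in [("single stage, 12 bit", single_12),
                      ("two stage, 12+12 bit", two_stage_24),
                      ("single stage, 24 bit", single_24)]:
    print(f"{name}: {entries} codewords, {entries * K} floats, {entries} distances/frame")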

Further work ideas:

  1. Try a few more samples, and push this work through to a fully quantised speech codec.
  2. Seeing it’s doing well with a single stage, it would be interesting to see if it sounds better with multiple stages.
  3. A single stage VQ enables other tricks, like non-FEC techniques to make the VQ robust to bit errors, such as sorting the VQ indexes [2] or Neural Net style training with bit errors.
  4. Try different companding curves instead of the hard limiter, to better represent louder signals.

Links

[1] Codec 2 700C
[2] Codec 2 at 450 bits/s. Another single VQ Codec 2 mode, that can generate high frequency information using a similar approach to this post.
[3] Codec 2 700C Equaliser Part 2. A previous approach to handle a similar problem, in this case the speech is equalised before hitting a full band VQ.
[4] The script train_sub_quant.sh in this GitHub PR is used to perform the experiments documented in this post.

Pia Andrews: Renewal of the public service is everyone’s job

This article originally appeared in the Public Sector journal July 2021. It is reproduced here with the permission of IPANZ (Institute of Public Administration New Zealand).

Pia Andrews sees wonderful opportunities in the new Public Service Act.

As someone who is passionate about public sector renewal and transformation, I was fascinated to read the new Public Service Act. I believe it provides a powerful lever for systemic change, but a lever is only effective when you use it.

I’ve heard different opinions about the Act across Aotearoa. Some see it as the start of getting the sector back to being a vehicle for public good. Some view it with scepticism. To my mind, both of these perspectives are valid, because the Act provides both a light on the hill, and a stark counter-factual to the daily experience of many. 

At the end of the day, the Act is what you make of it, what we all make of it. If you are one of the many people “waiting for it to be implemented”, then I suggest you are, at best, missing an opportunity. I urge you to do everything in your power to take this moment and make it count. The lever is there, let’s use it. 

This article examines key aspects of the Act, then provides some practical examples of how it could be applied. 

Unpacking the new Public Service Act (2020)

The Purpose of the Act is bold because it recognises the need for change and modernisation, for establishing a shared purpose, and for better ways of working that are more collaborative, transparent, and outcomes focused. Most importantly, the Act affirms that the fundamental characteristic of the public service is to “act with a spirit of service to the community”.

But we need to look back in time to understand why such a legislative intervention was needed.

From the mid-1980s, there was a global shift in public sectors away from service to the public and towards managerialism and artificial “business” imperatives. This pseudo-business approach, combined with subsequent generations of leadership who increasingly see themselves more as executives than stewards of public good, has resulted in some institutions losing their core purpose, losing their stewardship culture, and losing touch with the public they serve. 

Short-termism has come to be rewarded at the cost of long-term planning and the delivery of public value. The Act’s purpose supports a balanced public service approach that actively holds the role of caretaker of the long-term public interest, while supporting democratic, constitutional and successive governments, and actively engaging citizens. It is with this balance that resources, no matter how stretched, could be purposefully and proportionally distributed to both urgent short-term priorities and strategic long-term ones.

Principles and values

The principles and values reinforce a way of working that is politically neutral, free and frank, merit based, and focused on long-term planning, with a commitment to open government and stewardship. The values reinforce a way of behaving that is impartial, accountable, trustworthy, respectful, and responsive to the people of New Zealand. These provide a powerful call to action for all public servants and a systemic framework for a culture that rewards high integrity and brave pursuit of values-based public good for Aotearoa.

Crown’s relationship with Māori

Public service leaders are “responsible for developing and maintaining the capability of the public service to engage with Māori and to understand Māori perspectives.” This requires us all to seek to understand different ways of thinking, living and seeing the world, including our own worldview. It presents an opportunity to shift towards an open-minded and respectful partnership framework with ngā Iwi and Māori. You can start small by simply taking the time to understand the whakapapa, stories, values and whanaungatanga of the people you work with and serve. A greater engagement of and understanding of Mātauranga Māori could help us all live up to the intent, sovereignty, and community empowerment outlined in Te Tiriti o Waitangi.

Integrity and conduct

The Act states “… public service employees have all the rights and freedoms affirmed in the New Zealand Bill of Rights Act 1990”. To have public servants acknowledged as actual people with rights and responsibilities like any other citizen might seem an unnecessary truism, but this simple acknowledgment promotes a public service that trusts in and empowers all public servants to be an active part of civil society. It also improves the confidence of public servants to engage effectively and openly with others (citizens, ngā Iwi, other sectors, experts, community forums). By contrast, there are many jurisdictions around the world that inhibit, intimidate, or simply prohibit public servants from having any role in civil society.

Joint operational agreements

The Act supports the establishment of “joint operational agreements”, which always need to be hosted by an existing department. The challenge is that new initiatives may be constrained by their host departments, so when establishing these new agreements, carefully consider how to structure and empower them to do something differently or better; otherwise they will simply become carbon copies of their hosts.

Applying the Act to two problem areas

Problem #1: Maintaining trust in the public service 

Trust is hard won and easily lost. Public institutions globally are struggling to shift from simply seeking trust, to being more trustworthy. Trust in the public service is certainly impacted by real or perceived issues with services, policies or decision making, and reduced confidence can lead to people not trusting, engaging with, or respecting the policies or democratic outcomes administered by the public sector. 

Ways to apply the Act: 

  • Engage the public to inclusively co-design “trustworthy” policies, programmes, services and infrastructure that reflect societal values and needs. 
  • Explore what would be needed to support transparent, appealable, and auditable decisions and services that are traceable back to law, such as publicly available and testable legislation as code. 
  • Be operationally transparent – publish your operating procedures, governance, oversight mechanisms, reports, etc. Make it easy for the public to find, learn about, keep up to date with, and contribute to your programme, and publicly track your progress and impact.
  • Ensure that reports to the Public Service Commissioner on progress towards goals are publicly available.
  • Embed, measure, and monitor the various accountabilities and performance requirements outlined in the Act.
  • Take a human and whānau-centred approach, not just a user-centric approach. 
  • Apply the Wellbeing Framework to policy proposals, funding proposals, measurement and performance frameworks, and independent baselining.

Problem #2: Focusing on long-term planning and policy futures  

The public service has become largely reactive. While there are exceptions – like the Department of Conservation 50-year goals and planning – most agencies are largely driven by the latest urgency, budget or electoral cycle. But how can you take the right next step if you don’t know where you are going? 

Ways to apply the Act:

  • Dedicate a percentage (I suggest 15%) of your programme resource to community engagement efforts, long-term planning, and staff innovation.
  • Establish a joint “policy futures” operational agreement between all policy units to co-resource policy proposals and optimistic futures for Aotearoa that draw on research, emerging trends, public values and changing needs.
  • Assume and monitor for continuous change in everything you do. Be operationally proactive.
  • Consider how your department could better support community initiatives. Provide public infrastructure (including digital) for others to build on.

Conclusion and what next?

The Act reminds public servants of our responsibilities to communities, but it also recognises us as independent  individuals with a democratic right to be engaged in civil society. It promotes a more adaptive, confident and collaborative public service and includes stronger recognition of the role of the public service to support partnership between Māori and the Crown. 

At all levels, individuals, teams, divisions and departments should take time to review and apply the Act to all programs, structures, services, policies and ways of working, to nudge the entire machinery of government towards a greater “spirit of service to the community”. Here are some more things you could try today:

  • Be proud and have a voice! Join user groups, meet-ups, blogs, and communities of practice, and explore what “good” could look like for Aotearoa. Actively resist cynicism or complacency. 
  • Close the policy-implementation divide through multi-disciplinary design and delivery of policies and services.
  • Raise thoughtful opportunities to apply the Act with your team leads, managers, executives, and leaders and at conferences and hui.
  • Question the status quo and take time to learn about the history of the public sector. 
  • Explore what it means to be participatory, trustworthy, and equitable in the 21st century.
  • Actively watch for and resist the use of technology, structure, hierarchy, or policy to dehumanise or disempower the people and communities you serve.

This path will not be easy, and some people will be systemically or personally motivated to maintain the status quo. But there are many more public servants who want to create change for the better, so use forums like GOVIS and IPANZ to support each other. Being the kindest and calmest person in the room will often help, but sometimes this will also mean not staying in an environment that doesn’t support the sort of public service you believe in. 

For all those who work so hard every day just to keep the lights on, may I suggest it is perhaps time to stop allowing yourself to be someone else’s crutch? Only then can we start to collectively repair and renew where things are broken.

I’ll finish by saying the people of this wonderful country are relying on you. So please be brave, be bold, and together we have a chance of creating a more participatory, trustworthy, and humane public service, powered by the spirit of service that lies inside every public servant. 

With thanks to Simon Minto, Colin Benjamin, Ben Briggs, Kim Murphy-Stewart, Michelle Edgerley, Victoria Wray, Chris Cormack (Kaihuawaere Matihiko, Catalyst IT), Karen McNamara, and Thomas Andrews, for peer reviewing.


Francois Marier: Self-hosting an Ikiwiki blog

8.5 years ago, I moved my blog to Ikiwiki and Branchable. It's now time for me to take the next step and host my blog on my own server. This is how I migrated from Branchable to my own Apache server.

Installing Ikiwiki dependencies

Here are all of the extra Debian packages I had to install on my server:

apt install ikiwiki ikiwiki-hosting-common gcc libauthen-passphrase-perl libcgi-formbuilder-perl libcrypt-ssleay-perl libjson-xs-perl librpc-xml-perl python-docutils libxml-feed-perl libsearch-xapian-perl libmailtools-perl highlight-common libsearch-xapian-perl xapian-omega
apt install --no-install-recommends ikiwiki-hosting-web libgravatar-url-perl libmail-sendmail-perl libcgi-session-perl
apt purge libnet-openid-consumer-perl

Then I enabled the CGI module in Apache:

a2enmod cgi

and un-commented the following in /etc/apache2/mods-available/mime.conf:

AddHandler cgi-script .cgi

Creating a separate user account

Since Ikiwiki needs to regenerate my blog whenever a new article is pushed to the git repo or a comment is accepted, I created a restricted user account for it:

adduser blog
adduser blog sshuser
chsh -s /usr/bin/git-shell blog

git setup

Thanks to Branchable storing blogs in git repositories, I was able to import my blog using a simple git clone in /home/blog (the srcdir):

git clone --bare git://feedingthecloud.branchable.com/ source.git

Note that the name of the directory (source.git) is important for the ikiwikihosting plugin to work.

Then I pulled the .setup file out of the setup branch in that repo and put it in /home/blog/.ikiwiki/FeedingTheCloud.setup. After that, I deleted the setup branch and the origin remote from that clone:

git branch -d setup
git remote rm origin

Following the recommended git configuration, I created a working directory (the repository) for the blog user to modify the blog as needed:

cd /home/blog/
git clone /home/blog/source.git FeedingTheCloud

I added my own ssh public key to /home/blog/.ssh/authorized_keys so that I could push to the srcdir from my laptop.

Finally, I generated a new ssh key without a passphrase:

ssh-keygen -t ed25519

and added it as a deploy key to the GitHub repo which acts as a read-only mirror of my blog.

Ikiwiki config

While I started with the Branchable setup file, I changed the following things in it:

adminemail: webmaster@fmarier.org
srcdir: /home/blog/FeedingTheCloud
destdir: /var/www/blog
url: https://feeding.cloud.geek.nz
cgiurl: https://feeding.cloud.geek.nz/blog.cgi
cgi_wrapper: /var/www/blog/blog.cgi
cgi_wrappermode: 675
add_plugins:
- goodstuff
- lockedit
- comments
- blogspam
- sidebar
- attachment
- favicon
- format
- highlight
- search
- theme
- moderatedcomments
- flattr
- calendar
- headinganchors
- notifyemail
- anonok
- autoindex
- date
- relativedate
- htmlbalance
- pagestats
- sortnaturally
- ikiwikihosting
- gitpush
- emailauth
disable_plugins:
- brokenlinks
- fortune
- more
- openid
- orphans
- passwordauth
- progress
- recentchanges
- repolist
- toggle
- txt
sslcookie: 1
cookiejar:
  file: /home/blog/.ikiwiki/cookies
useragent: ikiwiki
git_wrapper: /home/blog/source.git/hooks/post-update
urlalias:
- http://feeds.cloud.geek.nz/
- http://www.feeding.cloud.geek.nz/
owner: francois@fmarier.org
hostname: feeding.cloud.geek.nz
emailauth_sender: login@fmarier.org
allowed_attachments: admin()

Then I created the destdir:

mkdir /var/www/blog
chown blog:blog /var/www/blog

and generated the initial copy of the blog as the blog user:

ikiwiki --setup .ikiwiki/FeedingTheCloud.setup --wrappers --rebuild

One thing that failed to generate properly was the tag cloud (from the pagestats plugin). I have not been able to figure out why it fails to generate any output when run this way, but if I push to the repo and let the git hook handle the rebuilding of the wiki, the tag cloud is generated correctly. Consequently, fixing this is not high on my list of priorities, but if you happen to know what the problem is, please reach out.

Apache config

Here's the Apache config I put in /etc/apache2/sites-available/blog.conf:

<VirtualHost *:443>
    ServerName feeding.cloud.geek.nz

    SSLEngine On
    SSLCertificateFile /etc/letsencrypt/live/feeding.cloud.geek.nz/fullchain.pem
    SSLCertificateKeyFile /etc/letsencrypt/live/feeding.cloud.geek.nz/privkey.pem

    Header set Strict-Transport-Security: "max-age=63072000; includeSubDomains; preload"

    Include /etc/fmarier-org/blog-common
</VirtualHost>

<VirtualHost *:443>
    ServerName www.feeding.cloud.geek.nz
    ServerAlias feeds.cloud.geek.nz

    SSLEngine On
    SSLCertificateFile /etc/letsencrypt/live/feeding.cloud.geek.nz/fullchain.pem
    SSLCertificateKeyFile /etc/letsencrypt/live/feeding.cloud.geek.nz/privkey.pem

    Redirect permanent / https://feeding.cloud.geek.nz/
</VirtualHost>

<VirtualHost *:80>
    ServerName feeding.cloud.geek.nz
    ServerAlias www.feeding.cloud.geek.nz
    ServerAlias feeds.cloud.geek.nz

    Redirect permanent / https://feeding.cloud.geek.nz/
</VirtualHost>

and the common config I put in /etc/fmarier-org/blog-common:

ServerAdmin webmaster@fmarier.org

DocumentRoot /var/www/blog

LogLevel core:info
CustomLog ${APACHE_LOG_DIR}/blog-access.log combined
ErrorLog ${APACHE_LOG_DIR}/blog-error.log

AddType application/rss+xml .rss

<Location /blog.cgi>
        Options +ExecCGI
</Location>

before enabling all of this using:

a2ensite blog
apache2ctl configtest
systemctl restart apache2.service

The feeds.cloud.geek.nz domain used to point to Feedburner, so I need to maintain it in order to avoid breaking RSS feeds for folks who added my blog to their reader a long time ago.

Server-side improvements

Since I'm now in control of the server configuration, I was able to make several improvements to how my blog is served.

First of all, I enabled the HTTP/2 and Brotli modules:

a2enmod http2
a2enmod brotli

and enabled Brotli compression by putting the following in /etc/apache2/conf-available/compression.conf:

<IfModule mod_brotli.c>
  <IfDefine !TRANSFER_COMPRESSION>
    Define TRANSFER_COMPRESSION BROTLI_COMPRESS
  </IfDefine>
</IfModule>
<IfModule mod_deflate.c>
  <IfDefine !TRANSFER_COMPRESSION>
    Define TRANSFER_COMPRESSION DEFLATE
  </IfDefine>
</IfModule>
<IfDefine TRANSFER_COMPRESSION>
  <IfModule mod_filter.c>
    AddOutputFilterByType ${TRANSFER_COMPRESSION} text/html text/plain text/xml text/css text/javascript
    AddOutputFilterByType ${TRANSFER_COMPRESSION} application/x-javascript application/javascript application/ecmascript
    AddOutputFilterByType ${TRANSFER_COMPRESSION} application/rss+xml
    AddOutputFilterByType ${TRANSFER_COMPRESSION} application/xml
  </IfModule>
</IfDefine>

and replacing /etc/apache2/mods-available/deflate.conf with the following:

# Moved to /etc/apache2/conf-available/compression.conf as per https://bugs.debian.org/972632

before enabling this new config:

a2enconf compression

Next, I made my blog available as a Tor onion service by putting the following in /etc/apache2/sites-available/blog.conf:

<VirtualHost *:443>
    ServerName feeding.cloud.geek.nz
    ServerAlias xfdug5vmfi6oh42fp6ahhrqdjcf7ysqat6fkp5dhvde4d7vlkqixrsad.onion

    Header set Onion-Location "http://xfdug5vmfi6oh42fp6ahhrqdjcf7ysqat6fkp5dhvde4d7vlkqixrsad.onion%{REQUEST_URI}s"
    Header set alt-svc 'h2="xfdug5vmfi6oh42fp6ahhrqdjcf7ysqat6fkp5dhvde4d7vlkqixrsad.onion:443"; ma=315360000; persist=1'
    ... 

<VirtualHost *:80>
    ServerName xfdug5vmfi6oh42fp6ahhrqdjcf7ysqat6fkp5dhvde4d7vlkqixrsad.onion
    Include /etc/fmarier-org/blog-common
</VirtualHost>

Then I followed the Mozilla Observatory recommendations and enabled the following security headers:

Header set Content-Security-Policy: "default-src 'none'; report-uri https://fmarier.report-uri.com/r/d/csp/enforce ; style-src 'self' 'unsafe-inline' ; img-src 'self' https://seccdn.libravatar.org/ ; script-src https://feeding.cloud.geek.nz/ikiwiki/ https://xfdug5vmfi6oh42fp6ahhrqdjcf7ysqat6fkp5dhvde4d7vlkqixrsad.onion/ikiwiki/ http://xfdug5vmfi6oh42fp6ahhrqdjcf7ysqat6fkp5dhvde4d7vlkqixrsad.onion/ikiwiki/ 'unsafe-inline' 'sha256-pA8FbKo4pYLWPDH2YMPqcPMBzbjH/RYj0HlNAHYoYT0=' 'sha256-Kn5E/7OLXYSq+EKMhEBGJMyU6bREA9E8Av9FjqbpGKk=' 'sha256-/BTNlczeBxXOoPvhwvE1ftmxwg9z+WIBJtpk3qe7Pqo=' ; base-uri 'self'; form-action 'self' ; frame-ancestors 'self'"
Header set X-Frame-Options: "SAMEORIGIN"
Header set Referrer-Policy: "same-origin"
Header set X-Content-Type-Options: "nosniff"

Note that the Mozilla Observatory is mistakenly identifying HTTP onion services as insecure, so you can ignore that failure.

I also used the Mozilla TLS config generator to improve the TLS config for my server.

Then I added security.txt and gpc.json to the root of my git repo and added the following aliases to put these files in the right place:

Alias /.well-known/gpc.json /var/www/blog/gpc.json
Alias /.well-known/security.txt /var/www/blog/security.txt

I also followed these instructions to create a sitemap for my blog with the following alias:

Alias /sitemap.xml /var/www/blog/sitemap/index.rss

Finally, I simplified a few error pages to save bandwidth:

ErrorDocument 301 " "
ErrorDocument 302 " "
ErrorDocument 404 "Not Found"

Monitoring 404s

Another advantage of running my own web server is that I can monitor the 404s easily using logcheck by putting the following in /etc/logcheck/logcheck.logfiles:

/var/log/apache2/blog-error.log 

Based on that, I added a few redirects to point bots and users to the location of my RSS feed:

Redirect permanent /atom /index.atom
Redirect permanent /comments.rss /comments/index.rss
Redirect permanent /comments.atom /comments/index.atom
Redirect permanent /FeedingTheCloud /index.rss
Redirect permanent /feed /index.rss
Redirect permanent /feed/ /index.rss
Redirect permanent /feeds/posts/default /index.rss
Redirect permanent /rss /index.rss
Redirect permanent /rss/ /index.rss

and to tell them to stop trying to fetch obsolete resources:

Redirect gone /~ff/FeedingTheCloud
Redirect gone /gittip_button.png
Redirect gone /ikiwiki.cgi

I also used these 404s to discover a few old Feedburner URLs that I could redirect to the right place using archive.org:

Redirect permanent /feeds/1572545745827565861/comments/default /posts/watch-all-of-your-logs-using-monkeytail/comments.atom
Redirect permanent /feeds/1582328597404141220/comments/default /posts/news-feeds-rssatom-for-mythtvorg-and/comments.atom
...
Redirect permanent /feeds/8490436852808833136/comments/default /posts/recovering-lost-git-commits/comments.atom
Redirect permanent /feeds/963415010433858516/comments/default /posts/debugging-openwrt-routers-by-shipping/comments.atom

I also put the following robots.txt in the git repo in order to stop a bunch of authentication errors coming from crawlers:

User-agent: *
Disallow: /blog.cgi
Disallow: /ikiwiki.cgi

Future improvements

There are a few things I'd like to improve on my current setup.

The first one is to remove the ikiwikihosting and gitpush plugins and replace them with a small script which would simply git push to the read-only GitHub mirror. Then I could uninstall the ikiwiki-hosting-common and ikiwiki-hosting-web packages since that's all I use them for.
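
Here's a minimal sketch of what that replacement could look like, assuming a remote named "github" pointing at the mirror; a one-line shell script called from the existing post-update hook would do the job just as well:

#!/usr/bin/env python3
# Hypothetical stand-in for the gitpush plugin: push everything to a
# read-only mirror. Could be called from source.git/hooks/post-update.
import subprocess
import sys

MIRROR_REMOTE = "github"   # assumed name of the read-only mirror remote

if __name__ == "__main__":
    sys.exit(subprocess.run(["git", "push", "--mirror", MIRROR_REMOTE]).returncode)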

Next, I would like to have proper support for signed git pushes. At the moment, I have the following in /home/blog/source.git/config:

[receive]
    advertisePushOptions = true
    certNonceSeed = "(random string)"

but I'd like to also reject unsigned pushes.

While my blog now has a CSP policy which doesn't rely on unsafe-inline for scripts, it does still rely on unsafe-inline for stylesheets. I tried to remove this, but the inline styles that would need to be allowed seemed to be located deep within jQuery, so I gave up. Patches for this would be very welcome of course.

Finally, I'd like to figure out a good way to deal with articles which don't currently have comments. At the moment, if you try to subscribe to their comment feed, it returns a 404. For example:

[Sun Jun 06 17:43:12.336350 2021] [core:info] [pid 30591:tid 140253834704640] [client 66.249.66.70:57381] AH00128: File does not exist: /var/www/blog/posts/using-iptables-with-network-manager/comments.atom

This is obviously not ideal since many feed readers will refuse to add a feed which is currently not found even though it could become real in the future. If you know of a way to fix this, please let me know.


Francois Marier: Opting your domain out of programmatic advertising

A few years ago, the advertising industry introduced the ads.txt project in order to defend against widespread domain spoofing vulnerabilities in programmatic advertising.

I decided to use this technology to opt out of having ads sold for my domains, at least through ad exchanges which perform this check, by hosting a text file containing this:

contact=ads@fmarier.org

at the following locations:

(In order to get this to work on my blog, running Ikiwiki on Branchable, I had to disable the txt plugin so that ads.txt would be served as a plain text file instead of being automatically rendered as HTML.)

Specification

The key parts of the specification for our purposes are:

[3.1] If the server response indicates the resource does not exist (HTTP Status Code 404), the advertising system can assume no declarations exist and that no advertising system is unauthorized to buy and sell ads on the website.

[3.2.1] Some publishers may choose to not authorize any advertising system by publishing an empty ads.txt file, indicating that no advertising system is authorized to buy and sell ads on the website. So that consuming systems properly read and interpret the empty file (differentiating between web servers returning error pages for the /ads.txt URL), at least one properly formatted line must be included which adheres to the format specification described above.

As you can see, the specification sadly ignores RFC8615 and requires that the ads.txt file be present directly in the root of your web server, like the venerable robots.txt file, but unlike the newer security.txt standard.

If you don't want to provide an email address in your ads.txt file, the specification recommends using the following line verbatim:

placeholder.example.com, placeholder, DIRECT, placeholder
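
As a rough illustration (my own sketch, not code from the specification) of how a consuming system might apply sections 3.1 and 3.2.1: a 404 means no declarations exist, while a file that contains only comments and variables such as the contact= line above, or only the placeholder record (which points at a non-existent exchange), ends up authorizing nobody.

# Rough sketch of how a consuming system might read ads.txt, per the
# sections quoted above. Illustrative only; real systems do more validation.
import urllib.error
import urllib.request

def fetch_ads_txt(domain):
    try:
        with urllib.request.urlopen(f"https://{domain}/ads.txt") as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return None          # 3.1: assume no declarations exist
        raise

def seller_records(body):
    """Extract data records, skipping comments and variables like contact=."""
    records = []
    for line in body.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" in line.split(",", 1)[0]:
            continue
        fields = [f.strip() for f in line.split(",")]
        if len(fields) >= 3:
            records.append(fields)
    return records

# A file containing only "contact=ads@fmarier.org" yields an empty record list,
# so no advertising system is authorized to sell this domain's inventory.
body = fetch_ads_txt("example.com")      # placeholder domain
if body is not None:
    print(f"{len(seller_records(body))} authorized seller record(s)")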

Validation

A number of online validators exist, but I used the following to double-check my setup:


Arjen Lentz: Classic McEliece and the NIST search for post-quantum crypto

I have always liked cryptography, and public-key cryptography in particular. When Pretty Good Privacy (PGP) first came out in 1991, I not only started using it, but also looked at the documentation and the code to see how it worked. I created my own implementation in C using very small keys, just to understand it better.

Cryptography has been running a race against both faster and cheaper computing power. And these days, with banking and most other aspects of our lives entirely relying on secure communications, it’s a very juicy target for bad actors.

About 5 years ago, the US National Institute of Standards and Technology (NIST) initiated a search for cryptographic algorithms that should withstand a near-future world where quantum computers with a significant number of qubits are a reality. There have been a number of rounds; mid-2020 saw round 3 begin with the announcement of the finalists.

This submission caught my eye some time ago: Classic McEliece. Out of the four finalists, it’s the only one that is not lattice-based.

For Public Key Encryption and Key Encapsulation Mechanisms (KEM), Prof Bill Buchanan thinks that the winner will be lattice-based, but I am not convinced.

Robert McEliece at his retirement in 2007

Tiny side-track: you may wonder where the McEliece name comes from. From mathematician Robert McEliece (1942-2019), who developed his cryptosystem in 1978. So it’s not just named after him; he designed it. For various reasons that have nothing to do with the mathematical solidity of the ideas, it didn’t get used at the time. He did plenty of other cool things, too. From his Caltech obituary:

He made fundamental contributions to the theory and design of channel codes for communication systems—including the interplanetary telecommunication systems that were used by the Voyager, Galileo, Mars Pathfinder, Cassini, and Mars Exploration Rover missions.

Back to lattices, there are both unknowns (aspects that have not been studied in exhaustive depth) and recent mathematical attacks, both of which create uncertainty – in the crypto sphere as well as for business and politics. Given how long it takes for crypto schemes to get widely adopted, the latter two are somewhat relevant, particularly since cyber security is a hot topic.

Lattices are definitely interesting, but given what we know so far, it is my feeling that systems based on lattices are more likely to be proven breakable than Classic McEliece, which comes to this finalists’ table with a 40+ year track record of in-depth analysis. Mind that all finalists are of course solid at this stage – but NIST’s thoughts on expected developments and breakthroughs are what is likely to decide the winner. NIST is not looking for shiny; it is looking for very, very solid in all possible ways.

Prof Buchanan recently published implementations for the finalists, and did some benchmarks where we can directly compare them against each other.

We can see that Classic McEliece’s key generation is CPU intensive, but is that really a problem? The large size of its public key may be more of a disadvantage; however, I think the small ciphertext more than offsets that.

As we’re nearing the end of the NIST process, in my opinion fast encryption/decryption and small ciphertext, combined with the long track record of in-depth analysis, may still see Classic McEliece come out the winner.



Francois Marier: Using a Streamzap remote control with Kodi

After installing Kodi on a Raspberry Pi 4, I found that my Streamzap remote control worked for everything except the Ok and Exit buttons (which are supposed to get mapped to Enter and Back respectively).

A very old set of instructions for this is archived on the Kodi wiki but here's a more modern version of it.

Root cause

I finally tracked down the problem by enabling debug logging in Kodi settings. I saw the following in ~/.kodi/temp/kodi.log when pressing the OK button:

DEBUG: Keyboard: scancode: 0x00, sym: 0x0000, unicode: 0x0000, modifier: 0x0
DEBUG: GetActionCode: Trying Hardy keycode for 0xf200
DEBUG: Previous line repeats 3 times.
DEBUG: HandleKey: long-0 (0x100f200, obc-16838913) pressed, action is
DEBUG: Keyboard: scancode: 0x00, sym: 0x0000, unicode: 0x0000, modifier: 0x0

and this when pressing the Down button:

DEBUG: CLibInputKeyboard::ProcessKey - using delay: 500ms repeat: 125ms
DEBUG: Thread Timer start, auto delete: false
DEBUG: Keyboard: scancode: 0x6c, sym: 0x0112, unicode: 0x0000, modifier: 0x0
DEBUG: HandleKey: down (0xf081) pressed, action is Down
DEBUG: Thread Timer 2502349008 terminating
DEBUG: Keyboard: scancode: 0x6c, sym: 0x0112, unicode: 0x0000, modifier: 0x0

This suggests that my Streamzap remote is recognized as a keyboard, which I can confirm using:

$ cat /proc/bus/input/devices 
I: Bus=0003 Vendor=0e9c Product=0000 Version=0100
N: Name="Streamzap PC Remote Infrared Receiver (0e9c:0000)"
P: Phys=usb-0000:01:00.0-1.2/input0
S: Sysfs=/devices/platform/scb/fd500000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/usb1/1-1/1-1.2/1-1.2:1.0/rc/rc0/input4
U: Uniq=
H: Handlers=kbd event0 
B: PROP=20
B: EV=100017
B: KEY=3ff 0 0 0 fc000 1 0 0 0 0 18000 4180 c0000801 9e1680 0 0 0
B: REL=3
B: MSC=10

Installing LIRC

The fix I found is to put the following in /etc/X11/xorg.conf.d/90-streamzap-disable.conf:

Section "InputClass"
    Identifier "Ignore Streamzap IR"
    MatchProduct "Streamzap"
    MatchIsKeyboard "true"
    Option "Ignore" "true"
EndSection

to prevent the remote from being used as a keyboard and to instead use it via LIRC, which can be installed like this:

apt install lirc

Put the following in /etc/lirc/lirc_options:

driver=default
device=/dev/lirc0

and install this remote configuration as /etc/lirc/lircd.conf.d/streamzap.conf:

cd /etc/lirc/lircd.conf.d/
curl https://raw.githubusercontent.com/graysky2/streamzap/master/00-Streamzap_PC_Remote.conf > streamzap.conf

Make sure you don't use the config file that comes with the lirc-compat-remotes package or you will likely end up with an over-sensitive remote which tends to double key presses (e.g. pressing the down arrow will go down more than once).

Testing

Now you should be able to test the remote using:

mode2

to see the undecoded infra-red signal, and:

irw

to display the decoded key presses.

Kodi configuration

Finally, as the pi user, put the following config in ~/.kodi/userdata/Lircmap.xml:

<lircmap>
  <remote device="Streamzap_PC_Remote">
    <power>KEY_POWER</power>
    <play>KEY_PLAY</play>
    <pause>KEY_PAUSE</pause>
    <stop>KEY_STOP</stop>
    <forward>KEY_FORWARD</forward>
    <reverse>KEY_REWIND</reverse>
    <left>KEY_LEFT</left>
    <right>KEY_RIGHT</right>
    <up>KEY_UP</up>
    <down>KEY_DOWN</down>
    <pageplus>KEY_CHANNELUP</pageplus>
    <pageminus>KEY_CHANNELDOWN</pageminus>
    <select>KEY_OK</select>
    <back>KEY_EXIT</back>
    <menu>KEY_MENU</menu>
    <red>KEY_RED</red>
    <green>KEY_GREEN</green>
    <yellow>KEY_YELLOW</yellow>
    <blue>KEY_BLUE</blue>
    <skipplus>KEY_NEXT</skipplus>
    <skipminus>KEY_PREVIOUS</skipminus>
    <record>KEY_RECORD</record>
    <volumeplus>KEY_VOLUMEUP</volumeplus>
    <volumeminus>KEY_VOLUMEDOWN</volumeminus>
    <mute>KEY_MUTE</mute>
    <record>KEY_RECORD</record>
    <one>KEY_1</one>
    <two>KEY_2</two>
    <three>KEY_3</three>
    <four>KEY_4</four>
    <five>KEY_5</five>
    <six>KEY_6</six>
    <seven>KEY_7</seven>
    <eight>KEY_8</eight>
    <nine>KEY_9</nine>
    <zero>KEY_0</zero>
  </remote>
</lircmap>

In order for all of this to take effect, I simply rebooted the Pi:

sudo systemctl reboot

Francois Marier: Upgrading an ext4 filesystem for the year 2038

If you see a message like this in your logs:

ext4 filesystem being mounted at /boot supports timestamps until 2038 (0x7fffffff)

it's an indication that your filesystem is not Y2k38-safe.
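
That 0x7fffffff is the largest value a signed 32-bit Unix timestamp can hold; a quick throwaway snippet shows when it runs out:

from datetime import datetime, timezone

# The last second representable by a signed 32-bit time_t.
print(datetime.fromtimestamp(0x7FFFFFFF, tz=timezone.utc))
# 2038-01-19 03:14:07+00:00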

You can also check this manually using:

$ tune2fs -l /dev/sda1 | grep "Inode size:"
Inode size:           128

where an inode size of 128 is insufficient beyond 2038 and an inode size of 256 is what you want.

The safest way to change this is to copy the contents of your partition to another ext4 partition:

cp -a /boot /mnt/backup/

and then reformat with the correct inode size:

umount /boot
mkfs.ext4 -I 256 /dev/sda1

before copying everything back:

mount /boot
cp -a /mnt/backup/boot/* /boot/

Francois Marier: Removing a corrupted data pack in a Restic backup

I recently ran into a corrupted data pack in a Restic backup on my GnuBee. It led to consistent failures during the prune operation:

incomplete pack file (will be removed): b45afb51749c0778de6a54942d62d361acf87b513c02c27fd2d32b730e174f2e
incomplete pack file (will be removed): c71452fa91413b49ea67e228c1afdc8d9343164d3c989ab48f3dd868641db113
incomplete pack file (will be removed): 10bf128be565a5dc4a46fc2fc5c18b12ed2e77899e7043b28ce6604e575d1463
incomplete pack file (will be removed): df282c9e64b225c2664dc6d89d1859af94f35936e87e5941cee99b8fbefd7620
incomplete pack file (will be removed): 1de20e74aac7ac239489e6767ec29822ffe52e1f2d7f61c3ec86e64e31984919
hash does not match id: want 8fac6efe99f2a103b0c9c57293a245f25aeac4146d0e07c2ab540d91f23d3bb5, got 2818331716e8a5dd64a610d1a4f85c970fd8ae92f891d64625beaaa6072e1b84
github.com/restic/restic/internal/repository.Repack
        github.com/restic/restic/internal/repository/repack.go:37
main.pruneRepository
        github.com/restic/restic/cmd/restic/cmd_prune.go:242
main.runPrune
        github.com/restic/restic/cmd/restic/cmd_prune.go:62
main.glob..func19
        github.com/restic/restic/cmd/restic/cmd_prune.go:27
github.com/spf13/cobra.(*Command).execute
        github.com/spf13/cobra/command.go:838
github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/cobra/command.go:943
github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/cobra/command.go:883
main.main
        github.com/restic/restic/cmd/restic/main.go:86
runtime.main
        runtime/proc.go:204
runtime.goexit
        runtime/asm_amd64.s:1374

Thanks to the excellent support forum, I was able to resolve this issue by dropping a single snapshot.

First, I identified the snapshot which contained the offending pack:

$ restic -r sftp:hostname.local: find --pack 8fac6efe99f2a103b0c9c57293a245f25aeac4146d0e07c2ab540d91f23d3bb5
repository b0b0516c opened successfully, password is correct
Found blob 2beffa460d4e8ca4ee6bf56df279d1a858824f5cf6edc41a394499510aa5af9e
 ... in file /home/francois/.local/share/akregator/Archive/http___udd.debian.org_dmd_feed_
     (tree 602b373abedca01f0b007fea17aa5ad2c8f4d11f1786dd06574068bf41e32020)
 ... in snapshot 5535dc9d (2020-06-30 08:34:41)

Then, I could simply drop that snapshot:

$ restic -r sftp:hostname.local: forget 5535dc9d
repository b0b0516c opened successfully, password is correct
[0:00] 100.00%  1 / 1 files deleted

and run the prune command to remove the snapshot, as well as the incomplete packs that were also mentioned in the above output but could never be removed due to the other error:

$ restic -r sftp:hostname.local: prune
repository b0b0516c opened successfully, password is correct
counting files in repo
building new index for repo
[20:11] 100.00%  77439 / 77439 packs
incomplete pack file (will be removed): b45afb51749c0778de6a54942d62d361acf87b513c02c27fd2d32b730e174f2e
incomplete pack file (will be removed): c71452fa91413b49ea67e228c1afdc8d9343164d3c989ab48f3dd868641db113
incomplete pack file (will be removed): 10bf128be565a5dc4a46fc2fc5c18b12ed2e77899e7043b28ce6604e575d1463
incomplete pack file (will be removed): df282c9e64b225c2664dc6d89d1859af94f35936e87e5941cee99b8fbefd7620
incomplete pack file (will be removed): 1de20e74aac7ac239489e6767ec29822ffe52e1f2d7f61c3ec86e64e31984919
repository contains 77434 packs (2384522 blobs) with 367.648 GiB
processed 2384522 blobs: 1165510 duplicate blobs, 47.331 GiB duplicate
load all snapshots
find data that is still in use for 15 snapshots
[1:11] 100.00%  15 / 15 snapshots
found 1006062 of 2384522 data blobs still in use, removing 1378460 blobs
will remove 5 invalid files
will delete 13728 packs and rewrite 15140 packs, this frees 142.285 GiB
[4:58:20] 100.00%  15140 / 15140 packs rewritten
counting files in repo
[18:58] 100.00%  50164 / 50164 packs
finding old index files
saved new indexes as [340cb68f 91ff77ef ee21a086 3e5fa853 084b5d4b 3b8d5b7a d5c385b4 5eff0be3 2cebb212 5e0d9244 29a36849 8251dcee 85db6fa2 29ed23f6 fb306aba 6ee289eb 0a74829d]
remove 190 old index files
[0:00] 100.00%  190 / 190 files deleted
remove 28868 old packs
[1:23] 100.00%  28868 / 28868 files deleted
done

Recovering from a corrupt pack

I ran into this problem a second time:

hash does not match id: want 4f0d26ae93d48ae9a274b0802c208fa47dcc2f97393378d63c208dd8dbcdf176, got 4f1eeb7ed1423358d4c579e641fe40a6340565c64f2df96ac19d28714a769806

but in that case dropping a single snapshot is not an option because the invalid pack is used in every snapshot!

Digging more into this problem, I realized I could trigger the error by requesting this pack specifically:

$ restic -r sftp:hostname.local: cat pack 4f0d26ae93d48ae9a274b0802c208fa47dcc2f97393378d63c208dd8dbcdf176 | sha256sum
Warning: hash of data does not match ID, want
  4f0d26ae93d48ae9a274b0802c208fa47dcc2f97393378d63c208dd8dbcdf176
got:
  4f1eeb7ed1423358d4c579e641fe40a6340565c64f2df96ac19d28714a769806
4f1eeb7ed1423358d4c579e641fe40a6340565c64f2df96ac19d28714a769806  -

If I ssh into my backup server and look at the pack file directly, I can see that the contents of it do not match its name (the name is supposed to be the hash of the contents):

$ sha256sum /mnt/data/home/machine1/data/4f/4f0d26ae93d48ae9a274b0802c208fa47dcc2f97393378d63c208dd8dbcdf176
4f1eeb7ed1423358d4c579e641fe40a6340565c64f2df96ac19d28714a769806  /mnt/data/home/machine1/data/4f/4f0d26ae93d48ae9a274b0802c208fa47dcc2f97393378d63c208dd8dbcdf176

After looking at the underlying file that is involved in this data corruption:

$ restic -r sftp:hostname.local: find --pack 4f0d26ae93d48ae9a274b0802c208fa47dcc2f97393378d63c208dd8dbcdf176
Found blob aec3a1b637a09f173652d509c1f39b8248deeba83d6c74ca99749676d7a4fb75
 ... in file /mnt/pub/video.mkv
     (tree 7b83ceeb0cadb35a6e1103ebb37152e9b1338469fbc782256b95c0ddb6d4cc4e)
 ... in snapshot b0c7a69f (2021-04-16 09:41:47)

I decided that I didn't care about potentially losing it and worked around the invalid filename by renaming the pack (on the backup server) to its current hash:

$ cd /mnt/data/home/machine1/data/4f/
$ mv 4f0d26ae93d48ae9a274b0802c208fa47dcc2f97393378d63c208dd8dbcdf176 4f1eeb7ed1423358d4c579e641fe40a6340565c64f2df96ac19d28714a769806

That didn't work and made restic prune fail.

Since I still have the original video file, I figured I could just delete the pack and then trigger a new backup to re-upload it:

$ cd /mnt/data/home/machine1/data/4f/
$ rm 4f0d26ae93d48ae9a274b0802c208fa47dcc2f97393378d63c208dd8dbcdf176

$ restic -r sftp:hostname.local: backup --force
repository 1e424ff8 opened successfully, password is correct

Files:       620161 new,     0 changed,     0 unmodified
Dirs:            0 new,     0 changed,     0 unmodified
Added to the repo: 392.716 MiB

processed 620161 files, 1.637 TiB in 2:02:23
snapshot 710af622 saved

After doing that, I was able to run restic prune successfully and a simple check passed:

$ restic -r sftp:hostname.local: check
repository 1e424ff8 opened successfully, password is correct
created new cache in /var/tmp/restic-check-tmp/restic-check-cache-878723178
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
no errors were found

but a more thorough check exposed another corrupt pack:

$ restic -r sftp:hostname.local: check
using temporary cache in /var/tmp/restic-check-tmp/restic-check-cache-878723178
repository 1e424ff8 opened successfully, password is correct
created new cache in /var/tmp/restic-check-tmp/restic-check-cache-878723178
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
no errors were found

$ restic -r sftp:hostname.local: check --read-data
using temporary cache in /var/tmp/restic-check-tmp/restic-check-cache-502831432
repository 1e424ff8 opened successfully, password is correct
created new cache in /var/tmp/restic-check-tmp/restic-check-cache-502831432
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
read all data
Pack ID does not match, want b6c9bd10, got 89dc2d85
[9:30:09] 100.00%  385573 / 385573 items
duration: 9:30:09
Fatal: repository contains errors

I removed that one too:

$ cd /mnt/data/home/machine1/data/b6/
$ rm b6c9bd10c68347d6bb76328d4bcb4b07c5dbf4f0b9317a4268844ea6c8b0b179

$ restic -r sftp:hostname.local: backup --force
repository 1e424ff8 opened successfully, password is correct

Files:       620209 new,     0 changed,     0 unmodified
Dirs:            0 new,     0 changed,     0 unmodified
Added to the repo: 12.156 GiB

processed 620209 files, 1.655 TiB in 2:10:10
snapshot b63f6890 saved

but this time pruning didn't work:

$ restic -r sftp:hostname.local: check --read-data
using temporary cache in /var/tmp/restic-check-tmp/restic-check-cache-315342197
repository 1e424ff8 opened successfully, password is correct
created new cache in /var/tmp/restic-check-tmp/restic-check-cache-315342197
create exclusive lock for repository
load indexes
check all packs
pack b6c9bd10: does not exist
check snapshots, trees and blobs
read all data
Load(<data/b6c9bd10c6>, 0, 0) returned error, retrying after 458.926983ms: file does not exist
Load(<data/b6c9bd10c6>, 0, 0) returned error, retrying after 952.879588ms: file does not exist
Load(<data/b6c9bd10c6>, 0, 0) returned error, retrying after 1.512969815s: file does not exist
Load(<data/b6c9bd10c6>, 0, 0) returned error, retrying after 1.345233563s: file does not exist
Load(<data/b6c9bd10c6>, 0, 0) returned error, retrying after 2.502610537s: file does not exist
Load(<data/b6c9bd10c6>, 0, 0) returned error, retrying after 2.391042907s: file does not exist
Load(<data/b6c9bd10c6>, 0, 0) returned error, retrying after 4.49847416s: file does not exist
Load(<data/b6c9bd10c6>, 0, 0) returned error, retrying after 4.678015434s: file does not exist
Load(<data/b6c9bd10c6>, 0, 0) returned error, retrying after 19.187660462s: file does not exist
Load(<data/b6c9bd10c6>, 0, 0) returned error, retrying after 14.884802036s: file does not exist
checkPack: Load: file does not exist
[9:36:58] 100.00%  388034 / 388034 items
duration: 9:36:58
Fatal: repository contains errors

$ restic -r sftp:hostname.local: prune
repository 1e424ff8 opened successfully, password is correct
counting files in repo
building new index for repo
[1:42:28] 100.00%  388033 / 388033 packs
repository contains 388033 packs (2638525 blobs) with 1.838 TiB
processed 2638525 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
find data that is still in use for 23 snapshots
[1:38] 100.00%  23 / 23 snapshots
Fatal: number of used blobs is larger than number of available blobs!
Please report this error (along with the output of the 'prune' run) at
https://github.com/restic/restic/issues/new

So, as suggested on the restic issue tracker, I rebuilt the index:

$ restic -r sftp:hostname.local: rebuild-index
repository 1e424ff8 opened successfully, password is correct
counting files in repo
[44:35] 100.00%  395104 / 395104 packs
finding old index files
saved new indexes as [9a17c926 933252b2 2ef59ca2 679d4819 2661f12e 771b2013 d532cd44 55b4f8ab 5a3e6b2a 7b75e860 13c15b59 60d3266e 1fbe4dc5 f3b2396a c9debfa0 5ea78678 4e74f3ef bb9e99d9 9e335e0a 700a625d db9177b7 0a80aa31 9cbc436e 33211891 bfa29354 b67bc0f9 dfdda084 bc92aba3 c1dff125 652c4185 8c010d90 5c87bbfc fc4ba6f1 c0a7396e c54917ec 6aa26645 691cf977 f91223d3 cf9ff525 c1441550 ef71cb97 978192b2 40b416e9 31373aae 01bcade9 6f8b98bb 3e15ba5f d62b68c7 c92c5277 4270bae1 96822d59 cd45d864 d7830dad d6eae2a0 68cc1f1c 8501c6b7 fb95ce78 50479d33 e3afbbc9 f83fbfb6 097fd285 4d1ad340 b2a9b7ce 80add534 9cee064d 1cc5bbe3 7d40526b 334aecc9 952f8ed5 89d830b2 89e0c097 f3a3abf6 3c88ac9e c8cae3e5 1dc35cd3 3d70ba93 c59da9ee 2c19a371 5eed964c 1191bfee fa0e0c31 f79af9bb 214916f7 55cd25c4 03c5550f 8c36e374 7b0d2307 5c1c089a d62c5d18 d5c19d53 2e5f8647 27b5e5cf 87e164ff 0c9a7374 b2fa9e91 8781539c 1f130615 92396e28 1b630ee2 ecdb6b3c aae6de03 65fe2b9d 245d0258 6c446e2f 4f53e49e c6dba856 129c066d af0cb9e4 562adf6d eec351a0 6b9d2810 8d2b1aaf 69700877 4adff4e4 3ad2f960 79c8c084 fdccfa7a a0a1ef23 6ec37846 0b4f2199 8c2dc492 c304aeef d2fb56f3 30a0b5dd 20b01d0b c5bd22ad e96e0e72 73d39d25 57329b10 e9cfb1ef 685611e6 ce291336]
remove 132 old index files

and performed a forced backup:

$ restic -r sftp:hostname.local: backup --force
repository 1e424ff8 opened successfully, password is correct

Files:       620136 new,     0 changed,     0 unmodified
Dirs:            0 new,     0 changed,     0 unmodified
Added to the repo: 1.034 GiB

processed 620136 files, 1.612 TiB in 2:00:40
snapshot a15ee906 saved

followed by a prune:

$ restic -r sftp:hostname.local: prune
repository 1e424ff8 opened successfully, password is correct
counting files in repo
building new index for repo
[48:19] 100.00%  395322 / 395322 packs
repository contains 395322 packs (2664883 blobs) with 1.873 TiB
processed 2664883 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
find data that is still in use for 25 snapshots
[1:40] 100.00%  25 / 25 snapshots
found 2664883 of 2664883 data blobs still in use, removing 0 blobs
will remove 0 invalid files
will delete 0 packs and rewrite 0 packs, this frees 0 B
counting files in repo
[10:14] 100.00%  395322 / 395322 packs
finding old index files
saved new indexes as [3304a0b5 040cac48 c38f9c4d 1619862d 2c314f95 c9a1c845 3f836327 4f2830a5 9e9a9806 3040f9bd 9e1b0589 146f395c 27d91277 08616709 38757c70 15093193 103ba3e1 e6cbe636 55fcbe20 92c12833 8a212821 a278e6eb 467328be 4b29f8c4 934de2a2 12d3f314 01c08b76 7101de54 afc8979b 18395477 48d140ef 48b48906 ee339513 742eea48 0d54eeb5 7dfbad62 54bdd724 33511da3 8b2628fd d566581d 0272fbf6 0510755d 64a39724 d4a51c31 432b0019 45469c45 c28d99fc c2329a41 c4dc9880 d52fdb9e 00b2c975 d791f871 c5e37a0e 1c3f7f40 008dda03 70cce88c e2bdef19 980839da a8388aad 9f2fdfc7 3f81fb07 20ec25f3 f3b0e273 8adfe021 5cb21bf4 f9664745 e81e0dc1 bdf674b3 4b43311f f576ee7b 6ff78a24 050d0d7a 079dda00 5a92ee95 c7a73677 42269868 c54be3e7 96a8339d 4fac763a aa859ad2 0282a555 d8145fdc 7fb9cae8 0b88bb4f 4ba5cdf5 6f4bc3ad 040580df fd5e0594 fe642426 58839033 0044dba1 73369fba d1d574aa 97833dff 469993c4 d6e89ba7 1e7378d3 3c5ce4a8 634c33b3 85db6047 2902f128 5c874b86 fa13fa7a 0e3319d6 7b5e5b57 41f864cf 72646b83 f3e19e87 cb28793a e2d3d8d5 42d7fac3 e5cc116f 5cb8d048 e5b54e10 1d3c59eb a5343bd8 9a11b3d9 5d607d3a c64ae7d5 84043ae5 ab28156d dbc777ba 870a4333 7bd74995 e21d054d 1983d216 c89603bd e767acff 2eb43682 44f44f58 9505e382 bb180450]
remove 136 old index files
done

That finally resolved the problems:

$ restic -r sftp:hostname.local: check --read-data
using temporary cache in /var/tmp/restic-check-tmp/restic-check-cache-120208924
repository 1e424ff8 opened successfully, password is correct
created new cache in /var/tmp/restic-check-tmp/restic-check-cache-120208924
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
read all data
[10:06:21] 100.00%  395322 / 395322 items
duration: 10:06:21
no errors were found

$ restic -r sftp:hostname.local: prune
repository 1e424ff8 opened successfully, password is correct
counting files in repo
building new index for repo
[1:46:03] 100.00%  395322 / 395322 packs
repository contains 395322 packs (2664883 blobs) with 1.873 TiB
processed 2664883 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
find data that is still in use for 25 snapshots
[1:40] 100.00%  25 / 25 snapshots
found 2664883 of 2664883 data blobs still in use, removing 0 blobs
will remove 0 invalid files
will delete 0 packs and rewrite 0 packs, this frees 0 B
counting files in repo
[51:25] 100.00%  395322 / 395322 packs
finding old index files
saved new indexes as [625949a2 ab90eb8e 14b9628f b167409c f7ec7fcd 6b4c8610 bbe52582 71bfc8a6 3cba9ff0 1c7bde59 a7964e22 06b80d66 14c809a9 7b86e517 d99947de 2210743d 4491e925 bce9c547 b6f6d858 8c5971a1 55e33c29 f7ffa1d4 4cc2840d ce1abd72 b060fa02 d2fd09ff f4a0131b 374a9189 2dccd29a 1945cd5f c7ee7206 3d04a42d f438b047 452db6f1 27c963d7 f58fbd51 96b1f61e a9a0973d ba4d6703 e678c54c 51302c74 afbb4bac 8751b54e 94473541 d9b32531 60df7481 8106fd0c 61253b21 2e875c92 7df95dd3 0adf9c74 88170e77 91792c60 0ea4e491 39b94739 11c1fe4b 43b126ca cc735f84 27b9b442 081e8ab0 48e93444 1b0fa84c ffbfd282 838e6e98 84a12734 c2e4b6cf 6e175d0a 43998700 eb0a3082 05a82e0d 0ebca5a7 8a0007ce 2c726d2f 04b69cc9 f6ba1902 76e56673 9b0deb48 5a847d01 d0aaa74b 20fe5986 5467b907 b6f5bafc 53117b38 ae52eb33 fa327a6e 219a650c 37f52f15 8d15e4d8 b3c30a8c 38b23a1f 1a8c3cc5 2d7f8cb5 d051dae6 40e95005 a37e5883 aa541d2f de31889c 55ce512c 9c754673 aa1e62f6 8c11b17e 98c21930 c4425cef 295f092b daa08873 71c296e4 7ca9c531 43d3ead3 e397f19a 963e4a9e 531fd7c4 e68f7ee8 6e1c2dd8 8ee249fc 02068c0f 8ffe124f 6b2469bd 92aae0d2 464e119f b49edd74 feddbd73 a2bfe6ea cca616ce f30e6c1d 0614837e 120477e1 d39fe0f2 52ed1a31 6343369d 362e7a30 9039009b f6499019]
remove 132 old index files
done

In other words, if dropping a single snapshot is not an option and you still have the original file, then try deleting the corrupt pack and then running:

restic rebuild-index
restic backup --force
restic prune

Note that if you are using a newer version of restic which automatically heals repos (i.e. 0.12.0 or later), you may not need to use the --force option with the backup command.
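
For completeness, here is a rough sketch of those steps for the sftp repository used above. The pack ID is a hypothetical placeholder (use the one reported as damaged by "restic check"), and this assumes the repository lives in the remote user's home directory, with packs stored under data/<first two characters of the pack ID>/:

PACK=aabbccddeeff...                                # hypothetical pack ID reported as damaged
ssh hostname.local "rm data/${PACK:0:2}/$PACK"      # delete the corrupt pack file from the repo
restic -r sftp:hostname.local: rebuild-index
restic -r sftp:hostname.local: backup --force /path/you/back/up
restic -r sftp:hostname.local: prune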

Francois MarierRecovering from a corrupt MariaDB index page

I ran into a corrupt MariaDB index page the other day and had to restore my MythTV database from the automatic backups I make as part of my regular maintenance tasks.

Signs of trouble

My troubles started when my daily backup failed on this line:

mysqldump --opt mythconverg -umythtv -pPASSWORD > mythconverg-20200923T1117.sql

with this error message:

mysqldump: Error 1034: Index for table 'recordedseek' is corrupt; try to repair it when dumping table `recordedseek` at row: 4059895

Comparing the dump that was just created to the database dumps in /var/backups/mythtv/, it was clear that it was incomplete since it was about 100 MB smaller.

I first tried a gentle OPTIMIZE TABLE recordedseek as suggested in this StackExchange answer but that caused the database to segfault:

mysqld[9141]: 2020-09-23 15:02:46 0 [ERROR] InnoDB: Database page corruption on disk or a failed file read of tablespace mythconverg/recordedseek page [page id: space=115871, page number=11373]. You may have to recover from a backup.
mysqld[9141]: 2020-09-23 15:02:46 0 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
mysqld[9141]:  len 16384; hex 06177fa70000...
mysqld[9141]:  C     K     c      {\;
mysqld[9141]: InnoDB: End of page dump
mysqld[9141]: 2020-09-23 15:02:46 0 [Note] InnoDB: Uncompressed page, stored checksum in field1 102203303, calculated checksums for field1: crc32 806650270, innodb 1139779342,  page type 17855 == INDEX.none 3735928559, stored checksum in field2 102203303, calculated checksums for field2: crc32 806650270, innodb 3322209073, none 3735928559,  page LSN 148 2450029404, low 4 bytes of LSN at page end 2450029404, page number (if stored to page already) 11373, space id (if created with >= MySQL-4.1.1 and stored already) 115871
mysqld[9141]: 2020-09-23 15:02:46 0 [Note] InnoDB: Page may be an index page where index id is 697207
mysqld[9141]: 2020-09-23 15:02:46 0 [Note] InnoDB: Index 697207 is `PRIMARY` in table `mythconverg`.`recordedseek`
mysqld[9141]: 2020-09-23 15:02:46 0 [Note] InnoDB: It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
mysqld[9141]: 200923 15:02:46 2020-09-23 15:02:46 0 [ERROR] InnoDB: Failed to read file './mythconverg/recordedseek.ibd' at offset 11373: Page read from tablespace is corrupted.
mysqld[9141]: [ERROR] mysqld got signal 11 ;
mysqld[9141]: Core pattern: |/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %h ...
kernel: [820233.893658] mysqld[9186]: segfault at 90 ip 0000557a229f6d90 sp 00007f69e82e2dc0 error 4 in mysqld[557a224ef000+803000]
kernel: [820233.893665] Code: c4 20 83 bd e4 eb ff ff 44 48 89 ...
systemd[1]: mariadb.service: Main process exited, code=killed, status=11/SEGV
systemd[1]: mariadb.service: Failed with result 'signal'.
systemd-coredump[9240]: Process 9141 (mysqld) of user 107 dumped core.#012#012Stack trace of thread 9186: ...
systemd[1]: mariadb.service: Service RestartSec=5s expired, scheduling restart.
systemd[1]: mariadb.service: Scheduled restart job, restart counter is at 1.
mysqld[9260]: 2020-09-23 15:02:52 0 [Warning] Could not increase number of max_open_files to more than 16364 (request: 32186)
mysqld[9260]: 2020-09-23 15:02:53 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=638234502026
...
mysqld[9260]: 2020-09-23 15:02:53 0 [Note] InnoDB: Recovered page [page id: space=115875, page number=5363] from the doublewrite buffer.
mysqld[9260]: 2020-09-23 15:02:53 0 [Note] InnoDB: Starting final batch to recover 2 pages from redo log.
mysqld[9260]: 2020-09-23 15:02:53 0 [Note] InnoDB: Waiting for purge to start
mysqld[9260]: 2020-09-23 15:02:53 0 [Note] Recovering after a crash using tc.log
mysqld[9260]: 2020-09-23 15:02:53 0 [Note] Starting crash recovery...
mysqld[9260]: 2020-09-23 15:02:53 0 [Note] Crash recovery finished.

and so I went with the nuclear option of dropping the MythTV database and restoring from backup.

Dropping the corrupt database

First of all, I shut down MythTV as root:

killall mythfrontend
systemctl stop mythtv-status.service
systemctl stop mythtv-backend.service

and took a full copy of my MariaDB databases just in case:

systemctl stop mariadb.service
cd /var/lib
apack /root/var-lib-mysql-20200923T1215.tgz mysql/
systemctl start mariadb.service

before dropping the MythTV database (mythconverg):

$ mysql -pPASSWORD

MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| mythconverg        |
| performance_schema |
+--------------------+
4 rows in set (0.000 sec)

MariaDB [(none)]> drop database mythconverg;
Query OK, 114 rows affected (25.564 sec)

MariaDB [(none)]> quit
Bye

Restoring from backup

Then I re-created an empty database:

mysql -pPASSWORD < /usr/share/mythtv/sql/mc.sql

and restored the last DB dump prior to the detection of the corruption:

sudo -i -u mythtv
/usr/share/mythtv/mythconverg_restore.pl --directory /var/backups/mythtv --filename mythconverg-1350-20200923010502.sql.gz

In order to restart everything properly, I simply rebooted the machine:

systemctl reboot
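
As a final sanity check (not part of the original recovery, just a suggestion), one could confirm after the reboot that the restored table no longer triggers the corruption error:

mysqlcheck -umythtv -pPASSWORD mythconverg recordedseek
mysqldump --opt mythconverg -umythtv -pPASSWORD > /dev/null && echo "dump OK"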

Francois MarierProgramming a DMR radio with its CPS

Here are some notes I took around programming my AnyTone AT-D878UV radio to operate on DMR using the CPS software that comes with it.

Note that you can always tune in to a VFO channel by hand if you haven't had time to add it to your codeplug yet.

DMR terminology

First of all, the terminology of DMR is quite different from that of the regular analog FM world.

Here are the basic terms:

  • Frequency: same meaning as in the analog world
  • Repeater: same meaning as in the analog world
  • Timeslot: Each frequency is split into two timeslots (1 and 2), which means that there can be two simultaneous transmissions on each frequency.
  • Color code: This is the digital equivalent of a CTCSS tone (sometimes called privacy tone) in that using the incorrect code means that you will tie up one of the timeslots on the frequency, but nobody else will hear you. These are not actually named after colors, but are instead just numerical IDs from 0 to 15.

There are two different identification mechanisms (both are required):

  • Callsign: This is the same identifier issued to you by your country's amateur radio authority. Mine is VA7GPL.
  • Radio ID: This is a unique numerical ID tied to your callsign which you must register for ahead of time and program into your radio. Mine is 3027260.

The following is where this digital mode becomes most interesting:

  • Talkgroup: a "chat room" where everything you say will be heard by anybody listening to that talkgroup
  • Network: a group of repeaters connected together over the Internet (typically) and sharing a common list of talkgroups
  • Hotspot: a personal simplex device which allows you to connect to a network with your handheld and access all of the talkgroups available on that network

The most active network these days is Brandmeister, but there are several others.

  • Access: This can either be Always on which means that a talkgroup will be permanently broadcasting on a timeslot and frequency, or PTT which means a talkgroup will not be broadcast until it is first "woken up" by pressing the push-to-talk button and then will broadcast for a certain amount of time before going to sleep again.
  • Channel: As in the analog world, this is what you select on your radio when you want to talk to a group of people. In the digital world however, it is tied not only to a frequency (and timeslot) and tone (color code), but also to a specific talkgroup.

Ultimately what you want to do when you program your radio is to find the talkgroups you are interested in (from the list offered by your local repeater) and then assign them to specific channel numbers on your radio. More on that later.

Callsign and Radio IDs

Before we get to talkgroups, let's set your callsign and Radio ID:

Then you need to download the latest list of Radio IDs so that your radio can display people's names and callsigns instead of just their numerical IDs.

One approach is to only download the list of users who recently talked on talkgroups you are interested in. For example, I used to download the contacts for the following talkgroups: 91,93,95,913,937,3026,3027,302,30271,30272,530,5301,5302,5303,5304,3100,3153,31330 but these days, what I normally do is to just download the entire worldwide database (user.csv) since my radio still has enough storage (200k entries) for it.

In order for the user.csv file to work with the AnyTone CPS, it needs to have particular columns and use the DOS end-of-line characters (apt install dos2unix if you want to do it manually). I wrote a script to do all of the work for me.
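
A minimal manual version of that conversion might look like the following; the download URL is just a placeholder and the column selection is an assumption, so check which columns your CPS import dialog actually expects:

# placeholder URL -- substitute wherever you download the worldwide user database from
curl -sf -o user_raw.csv "https://example.org/dmr/user.csv"
# keep only the columns the CPS import expects (the selection here is an assumption)
awk -F, 'BEGIN{OFS=","} {print $1,$2,$3,$4,$5,$6,$7}' user_raw.csv > user.csv
unix2dos user.csv                        # the AnyTone CPS expects DOS (CRLF) line endings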

If you use dmrconfig to program this radio instead, then the conversion is unnecessary. The user.csv file can be used directly, however it will be truncated due to an incorrect limit hard-coded in the software.

Talkgroups

Next, you need to pick the talkgroups you would like to allocate to specific channels on your radio.

Start by looking at the documentation for your local repeaters (e.g. VE7RAG and VE7NWR in the Vancouver area).

In addition to telling you the listen and transmit frequencies of the repeater (again, this works the same way as with analog FM), these will tell you which talkgroups are available and what timeslots and color codes they have been set to. It will also tell you the type of access for each of these talkgroups.

This is how I programmed a channel:

and a talkgroup on the VE7RAG repeater in my radio:

If you don't have a local repeater with DMR capability, or if you want to access talkgroups available on a different network, then you will need to get a DMR hotspot such as one that's compatible with the Pi-Star software.

This is an excerpt from the programming I created for the talkgroups I made available through my hotspot:

One of the unfortunate limitations of the CPS software for the AnyTone 878 is that talkgroup numbers are globally unique identifiers. This means that if TG1234 (hypothetical example) is Ragchew 3000 on DMR-MARC but Iceland-wide chat on Brandmeister, then you can't have two copies of it with different names. The solution I found for this was to give that talkgroup the name "TG1234" instead of "Ragchew3k" or "Iceland". I use a more memorable name for non-conflicting talkgroups, but for the problematic ones, I simply repeat the talkgroup number.

Simplex

Talkgroups are not required to operate on DMR. Just like analog FM, you can talk to another person point-to-point using a simplex channel.

The convention for all simplex channels is the following:

  • Talkgroup: 99
  • Color code: 1
  • Timeslot: 1
  • Admit criteria: Always
  • In Call Criteria: TX or Always

After talking to the British Columbia Amateur Radio Coordination Council, I found that the following frequency ranges are most suitable for DMR simplex:

  • 145.710-145.790 MHz (simplex digital transmissions)
  • 446.000-446.975 MHz (all simplex modes)

The VECTOR list identifies two frequencies in particular:

  • 446.075 MHz
  • 446.500 MHz

Learn more

If you'd like to learn more about DMR, I would suggest you start with this excellent guide (also mirrored here).

Francois MarierList of Planet Linux Australia blogs

I've been following Planet Linux Australia for many years and discovered many interesting FOSS blogs through it. I was sad to see that it got shut down a few weeks ago and so I decided to manually add all of the feeds to my RSS reader to avoid missing posts from people I have been indirectly following for years.

Since all feeds have been removed from the site, I recovered the list of blogs available from an old copy of the site preserved by the Internet Archive.

Here is the resulting .opml file if you'd like to subscribe.

Changes

Once I had the full list, I removed all blogs that are gone, empty or broken (e.g. domain not resolving, returning a 404, various database or server errors).

I updated the URLs of a few blogs which had moved but hadn't updated their feeds on the planet. I also updated the name of a blogger who was still listed under a previous last name.

Finally, I removed LA-specific tags from feeds since these are unlikely to be used again.

Work-arounds

The following LiveJournal feeds didn't work in my RSS reader but opened fine in a browser:

However since none of them have been updated in the last 7 years, I just left them out.

A couple appear to be impossible to fetch over Tor, presumably due to a Cloudflare setting:

Since only the last two have been updated in the last 9 years, I added these to Feedburner and added the following "proxied" URLs to my reader:

Similarly, I couldn't fetch the following over Tor for some other reasons:

I excluded the first two which haven't been updated in 6 years and proxied the other ones:

,

Tim RileyOpen source status update, May 2021

Well didn’t May go by quickly! Here’s what I got up to in OSS over the month.

For starters, do you remember the big Hanami 2.0.0.alpha2 release I mentioned last month? Yep, that happened. And it was a little cheeky of me to sneak it into my March/April update, because it happened in the first week of May!

So after a short while recuperating from that big push, there wasn’t a whole lotta time left in the month! And apart from this, it was kind of a funny month, because it brought a different kind of work to the sort I’d been doing for a while.

Welcoming Marc Busqué to the team!

The real highlight of the month was Marc Busqué getting on board with Hanami development! Marc’s brought a huge amount of fresh energy to the team and got right into productive development work. It’s great to be working with you, Marc!

Preparing dry-configurable’s API for 1.0

One of the first things Marc did was make some adjustments to dry-configurable’s setting API, to make it more consistent as one of the final steps before we can release 1.0 of that gem.

setting will now take only a single positional argument, for the name of the setting. Everything else must be provided via keyword arguments for improved consistency and clarity, plus easier wrapping by other gems. This means:

  • The default value for the setting must be supplied as default: rather than a second positional argument
  • A setting’s constructor (or “processor”) can no longer be supplied as a block; instead it should be a proc object passed to constructor:

We merged these in these PRs, which include a deprecation pathway, so the previous usage will (largely) continue to work. While we were in dry-configurable, I also made a fix for it to work with preexisting #initialize methods accepting keyword args when its module is included, as well as removing implicit hash conversion which can result in unexpected destructuring when passing a configurable object to a method accepting a keyword args splat.

These dry-configurable changes haven’t yet been released, but hopefully we can make it happen sometime in June. The reason is that they were in service of a couple of larger efforts, both of which are still in flight (read on below for more detail!).

In the meantime, after these API changes, I did a sweep of the dry-rb ecosystem to bring things up to date, which led to PRs in dry-system, dry-container, dry-effects, dry-monitor, dry-rails, dry-schema, dry-validation, and hanami-view. Phew! It just goes to show how load-bearing this little gem is for our overall ecosystem (and how wide-ranging the impact of API changes can be). I haven’t merged these yet either, but will hope to do so in the next week or so, once we’ve ensured we have compatibility with both current and future dry-configurable APIs for the range of dry-rb gems that are past their respective 1.0 releases.

Porting Hanami::Configuration to dry-configurable

One thing that led to a couple of those dry-configurable fixes was my work in updating Hanami::Configuration to use dry-configurable. This class had gotten pretty sprawling with its manual handling of reading/writing a wide range setting values, which is squarely in dry-configurable’s wheelhouse, and the result is much tidier (and now consistent with how we’re handling configuration in both hanami-controller and hanami-view). This one again isn’t quite ready to merge (are you sensing a theme?), but it’s probably just an hour away from being done. I’ll look forward to having this one ticked off!

Updating Hanami’s application settings to use dry-configurable (and more)

Hanami’s application settings (the ones you define for yourself in config/settings.rb) have been very dry-configurable-like since their inception, but backed by custom code instead. It’s been on my to-do list for a long time to switch this over to dry-configurable, but with Marc joining the team, we’ve finally got some traction here! You can see the original PR and a current, in-progress PR as well.

This one was a lot of collaborative fun. Marc made the broad initial steps, I jumped in to poke around and explore the design possibilities, and then he took my direction and ran with it, adding some other nice improvements along the way, like introducing a “settings store” abstraction, which in our default implementation will continue to rely on dotenv.

This one is also close to being done. Watch this space (and all the other spaces I’ve mentioned so far, if you’re keen).

Providing a default types module to application settings (or not), and probably turning the whole thing into a regular class

Marc pivoted quickly from the above work to another long-standing to-do of ours: making a types module automatically available to the application settings. Having type-safe settings is one of the nicest features of the way we’re handling them, and I’d like this to be as smooth as possible for our users!

This turned out to be a bit of a rabbit hole, as evidenced by this sprawling PR discussion, but I think it’s led us to a good place.

Currently, the application settings must be defined in a block provided to Hanami::Application.settings:

Hanami.application.settings do
  setting :sentry_dsn
end

Due to the combination of Ruby’s use of the lexical scope for constant lookups within blocks and dry-types’ standard reliance upon types collections as modules, with custom types defined as constants, it was nigh on impossible to auto-generate and provide an ergonomic, idiomatic types module for use within a block like that (see the linked PR discussion for details).

So this led us to the decision to move the application settings definition to a good ol’ ordinary Ruby class:

module MyApp
  class Settings < Hanami::Application::Settings
    setting :sentry_dsn
  end
end

This will still be looked up and loaded by the framework automatically, but because we’re using a regular class, we can rely on all the regular Ruby techniques for referring to a types module. This means we could choose to access a types module that the user has already created for themselves, e.g. MyApp::Types:

require "my_app/types"

module MyApp
  class Settings < Hanami::Application::Settings
    setting :sentry_dsn, MyApp::Types::String
  end
end

Or even create our own localised types module right within the class:

require "dry/types"

module MyApp
  Types = Dry.Types

  class Settings < Hanami::Application::Settings
    setting :sentry_dsn, Types::String
  end
end

This is much simpler and less likely to confuse! Better still, because we have a regular class at our disposal, users can now add their own custom behavior to their settings:

require "dry/types"

module MyApp
  Types = Dry.Types

  class Settings < Hanami::Application::Settings
    setting :sentry_dsn, Types::String.optional

    def sentry_enabled?
      !sentry_dsn.nil?
    end
  end
end

So I think this is a positive direction to be heading in. Plus, it reinforces the Hanami philosophy of “a place for everything and everything in its place,” with this settings class being a great exemplar of a single-responsibility class, even for something that’s a special part of the framework boot process.

Plans for June

Well, I think that about brings you all up to speed for now. My plan for the rest of June is to make sure I can help merge all of those PRs! And then I’ll be getting back into Zeitwerk-land and looking for ways to simplify the Ruby source file structures that we have inside our application and slice directories.

Thank you to my sponsors (including NEW SPONSORS!!)

May turned out to be hugely encouraging month for my GitHub sponsorships!

Thank you to Jason Charnes for upgrading your sponsorship! And thank you to Janko Marohnić and Aldis Berjoza for beginning new sponsorships! 🥰 Thanks also to Sebastian Wilgosz who began a periodic sponsorship based on a portion of his Hanami Mastery project sponsorships.

Little things like this really do mean a lot, so folks, thanks again!

If you’d like to support my ongoing OSS work, I’d love it if you could join my cadre of intelligent and very good looking sponsors on GitHub. And as ever, thank you to my existing sponsors for your ongoing support!

See you next month!

,

Russell CokerDell PowerEdge T320 and Linux

I recently bought a couple of PowerEdge T320 servers, so now to learn about setting them up. They are a little newer than the R710 I recently set up (which had iDRAC version 6); these have iDRAC version 7.

RAM Speed

One system has an E5-2440 CPU with 2*16G DDR3 DIMMs and a Memtest86+ speed of 13,043MB/s; the other is essentially identical but with an E5-2430 CPU and 4*16G DDR3 DIMMs and a Memtest86+ speed of 8,270MB/s. I had expected that more DIMMs would mean better RAM performance, but this isn’t what happened. I first upgraded the BIOS; as I expected it didn’t make a difference, but it’s a good thing to try first.

On the E5-2430 I tried removing a DIMM after it was pointed out on Facebook that the CPU has 3 memory channels (here’s a link to a great site with information on that CPU and many others [1]). When I did that I was prompted to disable advanced ECC (which treats pairs of DIMMs as a single unit for ECC, allowing correction of more than 1 bit errors) and I had to move the 3 remaining DIMMs to different slots. That improved the performance to 13,497MB/s. I then put the spare DIMM into the E5-2440 system and the performance increased to 13,793MB/s; when I installed 4 DIMMs in the E5-2440 system the performance remained at 13,793MB/s and the E5-2430 went down to 12,643MB/s.

This is a good result for me, I now have the most RAM and fastest RAM configuration in the system with the fastest CPU. I’ll sell the other one to someone who doesn’t need so much RAM or performance (it will be really good for a small office mail server and NAS).

Firmware Update

BIOS

The first issue is updating the BIOS, unfortunately the first link I found to the Dell web site didn’t have a link to download the Linux installer. It offered a Windows binary, an EFI program, and a DOS binary. I’m not about to install Windows if there is any other option and EFI is somewhat annoying, so that leaves DOS. The first Google result for installing FreeDOS advised using “unetbootin”, that didn’t work at all for me (created a USB image that the Dell BIOS didn’t recognise as bootable) and even if it did it wouldn’t have been a good solution.

I went to the FreeDOS download page [2] and got the “Lite USB” zip file. That contained “FD12LITE.img” which I could just dd to a USB stick. I then used fdisk to create a second 32MB partition, used mkfs.fat to format it, and then copied the BIOS image file to it. I booted the USB stick and then ran the BIOS update program from drive D:. After the BIOS update this became the first system I’ve seen get a totally green result from “spectre-meltdown-checker“!
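
For reference, a rough sketch of those steps; the USB device name /dev/sdX and the BIOS updater filename are placeholders, so double-check the target device before running dd:

unzip FD12LITE.zip                                     # the "Lite USB" download, contains FD12LITE.img
dd if=FD12LITE.img of=/dev/sdX bs=4M status=progress   # /dev/sdX is a placeholder, check it!
fdisk /dev/sdX                                         # interactively add a second ~32MB partition (/dev/sdX2)
mkfs.fat /dev/sdX2
mount /dev/sdX2 /mnt
cp BIOS_UPDATE.EXE /mnt/                               # placeholder name for the Dell DOS BIOS updater
umount /mnt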

I found the link to the Linux installer for the new Dell BIOS afterwards, but it was still good to play with FreeDOS.

PERC Driver

I probably didn’t really need to update the PERC (PowerEdge Raid Controller) firmware as I’m just going to run it in JBOD mode. But it was easy to do, a simple bash shell script to update it.

Here are the perccli commands needed to access disks, it’s all hot-plug so you can insert disks and do all this without a reboot:

# show overview
perccli show
# show controller 0 details
perccli /c0 show all
# show controller 0 info with less detail
perccli /c0 show
# clear all "foreign" RAID members
perccli /c0 /fall delete
# add a vd (RAID) of level RAID0 (r0) with the drive 32:0 (enclosure:slot from above command)
perccli /c0 add vd r0 drives=32:0

The “perccli /c0 show” command gives the following summary of disk (“PD” in perccli terminology) information amongst other information. The EID is the enclosure, Slt is the “slot” (IE the bay you plug the disk into) and the DID is the disk identifier (not sure what happens if you have multiple enclosures). The allocation of device names (sda, sdb, etc) will be in order of EID:Slt or DID at boot time, and any drives added at run time will get the next letters available.

----------------------------------------------------------------------------------
EID:Slt DID State DG       Size Intf Med SED PI SeSz Model                     Sp 
----------------------------------------------------------------------------------
32:0      0 Onln   0  465.25 GB SATA SSD Y   N  512B Samsung SSD 850 EVO 500GB U  
32:1      1 Onln   1  465.25 GB SATA SSD Y   N  512B Samsung SSD 850 EVO 500GB U  
32:3      3 Onln   2   3.637 TB SATA HDD N   N  512B ST4000DM000-1F2168        U  
32:4      4 Onln   3   3.637 TB SATA HDD N   N  512B WDC WD40EURX-64WRWY0      U  
32:5      5 Onln   5 278.875 GB SAS  HDD Y   N  512B ST300MM0026               U  
32:6      6 Onln   6 558.375 GB SAS  HDD N   N  512B AL13SXL600N               U  
32:7      7 Onln   4   3.637 TB SATA HDD N   N  512B ST4000DM000-1F2168        U  
----------------------------------------------------------------------------------

The PERC controller is a MegaRAID with possibly some minor changes, there are reports of Linux MegaRAID management utilities working on it for similar functionality to perccli. The version of MegaRAID utilities I tried didn’t work on my PERC hardware. The smartctl utility works on those disks if you tell it you have a MegaRAID controller (so obviously there’s enough similarity that some MegaRAID utilities will work). Here are example smartctl commands for the first and last disks on my system. Note that the disk device node doesn’t matter as all device nodes associated with the PERC/MegaRAID are equal for smartctl.

# get model number etc on DID 0 (Samsung SSD)
smartctl -d megaraid,0 -i /dev/sda
# get all the basic information on DID 0
smartctl -d megaraid,0 -a /dev/sda
# get model number etc on DID 7 (Seagate 4TB disk)
smartctl -d megaraid,7 -i /dev/sda
# exactly the same output as the previous command
smartctl -d megaraid,7 -i /dev/sdc

I have uploaded etbemon version 1.3.5-6 to Debian which has support for monitoring smartctl status of MegaRAID devices and NVMe devices.

IDRAC

To update IDRAC on Linux there’s a bash script with the firmware in the same file (binary stuff at the end of a shell script). To make things a little more exciting the script insists that rpm be available (running “apt install rpm” fixes that for a Debian system). It also creates and runs other shell scripts which start with “#!/bin/sh” but depend on bash syntax. So I had to make /bin/sh a symlink to /bin/bash. You know you need this if you see errors like “typeset: not found” and “[: -eq: unexpected operator” and then the system reboots. Dell people, please test your scripts on dash (the Debian /bin/sh) or just specify #!/bin/bash.
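
A sketch of that work-around; the firmware filename is a placeholder, and remember to point /bin/sh back at dash afterwards:

apt install rpm                          # the updater script requires rpm
ln -sf bash /bin/sh                      # its helper scripts use bash syntax despite #!/bin/sh
sh ./iDRAC_Firmware_XXXXX.BIN            # placeholder filename for the Dell update package
ln -sf dash /bin/sh                      # restore the Debian default afterwards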

If the IDRAC update works it will take about 8 minutes.

Lifecycle Controller

The Lifecycle Controller is apparently for installing OS and firmware updates. I use Linux tools to update Linux and I generally don’t plan to update the firmware after deployment (although I could do so from Linux if needed). So it doesn’t seem to offer anything useful to me.

Setting Up IDRAC

For extra excitement I decided to try to setup IDRAC from the Linux command-line. To install the RAC setup tool you run “apt install srvadmin-idracadm7 libargtable2-0” (because srvadmin-idracadm7 doesn’t have the right dependencies).

# srvadmin-idracadm7 is missing a dependency
apt install srvadmin-idracadm7 libargtable2-0
# set the IP address, netmask, and gateway for IDRAC
idracadm7 setniccfg -s 192.168.0.2 255.255.255.0 192.168.0.1
# put my name on the front panel LCD
idracadm7 set System.LCD.UserDefinedString "Russell Coker"

Conclusion

This is a very nice deskside workstation/server. It’s extremely quiet with hardly any fan noise and the case is strong enough to contain the noise of hard drives. When running with 3* 3.5″ SATA disks and 2*10k 2.5″ SAS disks on a wooden floor it wasn’t annoyingly loud. Without the SAS disks it was as quiet as you can expect any PC to be, definitely not the volume you expect from a serious server! I bought the T320 systems loaded with SAS disks which made them quite loud, I immediately put the disks on ebay and installed SATA SSDs and hard drives which gives me more performance and more space than the SAS disks with less cost and almost no noise.

8*3.5″ drive bays gives room for expansion. I currently have 2*SATA SSDs and 3*SATA disks, the SSDs are for the root filesystem (including /home) and the disks are for a separate filesystem for large files.

,

Russell CokerNetflix and IPv6

It seems that Netflix has an ongoing issue of not working well with IPv6, apparently they have some sort of region checking code that doesn’t correctly identify IPv6 prefixes. To fix this I wrote the following script to make a small zone file with only A records for Netflix and no AAAA records. The $OUT.header file just has the SOA record for my fake netflix.com domain.

#!/bin/bash

OUT=/etc/bind/data/netflix.com
HEAD=$OUT.header

cp $HEAD $OUT
dig -t a www.netflix.com @8.8.8.8|sed -n -e "s/^.*IN/www IN/p"|grep [0-9]$ >> $OUT
dig -t a android.prod.cloud.netflix.com @8.8.8.8|sed -n -e "s/^.*IN/android.prod.cloud IN/p"|grep [0-9]$ >> $OUT
/usr/sbin/rndc reload > /dev/null
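
For context, the surrounding BIND configuration could be set up along these lines; the record values and the zone declaration are assumptions rather than anything from the post (the post says the header file just has the SOA record, an NS record is added here because zones normally need one):

cat > /etc/bind/data/netflix.com.header << 'EOF'
$TTL 300
@   IN  SOA localhost. root.localhost. ( 1 3600 900 604800 300 )
@   IN  NS  localhost.
EOF
cat >> /etc/bind/named.conf.local << 'EOF'
zone "netflix.com" {
        type master;
        file "/etc/bind/data/netflix.com";
};
EOF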

Update

I updated this post to add a line for android.prod.cloud.netflix.com which is the address used by Android devices.

,

Simon LyallAudiobook Reviews – May 2021

Alexander the Great: His Life and His Mysterious Death by Anthony Everitt

A fairly straight biography except for some early chapters setting the scene. Keeps things interesting most of the time. 3/5

Grit: The Power of Passion and Perseverance by Angela Duckworth

“The secret to outstanding achievement is not talent, but a passionate persistence. In other words, grit.” Usual pop-psych with the usual good stories. 3/5

100 Side Hustles: Unexpected Ideas for Making Extra Money Without Quitting Your Day Job by Chris Guillebeau

100 small businesses and their story. With lessons learnt from each and some themes. Told with lots of puns. 4/5

How The Internet Happened: From Netscape to the iPhone by Brian McCullough

Covering the big internet events and companies between 1993 and 2008. Mosaic, AOL, Ebay, Amazon, Yahoo, Napster and ending with the Ipod. Lots of good stories and some new angles. 4/5

Diamonds are Forever by Ian Fleming

James Bond infuriates a Diamond smuggling operation run by American Gangsters. Lots of period detail and violent set pieces. 4/5



Benjamin Franklin: An American Life by Walter Isaacson

A nice short biography that attempts to highlight neglected areas such as Franklin’s family and friends his scientific work. Fun without missing too much detail. 4/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all

Share

,

Russell CokerInternode NBN with Arris CM8200 on Debian

I’ve recently signed up for Internode NBN while using the Arris CM8200 device supplied by Optus (previously used for a regular phone service). I took the configuration mostly from Dean’s great blog post on the topic [1]. One thing I changed was the /etc/network/interfaces configuration; I used the following:

# VLAN ID 2 for Internode's NBN HFC.
auto eth1.2
iface eth1.2 inet manual
  vlan-raw-device eth1

auto nbn
iface nbn inet ppp
    pre-up /bin/ip link set eth1.2 up
    provider nbn

There is no need to have a section for eth1 when you have a section for eth1.2.

IPv6

IPv6 for only one system

With a line in /etc/ppp/options containing only “ipv6 ,” you get an IPv6 address automatically for the ppp0 interface after starting pppd.

IPv6 for your lan

Internode has documented how to configure the WIDE DHCPv6 client to get an IPv6 “prefix” (subnet) [2]. Just install the wide-dhcpv6-client package and put your interface names in a copy of the Internode example config and that works. That gets you a /64 assigned to your local Ethernet. Here’s an example of /etc/wide-dhcpv6/dhcp6c.conf:

interface ppp0 {
    send ia-pd 0;
    script "/etc/wide-dhcpv6/dhcp6c-script";
};

id-assoc pd {
    prefix-interface br0 {
        sla-id 0;
        sla-len 8;
    };
};

For providing addresses to other systems on your LAN they recommend radvd version 1.1 or greater, Debian/Bullseye will ship with version 2.18. Here is an example /etc/radvd.conf that will work with it. It seems that you have to manually (or with a script) set the value to use in place of “xxxx:xxxx:xxxx:xxxx” from the value that is assigned to eth0 (or whichever interface you are using) by the wide-dhcpv6-client.

interface eth0 { 
        AdvSendAdvert on;
        MinRtrAdvInterval 3; 
        MaxRtrAdvInterval 10;
        prefix xxxx:xxxx:xxxx:xxxx::/64 { 
                AdvOnLink on; 
                AdvAutonomous on; 
                AdvRouterAddr on; 
        };
};
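
The manual substitution mentioned above could be scripted along these lines; this is only a rough sketch that assumes the address on eth0 is uncompressed (so the first four groups form the /64 prefix) and that you keep a template copy of radvd.conf:

#!/bin/bash
# grab the first global IPv6 address on eth0, e.g. 2001:db8:12:34:aaaa:bbbb:cccc:dddd/64
ADDR=$(ip -6 addr show dev eth0 scope global | awk '/inet6/ {print $2; exit}')
PREFIX=$(echo "$ADDR" | cut -d/ -f1 | cut -d: -f1-4)    # first four groups = the /64 prefix
sed "s/xxxx:xxxx:xxxx:xxxx/$PREFIX/" /etc/radvd.conf.template > /etc/radvd.conf
systemctl restart radvd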

Either the configuration of the wide dhcp client or radvd removes the default route from ppp0, so you need to run a command like “ip -6 route add default dev ppp0” to put it back. Probably having “ipv6 ,” is the wrong thing to do when using wide-dhcp-client and radvd.

On a client machine with bridging I needed to have “net.ipv6.conf.br0.accept_ra=2” in /etc/sysctl.conf to allow it to accept router advertisement messages on the interface (in this case eth0); for machines without bridging I didn’t need that.
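
To apply that setting without waiting for a reboot, something like:

sysctl net.ipv6.conf.br0.accept_ra=2    # immediate effect, the /etc/sysctl.conf entry makes it persistent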

Firewalling

The default model for firewalling nowadays seems to be using NAT and only configuring specific ports to be forwarded to machines on the LAN. With IPv6 on the LAN every system can directly communicate with the rest of the world which may be a bad thing. The following lines in a firewall script will drop all inbound packets that aren’t in response to packets that are sent out. This will give an equivalent result to the NAT firewall people are used to and you can always add more rules to allow specific ports in.

ip6tables -A FORWARD -i ppp+ -m state --state ESTABLISHED,RELATED -j ACCEPT
ip6tables -A FORWARD -i ppp+ -j DROP

Lev LafayetteInaugural John Lions Distinguished Lectures, University of New South Wales

The importance of John Lions to computing history has spanned decades and continues to do so. In 1976 he published, through the University of New South Wales, the Lions' Commentary on UNIX 6th Edition, with Source Code. The book both explained the UNIX kernel and served as a teaching tool. The content was extraordinarily well-written, explaining difficult concepts with a remarkable coherence and also engaging in an early form of instructional scaffolding. When AT&T released UNIX v7 they specifically prohibited "classroom use" of the content, leading to thousands of educators, engineers, and learners around the world photocopying the "Lions Book", making it the most illegally copied book in computer science history. It wasn't until 1996 that the Santa Cruz Operation, the new owners of UNIX, allowed for its legal publication. Of course, Lions was an academic (associate professor) and involved in the community as the founding president of the Australian UNIX User's Group.

With this in mind, the University of New South Wales organised an extraordinary one-day conference with some of the most impressive figures in global IT infrastructure for the last fifty years. UNIX co-creator (and B language inventor, and Go language co-creator) Ken Thompson started the day with Scientia Professor Gernot Heiser. A former student of John Lions, and chair of the organising committee, John O'Brien contributed with the always incredible Brian Kernighan in a thoroughly charming and clear description of the design principles behind UNIX. Kernighan, always able to enunciate these features, noted the great importance of operators such as pipes and redirection statements and the use of regular expressions. I could not help but feel a small sense of gratification knowing how much I emphasise these features in my own teaching. It is difficult to imagine what high performance computing would be like without such components.

If there was an area that was somewhat unknown to me, it was the content-rich material by Margo Seltzer, one of the original developers of BerkeleyDB, who explored some of the conflicts that existed between databases and operating systems and how UNIX's development worked well with BerkeleyDB. Rob Pike, co-creator of the Plan 9 operating system, co-author of the UNIX Programming Environment, and co-creator of the Go language, provided an inspiring overview of the development and recent innovations of the latter, while Andrew Tridgell, creator of Samba and rsync, gave his take on the development of FOSS over the years - and with a particular illustration of the "French cafe" method of learning proprietary protocols.

It was not all deep-tech however, and the supporting processes are also a necessary (and often understated) part of the development of our major IT technologies. Butler Lampson spoke in a careful, detailed, and structured fashion on the design of computer systems and fielded a question on the relative benefits of FOSS systems quite well. Elizabeth Churchill provided a very handy overview of how the emphasis on user experience over the years has a parallel in developer experience, especially using Fuchsia and Flutter, and the current president of Linux Australia, Sae Ra Germaine, gave some handy advice on the management of communities, especially including the less salubrious parts.

The final two presentations, by Gernot Heiser and Andy Tanenbaum, were certainly capstones on the formal proceedings of the day. Gernot gave a potted history of the seL4 microkernel, which provides security and performance at the kernel layer, and described some rather impressive current deployments. I am especially interested in its RISC-V implementation. The final presentation of the day was by Andy Tanenbaum, creator of Minix, and author of two of the most well-known books used in computer science education, on Computer Networks and Operating Systems. Apart from revisiting the classic debate between himself and Linus Torvalds over the relative virtues of micro vs monolithic kernels, Tanenbaum made the very important point that the Lions Book led directly to MINIX, which led to Linux, which led to Android (to which I will add that BSD UNIX and MacOS X also have a lineage back to Lions). In other words, almost everything that we really know about computing today has been profoundly influenced by John Lions.

Finishing the day were announcements by Heiser for the establishment of the UNSW Centre for Critical Digital Infrastructure and a significant John Lions prize for Open Source aligned with the Centre. There were also final words by the UNSW Vice-Chancellor Ian Jacobs before we departed to an evening re-dedication of the John Lions Garden. It was during that time I engaged in conversation with his brother, his wife Marianne (who gave a charming speech), and his two daughters, both of whom were impressed but, I suspect, a little mystified by the importance of their late father (Pixel, the family dog, was quite a character as well). I must also mention, courtesy of spending much of the day in his delightful company, the contributions of one John Wulff, and especially his basic assembler language, balad. I look forward to further correspondence with this very experienced engineer from a different era.

Paul WiseFLOSS Activities May 2021

Focus

This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Administration

  • Debian wiki: unblock IP addresses, approve accounts

Communication

  • Joined the great IRC migration
  • Respond to queries from Debian users and contributors on the mailing lists and IRC

Sponsors

The purple-discord, sptag and esprima-python work was sponsored by my employer. All other work was done on a volunteer basis.

,

Russell CokerSome Ideas About Storage Reliability

Hard Drive Brands

When people ask for advice about what storage to use they often get answers like “use brand X, it works well for me and brand Y had a heap of returns a few years ago”. I’m not convinced there is any difference between the small number of manufacturers that are still in business.

One problem we face with reliability of computer systems is that the rate of change is significant, so every year there will be new technological developments to improve things and every company will take advantage of them. Storage devices are unique among computer parts for their requirement for long-term reliability. For most other parts in a computer system a fault that involves total failure is usually easy to fix and even a fault that causes unreliable operation usually won’t spread its damage too far before being noticed (except in corner cases like RAM corruption causing corrupted data on disk).

Every year each manufacturer will bring out newer disks that are bigger, cheaper, faster, or all three. Those disks will be expected to remain in service for 3 years in most cases, and for consumer disks often 5 years or more. The manufacturers can’t test the new storage technology for even 3 years before releasing it so their ability to prove the reliability is limited. Maybe you could buy some 8TB disks now that were manufactured to the same design as used 3 years ago, but if you buy 12TB consumer grade disks, the 20TB+ data center disks, or any other device that is pushing the limits of new technology then you know that the manufacturer never tested it running for as long as you plan to run it. Generally the engineering is done well and they don’t have many problems in the field. Sometimes a new range of disks has a significant number of defects, but that doesn’t mean the next series of disks from the same manufacturer will have problems.

The issues with SSDs are similar to the issues with hard drives but a little different. I’m not sure how much of the improvements in SSDs recently have been due to new technology and how much is due to new manufacturing processes. I had a bad experience with a nameless brand SSD a couple of years ago and now stick to the better known brands. So for SSDs I don’t expect a great quality difference between devices that have the names of major computer companies on them, but stuff that comes from China with the name of the discount web store stamped on it is always a risk.

Hard Drive vs SSD

A few years ago some people were still avoiding SSDs due to the perceived risk of new technology. The first problem with this is that hard drives have lots of new technology in them. The next issue is that hard drives often have some sort of flash storage built in; presumably an “SSHD” or “Hybrid Drive” gets all the potential failure modes of both hard drives and SSDs.

One theoretical issue with SSDs is that filesystems have been (in theory at least) designed to cope with hard drive failure modes not SSD failure modes. The problem with that theory is that most filesystems don’t cope with data corruption at all. If you want to avoid losing data when a disk returns bad data and claims it to be good then you need to use ZFS, BTRFS, the NetApp WAFL filesystem, Microsoft ReFS (with the optional file data checksum feature enabled), or Hammer2 (which wasn’t production ready last time I tested it).

Some people are concerned that their filesystem won’t support “wear levelling” for SSD use. When a flash storage device is exposed to the OS via a block interface like SATA there isn’t much possibility of wear levelling. If flash storage exposes that level of hardware detail to the OS then you need a filesystem like JFFS2 to use it. I believe that most SSDs have something like JFFS2 inside the firmware and use it to expose what looks like a regular block device.

Another common concern about SSD is that it will wear out from too many writes. Lots of people are using SSD for the ZIL (ZFS Intent Log) on the ZFS filesystem, that means that SSD devices become the write bottleneck for the system and in some cases are run that way 24*7. If there was a problem with SSDs wearing out I expect that ZFS users would be complaining about it. Back in 2014 I wrote a blog post about whether swap would break SSD [1] (conclusion – it won’t). Apart from the nameless brand SSD I mentioned previously all of my SSDs in question are still in service. I have recently had a single Samsung 500G SSD give me 25 read errors (which BTRFS recovered from the other Samsung SSD in the RAID-1), I have yet to determine if this is an ongoing issue with the SSD in question or a transient thing. I also had a 256G SSD in a Hetzner DC give 23 read errors a few months after it gave a SMART alert about “Wear_Leveling_Count” (old age).
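
As an aside, the wear attribute that generated that SMART alert can be checked with smartctl; the attribute name varies between vendors (Wear_Leveling_Count on Samsung, different names elsewhere), so the grep pattern here is just a guess:

smartctl -A /dev/sda | grep -iE 'wear|life'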

Hard drives have moving parts and are therefore inherently more susceptible to vibration than SSDs, they are also more likely to cause vibration related problems in other disks. I will probably write a future blog post about disks that work in small arrays but not in big arrays.

My personal experience is that SSDs are at least as reliable as hard drives even when run in situations where vibration and heat aren’t issues. Vibration or a warm environment can cause data loss from hard drives in situations where SSDs will work reliably.

NVMe

I think that NVMe isn’t very different from other SSDs in terms of the actual storage. But the different interface gives some interesting possibilities for data loss. OS, filesystem, and motherboard bugs are all potential causes of data loss when using a newer technology.

Future Technology

The latest thing for high end servers is Optane Persistent memory [2] also known as DCPMM. This is NVRAM that fits in a regular DDR4 DIMM socket that gives performance somewhere between NVMe and RAM and capacity similar to NVMe. One of the ways of using this is “Memory Mode” where the DCPMM is seen by the OS as RAM and the actual RAM caches the DCPMM (essentially this is swap space at the hardware level), this could make multiple terabytes of “RAM” not ridiculously expensive. Another way of using it is “App Direct Mode” where the DCPMM can either be a simulated block device for regular filesystems or a byte addressable device for application use. The final option is “Mixed Memory Mode” which has some DCPMM in “Memory Mode” and some in “App Direct Mode”.

This has much potential for use in backups, and to make things extra exciting “App Direct Mode” has RAID-0 but no other form of RAID.

Conclusion

I think that the best things to do for storage reliability are to have ECC RAM to avoid corruption before the data gets written, use reasonable quality hardware (buy stuff with a brand that someone will want to protect), and avoid new technology. New hardware and new software needed to talk to new hardware interfaces will have bugs and sometimes those bugs will lose data.

Filesystems like BTRFS and ZFS are needed to cope with storage devices returning bad data and claiming it to be good, this is a very common failure mode.
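
As a practical aside (not from the post), both filesystems can be asked to verify all checksums on demand and report any corruption they found or corrected; the mountpoint and pool names below are placeholders:

# BTRFS: scrub a mounted filesystem and show per-device error counters
btrfs scrub start -B /mnt/data
btrfs device stats /mnt/data
# ZFS: scrub a pool and check the result afterwards
zpool scrub tank
zpool status -v tank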

Backups are a good thing.

Dave HallA Rube Goldberg Machine for Container Workflows

Learn how you can securely copy container images from GHCR to ECR.

,

Russell CokerWifi Performance on Linux

Wifi usually just works. In the past I haven’t had to worry much about performance as for home use things have always been bearable and at work it’s never been my job so I just file a bug report with the relevant people when things go wrong. But a few years ago I had some problems.

For my home network I got a free Wifi AP which wasn’t performing well.

My AP supported 802.11 modes b/g or g/n (b, g, and n are slow, medium, and fast speeds). I initially had the AP running in b/g mode because I had an 802.11b USB wifi device that I used. When I replaced that with one that did 802.11g I tried changing the AP to g/n mode but performance was even worse on my laptop (although quite good on phones) so I switched back.

For phones it appeared to work well giving 54Mb/s while on my laptop (a second hand Thinkpad X1 Carbon) it was giving 11Mb/s at best and often much less than that. The best demonstration of problems was to start transferring a large file while pinging a system on the LAN the AP was connected to. Usually it would give ping times of 1s or more, sometimes 5s+ ping times. While this was happening the “Invalid misc” count increased rapidly, often by more than 100 per second.
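
That test is easy to reproduce with standard tools; the interface name and addresses here are examples:

ping 192.168.1.1 &                                   # watch latency to a host on the wired LAN
scp big-file.iso user@192.168.1.1:/tmp/              # large transfer over the wifi link
watch -n 1 'iwconfig wlan0 | grep -E "Bit Rate|Invalid misc"'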

The results of Google searches suggest that “Invalid misc” is due to interference and recommend changing the channel. My AP had been on channel 1 which had performed poorly, channels 2-8 were ok, and channel 9 seemed reasonably good. As an aside trying all channels manually is not a good idea, it takes a lot of time and gives little useful data. After changing to channel 9 it still only gave about 500KB/s when transferring large files with ping times of about 100ms, but that’s a big improvement. I tried running “iwlist scanning” to scan the Wifi network for other APs, that showed that channel 1 was used a lot but didn’t make it clear what I should do other than that.

The next thing I tried was the Wifi Analyser app on Android [1] (which doesn’t work on my latest phone, I don’t know if it’s still being actively maintained, it will definitely work on older phones). That has a nice graph mode that shows which channels are used and how the frequencies spread and interfere with other channels. One thing I hadn’t realised before I looked at the graphs is that 802.11n uses 4 channels and interferes past that. If you have two 802.11n devices you don’t have much space left out of the 14 channels available. To make more space I configured the Wifi AP in my ADSL modem to 802.11b/g mode and assigned it a channel away from the others making 4 channels available with no interference.

After that iwconfig reported between 60 and 120Mb/s and I got consistent transfer rates over 1.5MB/s while ping times remained below 100ms.

The 5GHz frequency range is less congested. But at the time I didn’t feel like buying 5GHz equipment.

Since that time I had signed up with an ISP that had a good deal on a Wifi AP that had 5GHz. Now I have all my devices configured to use 5GHz or 2.4GHz depending on which they think is best. So there’s less devices on 2.4GHz and the AP is configured for “20MHz channel width” in the 2.4GHz range (which means 802.11b/g).

Conclusion

802.11n seems to be a bad idea unless you run the only AP in an area. In a suburban area you will have 3 other houses broadcasting in your area and 802.11n is bad for everyone. The worst case scenario would be one person using 802.11n and interfering with everyone else’s 802.11g and then having everyone else turn on 802.11n to try and make things faster.

5GHz is less congested as most people run old hardware. It also has a shorter range which has the upside of getting less interference from other people. I’m considering installing 5GHz APs at both ends of my house and configuring all my new devices to not use 2.4GHz.

Wifi spectrum analysis software is much better than manual testing of channels or trying to deduce things from the output of “iwlist scanning“.

Russell CokerUSB Cables and Cameras

This page has summaries of some USB limits [1]. USB 2.0 has the longest cable segment limit of 5M (1.x, 3.x, and USB-C are all shorter), so USB 2.0 is what you want for long runs. The USB limit for daisy chained devices is 7 (including host and device), so that means a maximum of 5 hubs or a total distance between PC and device of 30M. There are lots of other ways of getting longer distances, the cheapest seems to be putting an old PC at the far end of an Ethernet cable.

Some (many? most?) laptops have USB for the interface to the built in camera, and these are sold from scrapped laptops. You could probably setup a home monitoring system in a typical home by having a centrally located PC with USB hubs fanning out to the corners. But old Android phones on a Wifi network seems like an easier option if you can prevent the phones from crashing all the time.

Russell CokerHP ML110 Gen9

I’ve just bought a HP ML110 Gen9 as a personal workstation, here are my notes about it and documentation on running Debian on it.

Why a Server?

I bought this because the ML350p Gen8 turned out to be too noisy for my taste [1]. I’ve just been editing my page about Memtest86+ RAM speeds [2], over the course of 10 years (high end laptop in 2001 to low end server in 2011) RAM speed increased by a factor of 100. RAM speed has been increasing at a lower rate than CPU speed and is becoming an increasing bottleneck on system performance. So while I could get a faster white-box system the cost of a second-hand server isn’t that great and I’m getting a system that’s 100* faster than what was adequate for most tasks in 2001.

HP makes some nice workstation class machines with ECC RAM (think server without remote management, hot-swap disks, or redundant PSU but with sound hardware). But they are significantly more expensive on the second hand market than servers.

This server cost me $650 and came with 2*480G “DC” grade SSDs (Intel but with HPE stickers). I hope that more than half of the purchase price will be recovered from selling the SSDs (I will use NVMe). Also 64G of non-ECC RAM costs $370 from my local store. As I want lots of RAM for testing software on VMs it will probably turn out that the server cost me less than the cost of new RAM once I’ve sold the SSDs!

Monitoring

wget -O /usr/local/hpePublicKey2048_key1.pub https://downloads.linux.hpe.com/SDR/hpePublicKey2048_key1.pub
echo "# HP monitoring" >> /etc/apt/sources.list
echo "deb [signed-by=/usr/local/hpePublicKey2048_key1.pub] http://downloads.linux.hpe.com/SDR/downloads/MCP/Debian/ stretch/current-gen9 non-free" >> /etc/apt/sources.list

The above commands will make the management utilities installable on Debian/Buster. If using Bullseye (Testing at the moment) then you need to have Buster repositories in APT for dependencies, HP doesn’t seem to have packaged all their utilities for Buster.

wget -r -np -A Contents-amd64.bz2 http://downloads.linux.hpe.com/SDR/repo/mcp/debian/dists

To find out which repositories had the programs I need I ran the above recursive wget and then uncompressed them for grep -R (as an aside it would be nice if bzgrep supported -R). I installed the hp-health package which has hpasmcli for viewing and setting many configuration options and hplog for viewing event log data and thermal data (among a few other things). I’ve added a new monitor to etbemon hp-temp.monitor to monitor HP server temperatures, I haven’t made a configuration option to change the thresholds for what is considered “normal” because I don’t expect server class systems to be routinely running above the warning temperature. For the linux-temp.monitor script I added a command-line option for the percentage of the “high” temperature that is an error condition as well as an option for the number of CPU cores that need to be over-temperature, having one core permanently over the “high” temperature due to a web browser seems standard for white-box workstations nowadays.

The hp-health package depends on “libc6-i686 | lib32gcc1” even though none of the programs it contains use lib32gcc1. Depending on lib32gcc1 instead of “lib32gcc1 | lib32gcc-s1” means that installing hp-health requires removing mesa-opencl-icd which probably means that BOINC can’t use the GPU among other things. I solved this by editing /var/lib/dpkg/status and changing the package dependencies to what I desired. Note that this is not something for a novice to do, make a backup and make sure you know what you are doing!

Issues

The “HPE Dynamic Smart Array B140i” is a software RAID device. While it’s convenient for some users that software RAID gets supported in the UEFI boot process, generally software RAID is a bad idea. Also my system has hot-swap drive caddies but the controller doesn’t support hot-swap. So the first thing to do was to configure the array controller to run in AHCI mode and give up on using hot-swap drive caddies for hot-swap. I tested all the documented ways of scanning for new devices and nothing other than a reboot made the kernel recognise a new SATA disk.

According to specs provided by Dell and HP the ML110 Gen9 makes less noise than the PowerEdge T320; according to my own observations the reverse is the case. I don’t know if this is because Dell is more conservative in their specs than HP, or because of how dBA is measured vs my own personal annoyance thresholds for sounds. As the system makes more noise than I’m comfortable with I plan to build a rubber enclosure for the rear of the system to reduce noise; that will be the subject of another post. For Australian readers, Bunnings has some good deals on rubber floor mats that can be used to reduce server noise.

The server doesn’t have sound hardware. While one could argue that servers don’t need sound, there are some server uses for sound hardware, such as using line input as a source of entropy. Also for a manufacturer it might be a benefit to use the same motherboard for workstations and servers. Fortunately a friend gave me a nice set of Logitech USB speakers a few years ago that I hadn’t previously had a cause to use, so that will solve the problem for me (I don’t need line-in on a workstation).

UEFI and Memtest

I decided to try UEFI boot for something new (in the past I’d only used UEFI boot for a server that only had large disks). In the past I’ve booted all my own systems with BIOS boot because I’m familiar with it and they all have SSDs for booting which are less than 2TB in size (until recently 2TB SSDs weren’t affordable for my personal use). The Debian UEFI wiki page is worth reading [3]. The Debian Wiki page about ProLiant servers [4] is worth reading too.

Memtest86+ doesn’t support EFI booting (it just goes to a black screen) even though Debian/Buster puts in a GRUB entry for it (Debian bug #695246 was filed for this in 2012). Also on my ML110 Memtest86+ doesn’t report the RAM speed (a known issue with Memtest86+). Comments on the net say that Memtest86+ hasn’t been maintained for a long time and Memtest86 (the non-free version) has been updated more recently. So far I haven’t seen a system with ECC RAM have a memory problem that could be detected by Memtest86+; the memory problems I’ve seen on ECC systems have been things that prevent booting (RAM not being recognised correctly), that are detected by the BIOS as ECC errors before booting, or that are reported by the kernel as ECC errors at run time (that happened years ago and I can’t remember the details).

Overall I’m not a fan of EFI with the way it currently works in Debian. It seems to add some of the GRUB functionality into the BIOS and then use that to load GRUB. It seems that EFI can do everything you need, and it would be better to just have a single boot loader rather than two of them chained.

Power Supply

There are a range of PSUs for the ML110; the one I have is the smallest available PSU (350W) and doesn’t have a PCIe power cable (the one used for video cards). Here is the HP document which shows the cabling for the various ML110 Gen8 PSUs [5]. One thing I’ve considered is whether I could make an adaptor from the drive bay power to the PCIe connector. A quick web search indicates that 4 SAS disks when active can take up to 75W more power than a system with no disks. If that’s the case then the 2 spare drive bay connectors, which can each handle 4 disks, should be able to supply 150W. As a 6 pin PCIe power cable (GPU power cable) is rated at 75W that should be fine in theory (here’s a page with the pinouts for PCIe power connectors [6]). My video card is a Radeon R7 260X which apparently takes about 113W all up, so it should be taking less than 75W from the PCIe power cable.

All I really want is YouTube, Netflix, and text editing at 4K resolution. So I don’t need much in terms of 3D power. KDE uses some of the advanced features of modern video cards, but it doesn’t compare to 3D gaming. According to the Wikipedia page for Radeon RX 500 series [7] the RX560 supports DisplayPort 1.4 and HDMI 2.0 (both of which do 4K@60Hz) and has a TDP of 75W. So a RX560 video card seems like a good option that will work in any system that doesn’t have a spare PCIe power cable. I’ve just ordered one of those for $246 so hopefully that will arrive in a week or so.

PCI Fan

The ML110 Gen9 has an “optional” PCIe “fan and baffle” to cool PCIe cards (part number 784580-B21). Extra cooling of PCIe cards is a good thing, but $400 list price (and about $50 ebay price) for the fan and baffle is unpleasant. When I boot the system with a PCIe dual-Ethernet card and two PCIe NVMe cards it gives a BIOS warning on boot; when I add a video card it refuses to boot without the extra fan. It’s nice that the system makes sure it doesn’t get into a thermal overload situation, but it would be nicer if they just shipped all the necessary fans with it instead of trying to get more money out of customers. I just bought a PCI fan and baffle kit for $60.

Conclusion

In spite of the unexpected expense of a new video card and PCI fan, the overall cost of this system is still low, particularly when considering that I’ll find another use for the video card which needs an extra power connector.

It is disappointing that HP didn’t supply a more capable PSU and fit all the fans to all models. The expectation of a server is that you can just do server stuff without having to buy extra bits first. If you want to install Tesla GPUs or something then it’s expected that you might need to do something unusual with a server, but the basic stuff should just work. A single processor tower server should be designed to function as a deskside workstation and be able to handle an average video card.

Generally it’s a nice computer, I look forward to getting the next deliveries of parts so I can make it work properly.

,

Craige McWhirterThe Consensus on Branch Names

Consensus: Decisions are reached in a dialogue between equals

There was some kerfuffle in 2020 over the use of the term master in git. The origins of the term were resolutely settled, so I set about renaming my primary branches to other words.

The one that most people seemed to be using was main, so I started using it too. While main was conveniently brief, it still felt inadequate. Something was wrong and it kept bubbling away in the background.

The word that kept percolating through was consensus.

I kept dismissing it for all the obvious reasons: it was too long, too unwieldy, too obscure, or just simply not used commonly enough to be familiar or well understood.

The word was persistent though and consensus kept coming back.

One morning recently, I was staring at a git tree when the realisation slapped me in the face that in a git workflow the primary / master / main branches reflected a consensus point in that workflow.

Consensus: Decisions are reached in a dialogue between equals

That realisation settled it pretty hard for me: consensus not only accurately reflected that point in the workflow but was also the most correct English word for what the branch represented.
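
For anyone wanting to do the same, here is a minimal sketch of the rename (assuming the old branch is called master and the remote is origin; set the new default branch in your hosting service’s settings before deleting the old one):

git branch -m master consensus
git push -u origin consensus
git push origin --delete master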

Continue the conversation on Matrix.

,

Chris NeugebauerAdding a PurpleAir monitor to Home Assistant

Living in California, I’ve (sadly) grown accustomed to needing to keep track of our local air quality index (AQI) ratings, particularly as we live close to places where large wildfires happen every other year.

Last year, Josh and I bought a PurpleAir outdoor air quality meter, which has been great. We contribute our data to a collection of very local air quality meters, which is important, since the hilly nature of the North Bay means that the nearest government air quality ratings can be significantly different to what we experience here in Petaluma.

I recently went looking to pull my PurpleAir sensor data into my Home Assistant setup. Unfortunately, the PurpleAir API does not return the AQI metric for air quality, only the raw PM2.5/PM5/PM10 numbers. After some searching, I found a nice template sensor solution on the Home Assistant forums, which I’ve modernised by adding the AQI as a sub-sensor, and adding unique ID fields to each useful sensor, so that you can assign them to a location.

You’ll end up with sensors for raw PM2.5, the PM2.5 AQI value, the US EPA air quality category, temperature, relative humidity and air pressure.

How to use this

First up, visit the PurpleAir Map, find the sensor you care about, click “get this widget”, and then “JSON”. That will give you the URL to set as the resource key in purpleair.yaml.
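
You can sanity-check that URL from a shell before wiring it into Home Assistant, with something like:

curl "https://www.purpleair.com/json?key={KEY_GOES_HERE}&show={SENSOR_ID}"

If the key and sensor ID are right, you’ll get back a JSON blob containing a results list, which is what the templates below pull apart.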

Adding the configuration

In HomeAssistant, add the following line to your configuration.yaml:

sensor: !include purpleair.yaml

and then add the following contents to purpleair.yaml:


 - platform: rest
   name: 'PurpleAir'

   # Substitute in the URL of the sensor you care about.  To find the URL, go
   # to purpleair.com/map, find your sensor, click on it, click on "Get This
   # Widget" then click on "JSON".
   resource: https://www.purpleair.com/json?key={KEY_GOES_HERE}&show={SENSOR_ID}

   # Only query once a minute to avoid rate limits:
   scan_interval: 60

   # Set this sensor to be the AQI value.
   #
   # Code translated from JavaScript found at:
   # https://docs.google.com/document/d/15ijz94dXJ-YAZLi9iZ_RaBwrZ4KtYeCy08goGBwnbCU/edit#
   value_template: >
     {{ value_json["results"][0]["Label"] }}
   unit_of_measurement: ""
   # The value of the sensor can't be longer than 255 characters, but the
   # attributes can.  Store away all the data for use by the templates below.
   json_attributes:
     - results

 - platform: template
   sensors:
     purpleair_aqi:
       unique_id: 'purpleair_SENSORID_aqi_pm25'
       friendly_name: 'PurpleAir PM2.5 AQI'
       value_template: >
         {% macro calcAQI(Cp, Ih, Il, BPh, BPl) -%}
           {{ (((Ih - Il)/(BPh - BPl)) * (Cp - BPl) + Il)|round|float }}
         {%- endmacro %}
         {% if (states('sensor.purpleair_pm25')|float) > 1000 %}
           invalid
         {% elif (states('sensor.purpleair_pm25')|float) > 350.5 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 500.0, 401.0, 500.0, 350.5) }}
         {% elif (states('sensor.purpleair_pm25')|float) > 250.5 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 400.0, 301.0, 350.4, 250.5) }}
         {% elif (states('sensor.purpleair_pm25')|float) > 150.5 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 300.0, 201.0, 250.4, 150.5) }}
         {% elif (states('sensor.purpleair_pm25')|float) > 55.5 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 200.0, 151.0, 150.4, 55.5) }}
         {% elif (states('sensor.purpleair_pm25')|float) > 35.5 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 150.0, 101.0, 55.4, 35.5) }}
         {% elif (states('sensor.purpleair_pm25')|float) > 12.1 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 100.0, 51.0, 35.4, 12.1) }}
         {% elif (states('sensor.purpleair_pm25')|float) >= 0.0 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 50.0, 0.0, 12.0, 0.0) }}
         {% else %}
           invalid
         {% endif %}
       unit_of_measurement: "bit"
     purpleair_description:
       unique_id: 'purpleair_SENSORID_description'
       friendly_name: 'PurpleAir AQI Description'
       value_template: >
         {% if (states('sensor.purpleair_aqi')|float) >= 401.0 %}
           Hazardous
         {% elif (states('sensor.purpleair_aqi')|float) >= 301.0 %}
           Hazardous
         {% elif (states('sensor.purpleair_aqi')|float) >= 201.0 %}
           Very Unhealthy
         {% elif (states('sensor.purpleair_aqi')|float) >= 151.0 %}
           Unhealthy
         {% elif (states('sensor.purpleair_aqi')|float) >= 101.0 %}
           Unhealthy for Sensitive Groups
         {% elif (states('sensor.purpleair_aqi')|float) >= 51.0 %}
           Moderate
         {% elif (states('sensor.purpleair_aqi')|float) >= 0.0 %}
           Good
         {% else %}
           undefined
         {% endif %}
       entity_id: sensor.purpleair
     purpleair_pm25:
       unique_id: 'purpleair_SENSORID_pm25'
       friendly_name: 'PurpleAir PM 2.5'
       value_template: "{{ state_attr('sensor.purpleair','results')[0]['PM2_5Value'] }}"
       unit_of_measurement: "μg/m3"
       entity_id: sensor.purpleair
     purpleair_temp:
       unique_id: 'purpleair_SENSORID_temperature'
       friendly_name: 'PurpleAir Temperature'
       value_template: "{{ state_attr('sensor.purpleair','results')[0]['temp_f'] }}"
       unit_of_measurement: "°F"
       entity_id: sensor.purpleair
     purpleair_humidity:
       unique_id: 'purpleair_SENSORID_humidity'
       friendly_name: 'PurpleAir Humidity'
       value_template: "{{ state_attr('sensor.purpleair','results')[0]['humidity'] }}"
       unit_of_measurement: "%"
       entity_id: sensor.purpleair
     purpleair_pressure:
       unique_id: 'purpleair_SENSORID_pressure'
       friendly_name: 'PurpleAir Pressure'
       value_template: "{{ state_attr('sensor.purpleair','results')[0]['pressure'] }}"
       unit_of_measurement: "hPa"
       entity_id: sensor.purpleair

Quirks

I had difficulty getting the AQI to display as a numeric graph when I didn’t set a unit. I went with bit, and that worked just fine. 🤷‍♂️

,

Chris SmartAuto-update Pi-hole with systemd timer

I have two Pi-hole servers at home running Fedora with DNS over TLS, both of which auto update on different days (to avoid having both down if something goes wrong).

First, create a script to get the latest updates if any are available, rebooting the Pi-hole if they were successful.

cat << \EOF | sudo tee /usr/local/sbin/update-pihole.sh
#!/bin/bash
PIHOLE_UPDATE="$(pihole -up --check-only)"
if ! grep -q 'Everything is up to date' <<< "${PIHOLE_UPDATE}" ; then
  pihole -up
  if [[ $? -eq 0 ]] ; then
    echo "$(date "+%h %d %T") update: success" >> /var/log/pihole.log
    reboot
  fi
else
    echo "$(date "+%h %d %T") update: nothing to do" >> /var/log/pihole.log
fi
EOF

Make the script executable.

sudo chmod a+x /usr/local/sbin/update-pihole.sh

Next, let’s create a systemd service for the update, which is required to be able to create a timer.

cat << EOF | sudo tee /etc/systemd/system/update-pihole.service 
[Unit]
Description=Update pihole
After=network-online.target
 
[Service]
Type=oneshot
ExecStart=/usr/local/sbin/update-pihole.sh
EOF

Now we can create the systemd timer. Here I am updating weekly on Mondays at midnight (my other Pi-hole updates weekly on Thursdays), feel free to adjust as you see fit. If you have two, perhaps update on alternate days, like I do.

cat << EOF | sudo tee /etc/systemd/system/update-pihole.timer 
[Unit]
Description=Timer for updating pihole
Wants=network-online.target
 
[Timer]
OnBootSec=
OnCalendar=Mon *-*-* 00:00:00
Persistent=true

[Install]
WantedBy=timers.target
EOF

Now that we have the timer, we can tell systemd about it and enable the timer.

sudo systemctl daemon-reload
sudo systemctl enable --now update-pihole.timer
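
To confirm the timer is scheduled and see what the last run did, something like this works:

sudo systemctl list-timers update-pihole.timer
sudo systemctl status update-pihole.service
sudo journalctl -u update-pihole.service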

That’s it! Your systems will now check for updates and be rebooted if necessary. It might be good to pair this with some monitoring to ensure that your Pi-holes are working as expected, and to be alerted if otherwise…

,

Stewart SmithAn Unearthly Child

So, this idea has been brewing for a while now… try and watch all of Doctor Who. All of it. All 38 seasons. Today(ish), we started. First up, from 1963 (first aired not quite when intended due to the Kennedy assassination): An Unearthly Child. The first episode of the first serial.

A lot of iconic things are there from the start: the music, the Police Box, embarrassing moments of not quite remembering what time one is in, and normal humans accidentally finding their way into the TARDIS.

I first saw this way back when I was a child, when it was repeated on ABC TV in Australia for some anniversary of Doctor Who (I forget which one). Well, I saw all but the first episode, as the train home was delayed and stopped outside Caulfield for no reason for ages. Some things never change.

Of course, being a show from the early 1960s, there’s some rougher spots. We’re not about to have the picture of diversity, and there’s going to be casual racism and sexism. What will be interesting is noticing these things today, and contrasting with my memory of them at the time (at least for episodes I’ve seen before), and what I know of the attitudes of the time.

“This year-ometer is not calculating properly” is a very 2020 line though (technically from the second episode).

,

Simon LyallAudiobook Reviews – April 2021

Inheriting Clutter: How to Calm the Chaos Your Parents Leave Behind by Julie Hall

Lots of specific advice for Children and Parents on preparing for and handling estates. Lots of good advice on defusing feuds before they start. 3/5

Shortest Way Home: One Mayor’s Challenge and a Model for America’s Future by Pete Buttigieg

Memoir of a small-city mayor who grew up gay in Indiana. Timed to come out for his presidential run in 2019. Nice enough read with a good mix of stories. 3/5

Moonraker by Ian Fleming

James Bond investigates the mysterious industrialist Hugo Drax and his nuclear missile project which is vital to Britain’s security. Exciting and well written. 3/5

Some Assembly Required: Decoding Four Billion Years of Life, from Ancient Fossils to DNA by Neil Shubin

A very accessible account of how various ways genetic information is passed down was discovered, who found it and how it works. 4/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


,

Jan SchmidtRift CV1 – Getting close now…

It’s been a while since my last post about tracking support for the Oculus Rift in February. There have been big improvements since then – it’s working really well a lot of the time. It’s gone from “If I don’t make any sudden moves, I can finish an easy Beat Saber level” to “You can’t hide from me!” quality.

Equally, there are still enough glitches and corner cases that I think I’ll still be at this a while.

Here’s a video from 3 weeks ago of (not me) playing Beat Saber on Expert+ setting showing just how good things can be now:

Beat Saber – Skunkynator playing Expert+, Mar 16 2021

Strap in. Here’s what I’ve worked on in the last 6 weeks:

Pose Matching improvements

Most of the biggest improvements have come from improving the computer vision algorithm that’s matching the observed LEDs (blobs) in the camera frames to the 3D models of the devices.

I split the brute-force search algorithm into 2 phases. It now does a first pass looking for ‘obvious’ matches. In that pass, it does a shallow graph search of blobs and their nearest few neighbours against LEDs and their nearest neighbours, looking for a match using a “Strong” match metric. A match is considered strong if expected LEDs match observed blobs to within 1.5 pixels.

Coupled with checks on the expected orientation (matching the Gravity vector detected by the IMU) and the pose prior (expected position and orientation are within predicted error bounds) this short-circuit on the search is hit a lot of the time, and often completes within 1 frame duration.

In the remaining tricky cases, where a deeper graph search is required in order to recover the pose, the initial search reduces the number of LEDs and blobs under consideration, speeding up the remaining search.

I also added an LED size model to the mix – for a candidate pose, it tries to work out how large (in pixels) each LED should appear, and use that as a bound on matching blobs to LEDs. This helps reduce mismatches as devices move further from the camera.

LED labelling

When a brute-force search for pose recovery completes, the system now knows the identity of various blobs in the camera image. One way it avoids a search next time is to transfer the labels into future camera observations using optical-flow tracking on the visible blobs.

The problem is that, even sped up, the search can still take a few frame-durations to complete. Previously LED labels would be transferred from frame to frame as they arrived, but there’s now a unique ID associated with each blob that allows the labels to be transferred even several frames later, once their identity is known.

IMU Gyro scale

One of the problems with reverse engineering is the guesswork around exactly what different values mean. I was looking into why the controller movement felt “swimmy” under fast motions, and one thing I found was that the interpretation of the gyroscope readings from the IMU was incorrect.

The touch controllers report IMU angular velocity readings directly as a 16-bit signed integer. Previously the code would take the reading and divide by 1024 and use the value as radians/second.

From teardowns of the controller, I know the IMU is an Invensense MPU-6500. From the datasheet, the reported value is actually in degrees per second and appears to be configured for the +/- 2000 °/s range. That yields a calculation of Gyro-rad/s = Gyro-°/s * (2000 / 32768) * (π/180) – or a divisor of 938.734.
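
As a quick back-of-the-envelope check of that divisor (my own arithmetic, not code from the driver):

awk 'BEGIN { pi = atan2(0, -1); printf "%.3f\n", 32768 * 180 / (2000 * pi) }'
# prints 938.734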

The 1024 divisor was under-estimating rotation speed by about 10% – close enough to work until you start moving quickly.

Limited interpolation

If we don’t find a device in the camera views, the fusion filter predicts motion using the IMU readings – but that quickly becomes inaccurate. In the worst case, the controllers fly off into the distance. To avoid that, I added a limit of 500ms for ‘coasting’. If we haven’t recovered the device pose by then, the position is frozen in place and only rotation is updated until the cameras find it again.

Exponential filtering

I implemented a 1-Euro exponential smoothing filter on the output poses for each device. This is an idea from the Project Esky driver for Project North Star/Deck-X AR headsets, and almost completely eliminates jitter in the headset view and hand controllers shown to the user. The tradeoff is against introducing lag when the user moves quickly – but there are some tunables in the exponential filter to play with for minimising that. For now I’ve picked some values that seem to work reasonably.

Non-blocking radio

Communications with the touch controllers happens through USB radio command packets sent to the headset. The main use of radio commands in OpenHMD is to read the JSON configuration block for each controller that is programmed in at the factory. The configuration block provides the 3D model of LED positions as well as initial IMU bias values.

Unfortunately, reading the configuration block takes a couple of seconds on startup, and blocks everything while it’s happening. Oculus saw that problem and added a checksum in the controller firmware. You can read the checksum first and if it hasn’t changed use a local cache of the configuration block. Eventually, I’ll implement that caching mechanism for OpenHMD but in the meantime it still reads the configuration blocks on each startup.

As an interim improvement I rewrote the radio communication logic to use a state machine that is checked in the update loop – allowing radio communications to be interleaved without blocking the regular processing of events. It still interferes a bit, but no longer causes a full multi-second stall as each hand controller turns on.

Haptic feedback

The hand controllers have haptic feedback ‘rumble’ motors that really add to the immersiveness of VR by letting you sense collisions with objects. Until now, OpenHMD hasn’t had any support for applications to trigger haptic events. I spent a bit of time looking at USB packet traces with Philipp Zabel and we figured out the radio commands to turn the rumble motors on and off.

In the Rift CV1, the haptic motors have a mode where you schedule feedback events into a ringbuffer – effectively they operate like a low frequency audio device. However, that mode was removed for the Rift S (and presumably in the Quest devices) – and deprecated for the CV1.

With that in mind, I aimed for implementing the unbuffered mode, with explicit ‘motor on + frequency + amplitude’ and ‘motor off’ commands sent as needed. Thanks to already having rewritten the radio communications to use a state machine, adding haptic commands was fairly easy.

The big question mark is around what API OpenHMD should provide for haptic feedback. I’ve implemented something simple for now, to get some discussion going. It works really well and adds hugely to the experience. That code is in the https://github.com/thaytan/OpenHMD/tree/rift-haptics branch, with a SteamVR-OpenHMD branch that uses it in https://github.com/thaytan/SteamVR-OpenHMD/tree/controller-haptics-wip

Problem areas

Unexpected tracking losses

I’d say the biggest problem right now is unexpected tracking loss and incorrect pose extractions when I’m not expecting them. Especially my right controller will suddenly glitch and start jumping around. Looking at a video of the debug feed, it’s not obvious why that’s happening:

To fix cases like those, I plan to add code to log the raw video feed and the IMU information together so that I can replay the video analysis frame-by-frame and investigate glitches systematically. Those recordings will also work as a regression suite to test future changes.

Sensor fusion efficiency

The Kalman filter I have implemented works really nicely – it does the latency compensation, predicts motion and extracts sensor biases all in one place… but it has a big downside of being quite expensive in CPU. The Unscented Kalman Filter CPU cost grows at O(n^3) with the size of the state, and the state in this case is 43 dimensional – 22 base dimensions, and 7 per latency-compensation slot. Running 1000 updates per second for the HMD and 500 for each of the hand controllers adds up quickly.

At some point, I want to find a better / cheaper approach to the problem that still provides low-latency motion predictions for the user while still providing the same benefits around latency compensation and bias extraction.

Lens Distortion

To generate a convincing illusion of objects at a distance in a headset that’s only a few centimetres deep, VR headsets use some interesting optics. The image from the LCD/OLED panels displaying the output gets distorted heavily before it hits the user’s eyes. What the software generates needs to compensate by applying the right inverse distortion to the output video.

Everyone that tests the CV1 notices that the distortion is not quite correct. As you look around, the world warps and shifts annoyingly. Sooner or later that needs fixing. That’s done by taking photos of calibration patterns through the headset lenses and generating a distortion model.

Camera / USB failures

The camera feeds are captured using a custom user-space UVC driver implementation that knows how to set up the special synchronisation settings of the CV1 and DK2 cameras, and then repeatedly schedules isochronous USB packet transfers to receive the video.

Occasionally, some people experience failure to re-schedule those transfers. The kernel rejects them with an out-of-memory error failing to set aside DMA memory (even though it may have been running fine for quite some time). It’s not clear why that happens – but the end result at the moment is that the USB traffic for that camera dies completely and there’ll be no more tracking from that camera until the application is restarted.

Often once it starts happening, it will keep happening until the PC is rebooted and the kernel memory state is reset.

Occluded cases

Tracking generally works well when the cameras get a clear shot of each device, but there are cases like sighting down the barrel of a gun where we expect that the user will line up the controllers in front of one another, and in front of the headset. In that case, even though we probably have a good idea where each device is, it can be hard to figure out which LEDs belong to which device.

If we already have a good tracking lock on the devices, I think it should be possible to keep tracking even down to 1 or 2 LEDs being visible – but the pose assessment code will have to be aware that’s what is happening.

Upstreaming

April 14th marks 2 years since I first branched off OpenHMD master to start working on CV1 tracking. How hard can it be, I thought? I’ll knock this over in a few months.

Since then I’ve accumulated over 300 commits on top of OpenHMD master that eventually all need upstreaming in some way.

One thing people have expressed as a prerequisite for upstreaming is to try and remove the OpenCV dependency. The tracking relies on OpenCV to do camera distortion calculations, and for their PnP implementation. It should be possible to reimplement both of those directly in OpenHMD with a bit of work – possibly using the fast LambdaTwist P3P algorithm that Philipp Zabel wrote, that I’m already using for pose extraction in the brute-force search.

Others

I’ve picked the top issues to highlight here. https://github.com/thaytan/OpenHMD/issues has a list of all the other things that are still on the radar for fixing eventually.

Other Headsets

At some point soon, I plan to put a pin in the CV1 tracking and look at adapting it to more recent inside-out headsets like the Rift S and WMR headsets. I implemented 3DOF support for the Rift S last year, but getting to full positional tracking for that and other inside-out headsets means implementing a SLAM/VIO tracking algorithm to track the headset position.

Once the headset is tracking, the code I’m developing here for CV1 to find and track controllers will hopefully transfer across – the difference with inside-out tracking is that the cameras move around with the headset. Finding the controllers in the actual video feed should work much the same.

Sponsorship

This development happens mostly in my spare time and partly as open source contribution time at work at Centricular. I am accepting funding through Github Sponsorships to help me spend more time on it – I’d really like to keep helping Linux have top-notch support for VR/AR applications. Big thanks to the people that have helped get this far.

,

Stewart Smithlibeatmydata v129

Every so often, I release a new libeatmydata. This has not happened for a long time. This release is just some bug fixes, most of which have been in the Debian package for some time; I’ve just been lazy and not sat down and merged them.

git clone https://github.com/stewartsmith/libeatmydata.git

Download the source tarball from here: libeatmydata-129.tar.gz and GPG signature: libeatmydata-129.tar.gz.asc from my GPG key.

Or, feel free to grab some Fedora RPMs:

Releases published also in the usual places:

,

Simon LyallAudiobook Reviews – March 2021

The Dream Machine: The Untold History of the Notorious V-22 Osprey by Richard Whittle

The story of tilt-rotor aircraft & the long history of the V-22’s development. Covers defense politics and technical matters equally well. 4/5

Broad Band: The Untold Story of the Women Who Made the Internet by Claire L. Evans

A series of stories about individuals, not just about the Internet but about women and early computing, hypertext, etc. Interesting and well written. 3/5

The Fifth Risk by Michael Lewis

Lewis interviews people involved in the Obama to Trump transition at 3 major government agencies. He profiles the people, their jobs and in most cases how the Trump people underestimated the Dept’s importance. 3/5

OK Boomer, Let’s Talk: How My Generation Got Left Behind by Jill Filipovic

Mostly a stats dump with a few profiles and accounts of struggling millennials sprinkled in. With a weird tone shift to boomer-love in the last chapter. Okay I guess. 3/5

Six Days of Impossible: Navy SEAL Hell Week – A Doctor Looks Back by Robert Adams

A first-hand account of a training class in 1974/75 where only 11 of the 71 starters graduated. Fun read although some interviews with non-graduates would have provided a contrast. 3/5

Three Laws of Nature: A Little Book on Thermodynamics by R Stephen Berry

Science mixed in with some history, designed for those with minimal science. The equations were simple but numerous & didn’t work in audiobook format. Try the printed version. 2/5

Space Odyssey: Stanley Kubrick Arthur C Clarke and the Making of a Masterpiece by Michael Benson

A detailed account of the film’s making from pre-production though to the bad reviews of the first release. Covers most aspects of the film and people involved. 4/5

The Soul of a New Machine by Tracy Kidder

Pulitzer Prize winning story of a team creating a new model of minicomputer in the late-1970s. Good portraits of the team members and aspects of the tech. 4/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


,

BlueHackersWorld bipolar day 2021

Today, 30 March, is World Bipolar Day.

Vincent van Gogh - Worn Out

Why that particular date? It’s Vincent van Gogh’s birthday (1853), and there is a fairly strong argument that the Dutch painter suffered from bipolar (among other things).

The image on the side is Vincent’s drawing “Worn Out” (from 1882), and it seems to capture the feeling rather well – whether (hypo)manic, depressed, or mixed. It’s exhausting.

Bipolar is complicated, often undiagnosed or misdiagnosed, and when only treated with anti-depressants, it can trigger the (hypo)mania – essentially dragging that person into that state near-permanently.

Have you heard of Bipolar II?

Hypo-mania is the “lesser” form of mania that distinguishes Bipolar I (the classic “manic depressive” syndrome) from Bipolar II. It’s “lesser” only in the sense that, rather than someone going so hyper they may think they can fly (Bipolar I is often identified when someone in a manic state gets admitted to hospital – good catch!), with Bipolar II the hypo-mania may actually exhibit as anger. Anger in general, against nothing in particular but potentially everyone and everything around them. Or, if it’s a mixed episode, anger combined with strong negative thoughts. Either way, it does not look like classic mania. It is, however, exhausting and can be very debilitating.

Bipolar II people often present to a doctor while in a depressed state, and GPs (not being psychiatrists) may not do a full diagnosis. Note that D.A.S. and similar test sheets are screening tools; they are not diagnostic. A proper diagnosis is more complex than filling in a form with some questions (who would have thought!)

Call to action

If you have a diagnosis of depression, only from a GP, and are on medication for this, I would strongly recommend you also get a referral to a psychiatrist to confirm that diagnosis.

Our friends at the awesome Black Dog Institute have excellent information on bipolar, as well as a quick self-test – if that shows some likelihood of bipolar, go get that referral and follow up ASAP.

I will be writing more about the topic in the coming time.

The post World bipolar day 2021 first appeared on BlueHackers.org.

,

Stewart SmithThe Apple Power Macintosh 7200/120 PC Compatible (Part 1)

So, I learned something recently: if you pick up your iPhone with eBay open on an auction bid screen in just the right way, you may accidentally click the bid button and end up buying an old computer. Totally not the worst thing ever, and certainly a creative way to make a decision.

So, not too long later, a box arrives!

In the 1990s, Apple created some pretty “interesting” computers and product lines. One thing you could get is a DOS Compatibility (or PC Compatibility) card. This was a card that went into one of the expansion slots on a Mac and had something really curious on it: most of the guts of a PC.

Others have written on these cards too: https://www.engadget.com/2009-12-10-before-there-was-boot-camp-there-were-dos-compatibility-cards.html and http://www.edibleapple.com/2009/12/09/blast-from-the-past-a-look-back-at-apples-dos-compatibility-cards/. There’s also the Service Manual https://tim.id.au/laptops/apple/misc/pc_compatibility_card.pdf with some interesting details.

The machine I’d bought was an Apple Power Macintosh 7200/120 with the PC Compatible card added afterwards (so it doesn’t have the PC Compatible label on the front like some models ended up getting).

The Apple Power Macintosh 7200/120

Wikipedia has a good article on the line, noting that it was first released in August 1995, and fitting for the era, was sold as about 14 million other model numbers (okay not quite that bad, it was only a total of four model numbers for essentially the same machine). This specific model, the 7200/120 was introduced on April 22nd, 1996, and the original web page describing it from Apple is on the wayback machine.

For older Macs, Low End Mac is a good resource, and there’s a page on the 7200, and amazingly Apple still has the tech specs on their web site!

The 7200 series replaced the 7100, which was one of the original PowerPC based Macs. The big changes are using the industry standard PCI bus for its three expansion slots rather than NuBus. Rather surprisingly, NuBus was not Apple specific, but you could not call it widely adopted by successful manufacturers. Apple first used NuBus in the 1987 Macintosh II.

The PCI bus was standardized in 1992, and it’s almost certain that a successor to it is in the computer you’re using to read this. It really quite caught on as an industry standard.

The processor of the machine is a PowerPC 601. The PowerPC was an effort of IBM, Apple, and Motorola (the AIM Alliance) to create a class of processors for personal computers based on IBM’s POWER Architecture. The PowerPC 601 was the first of these processors, initially used by Apple in its Power Macintosh range. The machine I have has one running at a whopping 120Mhz. There continued to be PowerPC chips for a number of years, and IBM continued making POWER processors even after that. However, you are almost certainly not using a PowerPC derived processor in the computer you’re using to read this.

The PC Compatibility card has on it a full on legit Pentium 100 processor, and hardware for doing VGA graphics, a Sound Blaster 16 and the other things you’d usually expect of a PC from 1996. Since it’s on a PCI card though, it’s a bit different than a PC of the era. It doesn’t have any expansion slots of its own, and in fact uses up one of the three PCI slots in the Mac. It also doesn’t have its own floppy drive, or hard drive. There’s software on the Mac that will let the PC card use the Mac’s floppy drive, and part of the Mac’s hard drive for the PC!

The Pentium 100 was the first mass produced superscalar processor. You are quite likely to be using a computer with a processor related to the Pentium to read this, unless you’re using a phone or tablet, or one of the very latest Macs; in which case you’re using an ARM based processor. You likely have more ARM processors in your life than you have socks.

Basically, this computer is a bit of a hodge-podge of historical technology, some of which ended up being successful, and other things less so.

Let’s have a look inside!

So, one of the PCI slots has a Vertex Twin Turbo 128M8A video card in it. There is not much about this card on the internet. There’s a photo of one on Wikimedia Commons though. I’ll have to investigate more.

Does it work though? Yes! Here it is on my desk:

The powered on Power Mac 7200/120

Even with Microsoft Internet Explorer 4.0 that came with MacOS 8.6, you can find some places on the internet you can fetch files from, at a not too bad speed even!

More fun times with this machine to come!

,

sthbrx - a POWER technical blogFuzzing grub: part 1

Recently a set of 8 vulnerabilities were disclosed for the grub bootloader. I found 2 of them (CVE-2021-20225 and CVE-2021-20233), and contributed a number of other fixes for crashing bugs which we don't believe are exploitable. I found them by applying fuzz testing to grub. Here's how.

This is a multi-part series: I think it will end up being 4 posts. I'm hoping to cover:

  • Part 1 (this post): getting started with fuzzing grub
  • Part 2: going faster by doing lots more work
  • Part 3: fuzzing filesystems and more
  • Part 4: potential next steps and avenues for further work

Fuzz testing

Let's begin with part one: getting started with fuzzing grub.

One of my all-time favourite techniques for testing programs, especially programs that handle untrusted input, and especially-especially programs written in C that parse untrusted input, is fuzz testing. Fuzz testing (or fuzzing) is the process of repeatedly throwing randomised data at your program under test and seeing what it does.

(For the secure boot threat model, untrusted input is anything not validated by a cryptographic signature - so config files are untrusted for our purposes, but grub modules can only be loaded if they are signed, so they are trusted.)

Fuzzing has a long history and has recently received a new lease on life with coverage-guided fuzzing tools like AFL and more recently AFL++.

Building grub for AFL++

AFL++ is extremely easy to use ... if your program:

  1. is built as a single binary with a regular tool-chain
  2. runs as a regular user-space program on Linux
  3. reads a small input file from disk and then exits
  4. doesn't do anything fancy with threads or signals

Beyond that, it gets a bit more complex.

On the face of it, grub fails 3 of these 4 criteria:

  • grub is a highly modular program: it loads almost all of its functionality as modules which are linked as separate ELF relocatable files. (Not runnable programs, but not shared libraries either.)

  • grub usually runs as a bootloader, not as a regular app.

  • grub reads all sorts of things, ranging in size from small files to full disks. After loading most things, it returns to a command prompt rather than exiting.

Fortunately, these problems are not insurmountable.

We'll start with the 'running as a bootloader' problem. Here, grub helps us out a bit, because it provides an 'emulator' target, which runs most of grub functionality as a userspace program. It doesn't support actually booting anything (unsurprisingly) but it does support most other modules, including things like the config file parser.

We can configure grub to build the emulator. We disable the graphical frontend for now.

./bootstrap
./configure --with-platform=emu --disable-grub-emu-sdl

At this point in building a fuzzing target, we'd normally try to configure with afl-cc to get the instrumentation that makes AFL(++) so powerful. However, the grub configure script is not a fan:

./configure --with-platform=emu --disable-grub-emu-sdl CC=$AFL_PATH/afl-cc
...
checking whether target compiler is working... no
configure: error: cannot compile for the target

It also doesn't work with afl-gcc.

Hmm, ok, so what if we just... lie a bit?

./configure --with-platform=emu --disable-grub-emu-sdl
make CC="$AFL_PATH/afl-gcc" 

(Normally I'd use CC=clang and afl-cc, but clang support is slightly broken upstream at the moment.)

After a small fix for gcc-10 compatibility, we get the userspace tools (potentially handy!) but a bunch of link errors for grub-emu:

/usr/bin/ld: disk.module:(.bss+0x20): multiple definition of `__afl_global_area_ptr'; kernel.exec:(.bss+0xe078): first defined here
/usr/bin/ld: regexp.module:(.bss+0x70): multiple definition of `__afl_global_area_ptr'; kernel.exec:(.bss+0xe078): first defined here
/usr/bin/ld: blocklist.module:(.bss+0x28): multiple definition of `__afl_global_area_ptr'; kernel.exec:(.bss+0xe078): first defined here

The problem is the module linkage that I talked about earlier: because there is a link stage of sorts for each module, some AFL support code gets linked in to both the grub kernel (kernel.exec) and each module (here disk.module, regexp.module, ...). The linker doesn't like it being in both, which is fair enough.

To get started, let's instead take advantage of the smarts of AFL++ using Qemu mode instead. This builds a specially instrumented qemu user-mode emulator that's capable of doing coverage-guided fuzzing on uninstrumented binaries at the cost of a significant performance penalty.
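
If you haven't built the QEMU mode support in your AFL++ checkout yet, it's roughly the following (the path is an assumption about where your AFLplusplus source lives, so adjust to suit):

cd /path/to/AFLplusplus/qemu_mode
./build_qemu_support.sh

Back in the grub tree, rebuild grub-emu without the AFL compiler wrapper: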

make clean
make

Now we have a grub-emu binary. If you run it directly, you'll pick up your system boot configuration, but the -d option can point it to a directory of your choosing. Let's set up one for fuzzing:

mkdir stage
echo "echo Hello sthbrx readers" > stage/grub.cfg
cd stage
../grub-core/grub-emu -d .

You probably won't see the message because the screen gets blanked at the end of running the config file, but if you pipe it through less or something you'll see it.

Running the fuzzer

So, that seems to work - let's create a test input and try fuzzing:

cd ..
mkdir in
echo "echo hi" > in/echo-hi

cd stage
# -Q qemu mode
# -M main fuzzer
# -d don't do deterministic steps (too slow for a text format)
# -f create file grub.cfg
$AFL_PATH/afl-fuzz -Q -i ../in -o ../out -M main -d -f grub.cfg -- ../grub-core/grub-emu -d .

Sadly:

[-] The program took more than 1000 ms to process one of the initial test cases.
    This is bad news; raising the limit with the -t option is possible, but
    will probably make the fuzzing process extremely slow.

    If this test case is just a fluke, the other option is to just avoid it
    altogether, and find one that is less of a CPU hog.

[-] PROGRAM ABORT : Test case 'id:000000,time:0,orig:echo-hi' results in a timeout
         Location : perform_dry_run(), src/afl-fuzz-init.c:866

What we're seeing here (and indeed what you can observe if you run grub-emu directly) is that grub-emu isn't exiting when it's done. It's waiting for more input, and will keep waiting for input until it's killed by afl-fuzz.

We need to patch grub to sort that out. It's on my GitHub.

Apply that, rebuild with FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION, and voila:

cd ..
make CFLAGS="-DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION"
cd stage
$AFL_PATH/afl-fuzz -Q -i ../in -o ../out -M main -d -f grub.cfg -- ../grub-core/grub-emu -d .

And fuzzing is happening!

afl-fuzz fuzzing grub, showing fuzzing happening

This is enough to find some of the (now-fixed) bugs in the grub config file parsing!
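
If afl-fuzz turns up crashes, reproducing one follows the same pattern as the hang example further down (the crash ID is a placeholder - use whatever lands in your output directory):

cp ../out/main/crashes/<crash-id> grub.cfg
gdb --args ../grub-core/grub-emu -d .   # or run it directly to just see the crash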

Fuzzing beyond the config file

You can also extend this to fuzzing other things that don't require the graphical UI, such as grub's transparent decompression support:

cd ..
rm -rf in out stage
mkdir in stage
echo hi > in/hi
gzip in/hi
cd stage
echo "cat thefile" > grub.cfg
$AFL_PATH/afl-fuzz -Q -i ../in -o ../out -M main -f thefile -- ../grub-core/grub-emu -d .

You should be able to find a hang pretty quickly with this: an as-yet-unfixed bug where grub will print output forever from a corrupt file (your mileage may vary, as will the paths).

cp ../out/main/hangs/id:000000,src:000000,time:43383,op:havoc,rep:16 thefile
../grub-core/grub-emu -d . | less # observe this going on forever

zcat, on the other hand, reports it as simply corrupt:

$ zcat thefile

gzip: thefile: invalid compressed data--format violated

(Feel free to fix that and send a patch to the list!)

That wraps up part 1. Eventually I'll be back with part 2, where I explain the hoops to jump through to go faster with the afl-cc instrumentation.

,

Simon LyallMoving my backups to restic

I’ve recently moved my home backups over to restic. I’m using restic to back up the /etc and /home folders on all machines, as well as my website files and databases. Media files are backed up separately.

I have around 220 Gigabytes of data, about half of that is photos.

My Home setup

I currently have 4 regularly-used physical machines at home: two workstations, one laptop and a server. I also have a VPS hosted at Linode and a VM running on the home server. Everything is running Linux.

Existing Backup Setup

For at least 15 years I’ve been using rsnapshot for backup. rsnapshot works by keeping a local copy of the folders to be backed up. To update the local copy it uses rsync over ssh to pull down a copy from the remote machine. It then keeps multiple old versions of files by making a series of copies.

I’d end up with around 12 older versions of the filesystem (something like 5 daily, 4 weekly and 3 monthly) so I could recover files that had been deleted. To save space rsnapshot uses hard links so only one copy of a file is kept if the contents didn’t change.

I also backed up a copy to external hard drives regularly and kept one copy offsite.

The main problem with rsnapshot was that it was a little clunky. It took a long time to run because it copied and deleted a lot of files every time it ran. It is also difficult to exclude folders from being backed up, it is not compatible with any cloud-based filesystems, and it requires ssh keys to log in to remote machines as root.

Getting started with restic

I started playing around with restic after seeing some recommendations online. As a single binary with a few commands it seemed a little simpler than other solutions. It has a push model so needs to be on each machine and it will upload from there to the archive.

Restic supports around a dozen storage backends for repositories. These include local file system, sftp and Amazon S3. When you create an archive via “restic init” it creates a simple file structure for the repository in most backends.

You can then use simple commands like “restic backup /etc” to back up files there. The restic documentation site makes things pretty easy to follow.

Restic automatically encrypts backups, and each server needs a key to read/write its backups. However, any key can see all files in a repository, even those belonging to other hosts.

Backup Strategy with Restic

I decided on the following strategy for my backups:

  • Make a daily copy of /etc and other files for each server
  • Keep 5 daily and 3 weekly copies
  • Have one copy of data on Backblaze B2
  • Have another copy on my home server
  • Export the copies on the home server to external disk regularly

Backblaze B2 is very similar to Amazon S3 and is supported directly by restic. It is however cheaper. Storage is 0.5 cents per gigabyte/month and downloads are 1 cent per gigabyte. In comparison AWS S3 One Zone Infrequent Access charges 1 cent per gigabyte/month for storage and 9 cents per gigabyte for downloads.

What                      Backblaze B2    AWS S3
Store 250 GB per month    $1.25           $2.50
Download 250 GB           $2.50           $22.50

AWS S3 Glacier is cheaper for storage but hard to work with and retrieval costs would be even higher.

Backblaze B2 is less reliable than S3 (they had an outage when I was testing) but this isn’t a big problem when I’m using them just for backups.

Setting up Backblaze B2

To set up B2 I went to the website and created an account. I would advise putting in your credit card once you finish initial testing, as it will not let you add more than 10GB of data without one.

I then created a private bucket and changed the bucket’s lifecycle settings to only keep the last version.

I decided that for security I would have each server use a separate restic repository. This means that I use a bit of extra space, since a shared repository would keep only one copy of a file that is identical across machines. I ended up using around 15% more.

For each machine I created a B2 application key and set it to have a namePrefix with the name of the machine. This means that each application key can only see files in its own folder.

On each machine I installed restic and then created an /etc/restic folder. I then added the file b2_env:

export B2_ACCOUNT_ID=000xxxx
export B2_ACCOUNT_KEY=K000yyyy
export RESTIC_PASSWORD=abcdefghi
export RESTIC_REPOSITORY=b2:restic-bucket:/hostname

You can now just run “restic init” and it should create an empty repository; check via B2 to see.
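
In practice that’s something like the following, with restic picking up the repository and credentials from the environment file:

source /etc/restic/b2_env
restic init
restic snapshots    # should succeed and show no snapshots yet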

I then had a simple script that runs:

source /etc/restic/b2_env

restic --limit-upload 2000 backup /home/simon --exclude-file /etc/restic/home_exclude

restic --limit-upload 2000 backup /etc /usr/local /var/lib /var/backups

restic --verbose --keep-last 5 --keep-daily 6 --keep-weekly 3 forget

The “source” command loads in the api key and passwords.

The restic backup lines do the actual backup. I have restricted my upload speed to 20 Megabits/second. The /etc/restic/home_exclude file lists folders that shouldn’t be backed up. For this I have:

/home/simon/.cache
/home/simon/.config/Slack
/home/simon/.local/share/Trash
/home/simon/.dropbox-dist
/home/simon/Syncthing/audiobooks

as these are folders with regularly changing contents that I don’t need to backup.

The “restic forget” command removes older snapshots. I’m telling it to keep 6 daily copies and 3 weekly copies of my data, plus at least the most recent 5 no matter how old they are.

This command doesn’t actually free up the space taken up by the removed snapshots. I need to run the “restic prune” command for that. However according to this analysis the prune operation generates so many API calls and data transfers that the payback time on disk space saved can be months(!). So for now I’m planning to run the command only occasionally (probably every few months, depending on testing).
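
When I do get around to it, the occasional maintenance run will be something along these lines (standard restic commands, the schedule is the part I’m still deciding on):

source /etc/restic/b2_env

# Actually free the space used by forgotten snapshots (expensive in B2 API calls).
restic prune

# Verify the repository structure afterwards.
restic check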

Setting up sftp

As well as backing up to B2 I wanted to backup my data to my home server. In this case I decided to have a single repository shared by all the servers.

First of all I created a “restic” account on my server with a home of /home/restic. I then created a folder /media/backups/restic owned by the restic user.

I then followed this guide for sftp-only accounts to restrict the restic user. Relevant lines I changed were “Match User restic” and “ChrootDirectory /media/backups/restic”.

On each host I also needed to run “cp /etc/ssh/ssh_host_rsa_key /root/.ssh/id_rsa” and add the host’s public ssh key to /home/restic/.ssh/authorized_keys on the server.

Then it is just a case of creating a sftp_env file like in the b2 example above. Except this is a little shorter:

export RESTIC_REPOSITORY=sftp:restic@server.darkmere.gen.nz:shared
export RESTIC_PASSWORD=abcdefgh

For backing up my VPS I had to do another step, since it couldn’t push files to my home. Instead I added a script that ran on the home server and used rsync to copy down folders from my VPS to local disk. I used rrsync to restrict this script.

Once I had a local folder I ran “restic --host vps-name backup /copy-of-folder” to back it up over sftp. The --host option made sure the backups were listed for the right machine.

Since the restic folder is just a bunch of files, I’m copying it directly to an external disk which I keep outside the house.

Parting Thoughts

I’m fairly happy with restic so far. I have not run into too many problems or gotchas yet, although if you are starting out I’d suggest testing with a small repository to get used to the commands etc.

I have copies of keys in my password manager for recovery.

There are a few things I still have to do, including setting up some monitoring and deciding how often to run the prune operation.


,

Simon LyallAudiobooks – February 2021

Lost and Founder: A Painfully Honest Field Guide to the Startup World by Rand Fishkin

Advice for perspective founders mixed in with stories from the author’s company. Open about missteps he made to be avoided. 4/5

The Victorian Internet: The Remarkable Story of the Telegraph and the Nineteenth Century’s On-line Pioneers by Tom Standage

A short book on the rise of the telegraph and how it changed the world. Peppered with amusing stories and analogies to the Internet. 3/5

Dreams from My Father: A Story of Race and Inheritance by Barack Obama

A memoir of the author growing up and into his mid-20s. Well written and interesting. Audiobook is read by the author but he’s okay. 3/5

The Age of Benjamin Franklin by Robert J. Allison

24 Lectures about various aspects of Franklin and his life. Each lecture is on a theme so they are not chronological. I hadn’t read any biographies previously but this might help. 4/5

The Relentless Moon by Mary Robinette Kowal

3rd book in the Lady Astronaut series. Mostly concerned with trying to find and stop agents sabotaging the Moonbase. Works well and held my interest. 3/5

Business Adventures: Twelve Classic Tales from the World of Wall Street by John Brooks

A collection of long New Yorker articles from the 1960s. One on a stock corner even has parallels with Gamestop in 2021. Interesting and well told even when dated. 3/5

Live and Let Die by Ian Fleming

James Bond takes on Gangster/Agent/Voodoo leader ‘Mr Big’ in Harlem, Florida and Jamaica. The racial stereotypes are dated but could be worse. The story held my interest. 3/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


,

Rusty RussellA Model for Bitcoin Soft Fork Activation

TL;DR: There should be an option, taproot=lockintrue, which allows users to set lockin-on-timeout to true. It should not be the default, though.

As stated in my previous post, we need actual consensus, not simply the appearance of consensus. I’m pretty sure we have that for taproot, but I would like a template we can use in future without endless debate each time.

  • Giving every group a chance to openly signal for (or against!) gives us the most robust assurance that we actually have consensus. Being able to signal opposition is vital, since everyone can lie anyway; making opposition difficult just reduces the reliability of the signal.
  • Developers should not activate. They’ve tried to assure themselves that there’s broad approval of the change, but that’s not really a transferable proof. We should be concerned about future corruption, insanity, or groupthink. Moreover, even the perception that developers can set the rules will lead to attempts to influence them as Bitcoin becomes more important. As a (non-Bitcoin-core) developer I can’t think of a worse hell myself, nor do we want to attract developers who want to be influenced!
  • Miner activation is actually brilliant. It’s easy for everyone to count, and majority miner enforcement is sufficient to rely on the new rules. But its real genius is that miners are most directly vulnerable to the economic majority of users: in a fork they have to pick sides continuously knowing that if they are wrong, they will immediately suffer economically through missed opportunity cost.
  • Of course, economic users are ultimately in control. Any system which doesn’t explicitly encode that is fragile; nobody would argue that fair elections are unnecessary because if people were really dissatisfied they could always overthrow the government themselves! We should make it as easy for them to exercise this power as possible: this means not requiring them to run unvetted or home-brew modifications which will place them at more risk, so developers need to supply this option (setting it should also change the default User-Agent string, for signalling purposes). It shouldn’t be an upgrade either (which inevitably comes with other changes). Such a default-off option provides both a simple method, and a Schelling point for the lockinontimeout parameters. It also means much less chance of this power being required: “Si vis pacem, para bellum“.

This triumvirate model may seem familiar, being widely used in various different governance systems. It seems the most robust to me, and is very close to what we have evolved into already. Formalizing it reduces uncertainty for any future changes, as well.

,

Rusty RussellBitcoin Consensus and Solidarity

Bitcoin’s consensus rules define what is valid, but this isn’t helpful when we’re looking at changing the rules themselves. The trend in Bitcoin has been to make such changes in an increasingly inclusive and conservative manner, but we are still feeling our way through this, and appreciating more nuance each time we do so.

To use Bitcoin, you need to remain in the supermajority of consensus on what the rules are. But you can never truly know if you are. Everyone can signal, but everyone can lie. You can’t know what software other nodes or miners are running: even expensive testing of miners by creating an invalid block only tests one possible difference, may still give a false negative, and doesn’t mean they can’t change a moment later.

This risk of being left out is heightened greatly when the rules change. This is why we need to rely on multiple mechanisms to reassure ourselves that consensus will be maintained:

  1. Developers assure themselves that the change is technically valid, positive and has broad support. The main tools for this are open communication, and time. Developers signal support by implementing the change.
  2. Users signal their support by upgrading their nodes.
  3. Miners signal their support by actually tagging their blocks.

We need actual consensus, not simply the appearance of consensus. Thus it is vital that all groups know they can express their approval or rejection, in a way they know will be heard by others. In the end, the economic supermajority of Bitcoin users can set the rules, but no other group or subgroup should have inordinate influence, nor should they appear to have such control.

The Goodwill Dividend

A Bitcoin community which has consensus and knows it is not only safest from a technical perspective: the goodwill and confidence gives us all assurance that we can make (or resist!) changes in future.

It will also help us defend against the inevitable attacks and challenges we are going to face, which may be a more important effect than any particular soft-fork feature.

,

Simon LyallAudiobooks – January 2021

The Esperanza Fire: Arson, Murder and the Agony of Engine 57 by John N. Maclean

An account of the fire that killed a five-person firefighter crew. Minute by minute of the fire itself, plus the investigation and the trial of the arsonist. 4/5

Range: Why Generalists Triumph in a Specialized World by David Epstein

An argument against early-specialisation and over-specialisation. How it fails against open non-predictable problems and environments. 4/5

The Vikings: A New History by Neil Oliver

A vaguely chronological introduction to the Vikings. Lots of first person descriptions of artifacts by the author. 3/5

Ultralearning: Master Hard Skills, Outsmart the Competition, and Accelerate Your Career by Scott Young

Examples and advice on how to learn a skill very quickly, usually via an intense method. Good practical advice mixed with some stories. 3/5

81 Days Below Zero: The Incredible Survival Story of a World War II Pilot in Alaska’s Frozen Wilderness by Brian Murphy

An interesting survival story. The pilot survives a crash in a remote area & manages to walk out with minimal gear during winter. 3/5

Messy: How to Be Creative and Resilient in a Tidy-Minded World by Tim Harford

The unexpected connections between creativity and mess. Lots of examples, although as one commentator noted, most of them came from people who were already masters, not beginners. 3/5

Outliers: The Story of Success by Malcolm Gladwell

A book on how the most famous and successful are often there because their upbringing, practice, or chance events pushed them to the top, rather than just raw talent. 4/5

The Book of Humans: The Story of How We Became Us by Adam Rutherford

How the latest research reveals the extent to which behaviors once thought exclusively human are also found in other species. Spoiler: except Culture. 3/5

Tank Action: An Armoured Troop Commander’s War 1944-45 by David Render and Stuart Tootal

The author is thrown into the war as a 19 year old officer in command of 4 tanks 5 days after D-Day. Very well written and lots of detail of the good and the bad. 4/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


,

Jan SchmidtRift CV1 – Testing SteamVR

I’ve had a few people ask how to test my OpenHMD development branch of Rift CV1 positional tracking in SteamVR. Here’s what I do:

  • Make sure Steam + SteamVR are already installed.
  • Clone the SteamVR-OpenHMD repository:
git clone --recursive https://github.com/ChristophHaag/SteamVR-OpenHMD.git
  • Switch the internal copy of OpenHMD to the right branch:
cd subprojects/openhmd
git remote add thaytan-github https://github.com/thaytan/OpenHMD.git
git fetch thaytan-github
git checkout -b rift-kalman-filter thaytan-github/rift-kalman-filter
cd ../../
  • Use meson to build and register the SteamVR-OpenHMD binaries. You may need to install meson first (see below):
meson -Dbuildtype=release build
ninja -C build
./install_files_to_build.sh
./register.sh
  • It is important to configure in release mode, as the Kalman filtering code is generally too slow for real-time use in debug mode (it has to run 2000 times per second).
  • Make sure your USB devices are accessible to your user account by configuring udev. See the OpenHMD guide here: https://github.com/OpenHMD/OpenHMD/wiki/Udev-rules-list
  • Please note – only Rift sensors on USB 3.0 ports will work right now. Supporting cameras on USB 2.0 requires someone implementing JPEG format streaming and decoding.
  • It can be helpful to test that OpenHMD is working by running the simple example. Check that it’s finding camera sensors at startup, and that the position seems to change when you move the headset:
./build/subprojects/openhmd/openhmd_simple_example
  • Calibrate your expectations for how well tracking is working right now! Hint: It’s very experimental 🙂
  • Start SteamVR. Hopefully it should detect your headset and the light(s) on your Rift Sensor(s) should power on.

Meson

I prefer the Meson build system here. There’s also a cmake build for SteamVR-OpenHMD you can use instead, but I haven’t tested it in a while and it sometimes breaks as I work on my development branch.

If you need to install meson, there are instructions here – https://mesonbuild.com/Getting-meson.html summarising the various methods.

I use a copy installed in my home directory, but you need to make sure ~/.local/bin is in your PATH:

pip3 install --user meson
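If ~/.local/bin isn’t already on your PATH, adding something like this to your shell profile takes care of it:

export PATH="$HOME/.local/bin:$PATH"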

,

Jan SchmidtRift CV1 – Pose rejection

I spent some time this weekend implementing a couple of my ideas for improving the way the tracking code in OpenHMD filters and rejects (or accepts) possible poses when trying to match visible LEDs to the 3D models for each device.

In general, the tracking proceeds in several steps (in parallel for each of the 3 devices being tracked):

  1. Do a brute-force search to match LEDs to 3D models, then (if matched)
    1. Assign labels to each LED blob in the video frame saying what LED they are.
    2. Send an update to the fusion filter about the position / orientation of the device
  2. Then, as each video frame arrives:
    1. Use motion flow between video frames to track the movement of each visible LED
    2. Use the IMU + vision fusion filter to predict the position/orientation (pose) of each device, and calculate which LEDs are expected to be visible and where.
  3. Try and match up and refine the poses using the predicted pose prior and labelled LEDs. In the best case, the LEDs are exactly where the fusion predicts they’ll be. More often, the orientation is mostly correct, but the position has drifted and needs correcting. In the worst case, we send the frame back to step 1 and do a brute-force search to reacquire an object.

The goal is to always assign the correct LEDs to the correct device (so you don’t end up with the right controller in your left hand), and to avoid going back to the expensive brute-force search to re-acquire devices as much as possible.

What I’ve been working on this week is steps 1 and 3 – initial acquisition of correct poses, and fast validation / refinement of the pose in each video frame, and I’ve implemented two new strategies for that.

Gravity Vector matching

The first new strategy is to reject candidate poses that don’t closely match the known direction of gravity for each device. I had a previous implementation of that idea which turned out to be wrong, so I’ve re-worked it and it helps a lot with device acquisition.

The IMU accelerometer and gyro can usually tell us which way up the device is (roll and pitch) but not which way it is facing (yaw). The measure for ‘known gravity’ comes from the fusion Kalman filter covariance matrix – how certain the filter is about the orientation of the device. If that variance is small, this new strategy is used to reject possible poses that don’t share the same idea of gravity (while permitting rotations around the Y axis), with the filter variance as the tolerance.
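As a rough sketch of that test (this is not the actual OpenHMD code; the vector handling and tolerance plumbing are simplified assumptions), it boils down to comparing the candidate pose’s idea of “up” with the filter’s gravity estimate:

#include <math.h>
#include <stdbool.h>

/* Sketch of the gravity-direction test (not the actual OpenHMD code).
 * pose_up:   the candidate pose's "up" axis, expressed in the same frame as
 * imu_up:    the gravity direction estimated by the fusion filter.
 * tolerance: an angle (radians) derived from the filter's orientation variance. */
static bool gravity_matches(const float pose_up[3], const float imu_up[3],
                            float tolerance)
{
    /* Both vectors are assumed normalised, so the dot product is cos(angle). */
    float dot = pose_up[0] * imu_up[0] + pose_up[1] * imu_up[1] + pose_up[2] * imu_up[2];
    if (dot > 1.0f)  dot = 1.0f;
    if (dot < -1.0f) dot = -1.0f;

    /* The angle between the two "up" directions captures roll/pitch error
     * only - rotations about the up axis (yaw) leave it unchanged. */
    return acosf(dot) <= tolerance;
}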

Partial tracking matches

The 2nd strategy is based around tracking with fewer LED correspondences once a tracking lock is acquired. Initial acquisition of the device pose relies on some heuristics for how many LEDs must match the 3D model. The general heuristic threshold I settled on for now is that 2/3rds of the expected LEDs must be visible to acquire a cold lock.

With the new strategy, if the pose prior has a good idea where the device is and which way it’s facing, it allows matching on far fewer LED correspondences. The idea is to keep tracking a device even down to just a couple of LEDs, and hope that more become visible soon.
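In rough C terms the heuristic amounts to something like this (the numbers and names are illustrative, not the tuned values in the tracker):

#include <stdbool.h>

/* Sketch of the acquisition threshold. A cold lock needs roughly 2/3 of the
 * LEDs the model says should be visible; with a confident pose prior we
 * accept far fewer and hope more LEDs come into view. */
static int required_led_matches(int expected_visible_leds, bool confident_prior)
{
    if (confident_prior)
        return 2;                                   /* keep an existing lock alive */
    return (2 * expected_visible_leds + 2) / 3;     /* ~2/3, rounded up, for a cold lock */
}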

While this definitely seems to help, I think the approach can use more work.

Status

With these two new approaches, tracking is improved but still quite erratic. Tracking of the headset itself is quite good now and for me rarely loses tracking lock. The controllers are better, but have a tendency to “fly off my hands” unexpectedly, especially after fast motions.

I have ideas for more tracking heuristics to implement, and I expect a continuous cycle of refinement on the existing strategies and new ones for some time to come.

For now, here’s a video of me playing Beat Saber using tonight’s code. The video shows the debug stream that OpenHMD can generate via Pipewire, showing the camera feed plus overlays of device predictions, LED device assignments and tracked device positions. Red is the headset, Green is the right controller, Blue is the left controller.

Initial tracking is completely wrong – I see some things to fix there. When the controllers go offline due to inactivity, the code keeps trying to match LEDs to them for example, and then there are some things wrong with how it’s relabelling LEDs when they get incorrect assignments.

After that, there are periods of good tracking with random tracking losses on the controllers – those show the problem cases to concentrate on.

,

Jan SchmidtHitting a milestone – Beat Saber!

I hit an important OpenHMD milestone tonight – I completed a Beat Saber level using my Oculus Rift CV1!

I’ve been continuing to work on integrating Kalman filtering into OpenHMD, and on improving the computer vision that matches and tracks device LEDs. While I suspect no one will be completing Expert levels just yet, it’s working well enough that I was able to play through a complete level of Beat Saber. For a long time this has been my mental benchmark for tracking performance, and I’m really happy 🙂

Check it out:

I should admit at this point that completing this level took me multiple attempts. The tracking still has quite a tendency to lose track of controllers, or to get them confused and swap hands suddenly.

I have a list of more things to work on. See you at the next update!

,

Sam WatkinsDeveloping CZ, a dialect of C that looks like Python

In my experience, the C programming language is still hard to beat, even 50 years after it was first developed (and I feel the same way about UNIX). When it comes to general-purpose utility, low-level systems programming, performance, and portability (even to tiny embedded systems), I would choose C over most modern or fashionable alternatives. In some cases, it is almost the only choice.

Many developers believe that it is difficult to write secure and reliable software in C, due to its free pointers, the lack of enforced memory integrity, and the lack of automatic memory management; however in my opinion it is possible to overcome these risks with discipline and a more secure system of libraries constructed on top of C and libc. Daniel J. Bernstein and Wietse Venema are two developers who have been able to write highly secure, stable, reliable software in C.

My other favourite language is Python. Although Python has numerous desirable features, my favourite is the light-weight syntax: in Python, block structure is indicated by indentation, and braces and semicolons are not required. Apart from the pleasure and relief of reading and writing such light and clear code, which almost appears to be executable pseudo-code, there are many other benefits. In C or JavaScript, if you omit a trailing brace somewhere in the code, or insert an extra brace somewhere, the compiler may tell you that there is a syntax error at the end of the file. These errors can be annoying to track down, and cannot occur in Python. Python not only looks better, the clear syntax helps to avoid errors.

The obvious disadvantage of Python, and other dynamic interpreted languages, is that most programs run far slower than C programs. This limits the scope and generality of Python. No AAA or performance-oriented video game engines are programmed in Python. The language is not suitable for low-level systems programming, such as operating system development, device drivers, filesystems, performance-critical networking servers, or real-time systems.

C is a great all-purpose language, but the code is uglier than Python code. Once upon a time, when I was experimenting with the Plan 9 operating system (which is built on C, but lacks Python), I missed Python’s syntax, so I decided to do something about it and write a little preprocessor for C. This converts from a “Pythonesque” indented syntax to regular C with the braces and semicolons. Having forked a little dialect of my own, I continued from there adding other modules and features (which might have been a mistake, but it has been fun and rewarding).

At first I called this translator Brace, because it added in the braces for me. I now call the language CZ. It sounds like “C-easy”. Ease-of-use for developers (DX) is the primary goal. CZ has all of the features of C, and translates cleanly into C, which is then compiled to machine code as normal (using any C compiler; I didn’t write one); and so CZ has the same features and performance as C, but enjoys a more pleasing syntax.

CZ is now self-hosted, in that the translator is written in the language CZ. I confess that originally I wrote most of it in Perl; I’m proficient at Perl, but I consider it to be a fairly ugly language, and overly complicated.

I intend for CZ’s new syntax to be “optional”: ideally a developer will be able to choose to use the normal C syntax when editing CZ, if they prefer it. For this, I need a tool to convert C back to CZ, which I have not fully implemented yet. I am aware that, in addition to traditionalists, some vision-impaired developers prefer to use braces and semicolons, as screen readers might not clearly indicate indentation. A C to CZ translator would of course also be valuable when porting an existing C program to CZ.

CZ has a number of useful features that are not found in standard C, but I did not go so far as C++, which language has been described as “an octopus made by nailing extra legs onto a dog”. I do not consider C to be a dog, at least not in a negative sense; but I think that C++ is not an improvement over plain C. I am creating CZ because I think that it is possible to improve on C, without losing any of its advantages or making it too complex.

One of the most interesting features I added is a simple syntax for fast, light coroutines. I based this on Simon Tatham’s approach to Coroutines in C, which may seem hacky at first glance, but is very efficient and can work very well in practice. I implemented a very fast web server with very clean code using these coroutines. The cost of switching coroutines with this method is little more than the cost of a function call.
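For readers who haven’t seen it, here is a minimal plain-C sketch of Tatham’s switch-based coroutine trick that CZ’s syntax builds on (the macro and function names below are just for illustration, not the CZ syntax itself):

#include <stdio.h>

/* Tatham-style coroutine macros: stash a resume point in a static variable
 * and jump back to it on the next call, so "yielding" costs little more
 * than a function call. */
#define crBegin     static int cr_state = 0; switch (cr_state) { case 0:
#define crReturn(x) do { cr_state = __LINE__; return (x); case __LINE__:; } while (0)
#define crFinish    }

/* A generator that yields 1, 2, 3, ... across successive calls. */
int next_number(void)
{
    static int i;
    crBegin;
    for (i = 1;; i++)
        crReturn(i);
    crFinish;
    return -1; /* not reached */
}

int main(void)
{
    for (int n = 0; n < 5; n++)
        printf("%d\n", next_number());
    return 0;
}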

CZ has hygienic macros. The regular cpp (C preprocessor) macros are not hygienic and many people consider them hacky and unsafe to use. My CZ macros are safe, and somewhat more powerful than standard C macros. They can be used to neatly add new program control structures. I have plans to further develop the macro system in interesting ways.

I added automatic prototype and header generation, as I do not like having to repeat myself when copying prototypes to separate header files. I added support for the UNIX #! scripting syntax, and for cached executables, which means that CZ can be used like a scripting language without having to use a separate compile or make command, but the programs are only recompiled when something has been changed.

For CZ, I invented a neat approach to portability without conditional compilation directives. Platform-specific library fragments are automatically included from directories having the name of that platform or platform-category. This can work very well in practice, and helps to avoid the nightmare of conditional compilation, feature detection, and Autotools. Using this method, I was able easily to implement portable interfaces to features such as asynchronous IO multiplexing (aka select / poll).

The CZ library includes flexible error handling wrappers, inspired by W. Richard Stevens’ wrappers in his books on Unix Network Programming. If these wrappers are used, there is no need to check return values for error codes, and this makes the code much safer, as an error cannot accidentally be ignored.
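The idea, roughly (shown here as an illustrative sketch in plain C, not the actual CZ library API), is that the capitalised wrapper checks the return value and bails out on error, so call sites never need to:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Stevens-style wrapper: checks for errors and exits, so callers can use
 * the result directly without their own error handling. */
ssize_t Write(int fd, const void *buf, size_t count)
{
    ssize_t n = write(fd, buf, count);
    if (n < 0) {
        perror("write");
        exit(EXIT_FAILURE);
    }
    return n;
}

int main(void)
{
    Write(STDOUT_FILENO, "hello\n", 6);   /* no return-value check needed */
    return 0;
}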

CZ has several major faults, which I intend to correct at some point. Some of the syntax is poorly thought out, and I need to revisit it. I developed a fairly rich library to go with the language, including safer data structures, IO, networking, graphics, and sound. There are many nice features, but my CZ library is more a prototype than a finished product; there are major omissions, and some features are misconceived or poorly implemented. The misfeatures should be weeded out for the time being, or moved to an experimental section of the library.

I think that a good software library should come in two parts, the essential low-level APIs with the minimum necessary functionality, and a rich set of high-level convenience functions built on top of the minimal API. I need to clearly separate these two parts in order to avoid polluting the namespaces with all sorts of nonsense!

CZ is lacking a good modern system of symbol namespaces. I can look to Python for a great example. I need to maintain compatibility with C, and avoid ugly symbol encodings. I think I can come up with something that will alleviate the need to type anything like gtk_window_set_default_size, and yet maintain compatibility with the library in question. I want all the power of C, but it should be easy to use, even for children. It should be as easy as BASIC or Processing, a child should be able to write short graphical demos and the like, without stumbling over tricky syntax or obscure compile errors.

Here is an example of a simple CZ program which plots the Mandelbrot set fractal. I think that the program is fairly clear and easy to understand, although there is still some potential to improve and clarify the code.

#!/usr/local/bin/cz --
use b
use ccomplex

Main:
	num outside = 16, ox = -0.5, oy = 0, r = 1.5
	long i, max_i = 50, rb_i = 30
	space()
	uint32_t *px = pixel()  # CONFIGURE!
	num d = 2*r/h, x0 = ox-d*w_2, y0 = oy+d*h_2
	for(y, 0, h):
		cmplx c = x0 + (y0-d*y)*I
		repeat(w):
			cmplx w = c
			for i=0; i < max_i && cabs(w) < outside; ++i
				w = w*w + c
			*px++ = i < max_i ? rainbow(i*359 / rb_i % 360) : black
			c += d

I wrote a more elaborate variant of this program, which generates images like the one shown below. There are a few tricks used: continuous colouring, rainbow colours, and plotting the logarithm of the iteration count, which makes the plot appear less busy close to the black fractal proper. I sell some T-shirts and other products with these fractal designs online.

An image from the Mandelbrot set, generated by a fairly simple CZ program.

I am interested in graph programming, and have been for three decades since I was a teenager. By graph programming, I mean programming and modelling based on mathematical graphs or diagrams. I avoid the term visual programming, because there is no necessary reason that vision impaired folks could not use a graph programming language; a graph or diagram may be perceived, understood, and manipulated without having to see it.

Mathematics is something that naturally exists, outside time and independent of our universe. We humans discover mathematics, we do not invent or create it. One of my main ideas for graph programming is to represent a mathematical (or software) model in the simplest and most natural way, using relational operators. Elementary mathematics can be reduced to just a few such operators:

  • + : add, subtract, disjoint union, zero
  • × : multiply, divide, cartesian product, one
  • ^ : power, root, logarithm
  • sin, cos, sin⁻¹, cos⁻¹, hypot, atan2
  • δ : differential, integral

a set of minimal relational operators for elementary math

I think that a language and notation based on these few operators (and similar) can be considerably simpler and more expressive than conventional math or programming languages.

CZ is for me a stepping-stone toward this goal of an expressive relational graph language. It is more pleasant for me to develop software tools in CZ than in C or another language.

Thanks for reading. I wrote this article during the process of applying to join Toptal, which appears to be a freelancing portal for top developers; and in response to this article on toptal: After All These Years, the World is Still Powered by C Programming.

My CZ project has been stalled for quite some time. I foolishly became discouraged after receiving some negative feedback. I now know that honest negative feedback should be valued as an opportunity to improve, and I intend to continue the project until it lacks glaring faults, and is useful for other people. If this project or this article interests you, please contact me and let me know. It is much more enjoyable to work on a project when other people are actively interested in it!

Gary PendergastWordPress Importers: Free (as in Speech)

Back at the start of this series, I listed four problems within the scope of the WordPress Importers that we needed to address. Three of them are largely technical problems, which I covered in previous posts. In wrapping up this series, I want to focus exclusively on the fourth problem, which has a philosophical side as well as a technical one — but that does not mean we cannot tackle it!

Problem Number 4

Some services work against their customers, and actively prevent site owners from controlling their own content.

Some services are merely inconvenient: they provide exports, but it often involves downloading a bunch of different files. Your CMS content is in one export, your store products are in another, your orders are in another, and your mailing list is in yet another. It’s not ideal, but they at least let you get a copy of your data.

However, there’s another class of services that actively work against their customers. It’s these services I want to focus on: the services that don’t provide any ability to export your content — effectively locking people in to using their platform. We could offer these folks an escape! The aim isn’t to necessarily make them use WordPress, it’s to give them a way out, if they want it. Whether they choose to use WordPress or not after that is immaterial (though I certainly hope they would, of course). The important part is freedom of choice.

It’s worth acknowledging that this is a different approach to how WordPress has historically operated in relation to other CMSes. We provide importers for many CMSes, but we previously haven’t written exporters. However, I don’t think this is a particularly large step: for CMSes that already provide exports, we’d continue to use those export files. This is focussed on the few services that try to lock their customers in.

Why Should WordPress Take This On?

There are several aspects to why we should focus on this.

First of all, it’s the WordPress mission. Underpinning every part of WordPress is the simplest of statements:

Democratise Publishing

The freedom to build. The freedom to change. The freedom to share.

These freedoms are the pillars of a Free and Open Web, but they’re not invulnerable: at times, they need to be defended, and that needs people with the time and resources to offer a defence.

Which brings me to my second point: WordPress has the people who can offer that defence! The WordPress project has so many individuals working on it, from such a wide variety of backgrounds, we’re able to take on a vast array of projects that a smaller CMS just wouldn’t have the bandwidth for. That’s not to say that we can do everything, but when there’s a need to defend the entire ecosystem, we’re able to devote people to the cause.

Finally, it’s important to remember that WordPress doesn’t exist in a vacuum, we’re part of a broad ecosystem which can only exist through the web remaining open and free. By encouraging all CMSes to provide proper exports, and implementing them for those that don’t, we help keep our ecosystem healthy.

We have the ability to take on these challenges, but we have a responsibility that goes alongside. We can’t do it solely to benefit WordPress, we need to make that benefit available to the entire ecosystem. This is why it’s important to define a WordPress export schema, so that any CMS can make use of the export we produce, not just WordPress. If you’ll excuse the imagery for a moment, we can be the knight in shining armour that frees people — then gives them the choice of what they do with that freedom, without obligation.

How Can We Do It?

Moving on to the technical side of this problem, I can give you some good news: the answer is definitely not screen scraping. 😄 Scraping a site is fragile, impossible to transform into the full content, and provides an incomplete export of the site: anything that’s only available in the site dashboard can’t be obtained through scraping.

I’ve recently been experimenting with an alternative approach to solving this problem. Rather than trying to create something resembling a traditional exporter, it turns out that modern CMSes provide the tools we need, in the form of REST APIs. All we need to do is call the appropriate APIs, and collate the results. The fun part is that we can authenticate with these APIs as the site owner, by calling them from a browser extension! So, that’s what I’ve been experimenting with, and it’s showing a lot of promise.

If you’re interested in playing around with it, the experimental code is living in this repository. It’s a simple proof of concept, capable of exporting the text content of a blog on a Wix site, showing that we can make a smooth, comprehensive, easy-to-use exporter for any Wix site owner.

Screenshot of the "Free (as in Speech)" browser extension UI.

Clicking the export button starts a background script, which calls Wix’s REST APIs as the site owner, to get the original copy of the content. It then packages it up, and presents it as a WXR file to download.

Screenshot of a Firefox download dialog, showing a Wix site packaged up as a WXR file.

I’m really excited about how promising this experiment is. It can ultimately provide a full export of any Wix site, and we can add support for other CMS services that choose to artificially lock their customers in.

Where Can I Help?

If you’re a designer or developer who’s excited about working on something new, head on over to the repository and check out the open issues: if there’s something that isn’t already covered, feel free to open a new issue.

Since this is new ground for a WordPress project, both technically and philosophically, I’d love to hear more points of view. It’s being discussed in the WordPress Core Dev Chat this week, and you can also let me know what you think in the comments!

This post is part of a series, talking about the WordPress Importers, their history, where they are now, and where they could go in the future.

,

Gary PendergastWordPress Importers: Defining a Schema

While schemata are usually implemented using language-specific tools (eg, XML uses XML Schema, JSON uses JSON Schema), they largely use the same concepts when talking about data. This is rather helpful, we don’t need to make a decision on data formats before we can start thinking about how the data should be arranged.

Note: Since these concepts apply equally to all data formats, I’m using “WXR” in this post as shorthand for “the structured data section of whichever file format we ultimately use”, rather than specifically referring to the existing WXR format. 🙂

Why is a Schema Important?

It’s fair to ask: if the WordPress Importers have survived this entire time without a formal schema, why would we need one now?

There are two major reasons why we haven’t needed one in the past:

  • WXR has remained largely unchanged in the last 10 years: there have been small additions or tweaks, but nothing significant. There’s been no need to keep track of changes.
  • WXR is currently very simple, with just a handful of basic elements. In a recent experiment, I was able to implement a JavaScript-based WXR generator in just a few days, entirely by referencing the Core implementation.

These reasons are also why it would help to implement a schema for the future:

  • As work on WXR proceeds, there will likely need to be substantial changes to what data is included: adding new fields, modifying existing fields, and removing redundant fields. Tracking these changes helps ensure any WXR implementations can stay in sync.
  • These changes will result in a more complex schema: relying on the source to re-implement it will become increasingly difficult and error-prone. Following Gutenberg’s lead, it’s likely that we’d want to provide official libraries in both PHP and JavaScript: keeping them in sync is best done from a source schema, rather than having one implementation copy the other.

Taking the time to plan out a schema now gives us a solid base to work from, and it allows for future changes to happen in a reliable fashion.

WXR for all of WordPress

With a well defined schema, we can start to expand what data will be included in a WXR file.

Media

Interestingly, many of the challenges around media files are less to do with WXR, and more to do with importer capabilities. The biggest headache is retrieving the actual files, which the importer currently handles by trying to retrieve the file from the remote server, as defined in the wp:attachment_url node. In context, this behaviour is understandable: 10+ years ago, personal internet connections were too slow to be moving media around, it was better to have the servers talk to each other. It’s a useful mechanism that we should keep as a fallback, but the more reliable solution is to include the media file with the export.

Plugins and Themes

There are two parts to plugins and themes: the code, and the content. Modern WordPress sites require plugins to function, and most are customised to suit their particular theme.

For exporting the code, I wonder if a tiered solution could be applied:

  • Anything from WordPress.org would just need their slug, since they can be re-downloaded during import. Particularly as WordPress continues to move towards an auto-updated future, modified versions of plugins and themes are explicitly not supported.
  • Third party plugins and themes would be given a filter to use, where they can provide a download URL that can be included in the export file.
  • Third party plugins/themes that don’t provide a download URL would either need to be skipped, or zipped up and included in the export file.

For exporting the content, WXR already includes custom post types, but doesn’t include custom settings, or custom tables. The former should be included automatically, and the latter would likely be handled by an appropriate action for the plugin to hook into.

Settings

There are currently a handful of special settings that are exported, but (as I just noted, particularly with plugins and themes being exported) this would likely need to be expanded to include most items in wp_options.

Users

Currently, the bare minimum information about users who’ve authored a post is included in the export. This would need to be expanded to include more user information, as well as users who aren’t post authors.

WXR for parts of WordPress

The modern use case for importers isn’t just to handle a full site, but to handle keeping sites in sync. For example, most news organisations will have a staging site (or even several layers of staging!) which is synchronised to production.

While it’s well outside the scope of this project to directly handle every one of these use cases, we should be able to provide the framework for organisations to build reliable platforms on. Exports should be repeatable, objects in the export should have unique identifiers, and the importer should be able to handle any subset of WXR.

WXR Beyond WordPress

Up until this point, we’ve really been talking about WordPress→WordPress migrations, but I think WXR is a useful format beyond that. Instead of just containing direct exports of the data from particular plugins, we could also allow it to contain “types” of data. This turns WXR into an intermediary language, exports can be created from any source, and imported into WordPress.

Let’s consider an example. Say we create a tool that can export a Shopify, Wix, or GoDaddy site to WXR, how would we represent an online store in the WXR file? We don’t want to export in the format that any particular plugin would use, since a WordPress Core tool shouldn’t be advantaging one plugin over others.

Instead, it would be better if we could format the data in a platform-agnostic way, which plugins could then implement support for. As luck would have it, Schema.org provides exactly the kind of data structure we could use here. It’s been actively maintained for nearly nine years, it supports a wide variety of data types, and is intentionally platform-agnostic.

Gazing into my crystal ball for a moment, I can certainly imagine a future where plugins could implement and declare support for importing certain data types. When handling such an import (assuming one of those plugins wasn’t already installed), the WordPress Importer could offer them as options during the import process. This kind of seamless integration allows WordPress to show that it offers the same kind of fully-featured site building experience that modern CMS services do.

Of course, reality is never quite as simple as crystal balls and magic wands make them out to be. We have to contend with services that provide incomplete or fragmented exports, and there are even services that deliberately don’t provide exports at all. In the next post, I’ll be writing about why we should address this problem, and how we might be able to go about it.

This post is part of a series, talking about the WordPress Importers, their history, where they are now, and where they could go in the future.

,

Gary PendergastWordPress Importers: Getting Our House in Order

The previous post talked about the broad problems we need to tackle to bring our importers up to speed, making them available for everyone to use.

In this post, I’m going to focus on what we could do with the existing technology, in order to give us the best possible framework going forward.

A Reliable Base

Importers are an interesting technical problem. Much like you’d expect from any backup/restore code, importers need to be extremely reliable. They need to comfortably handle all sorts of unusual data, and they need to keep it all safe. Particularly considering their age, the WordPress Importers do a remarkably good job of handling most content you can throw at them.

However, modern development practices have evolved and improved since the importers were first written, and we should certainly be making use of such practices, when they fit with our requirements.

For building reliable software that we expect to largely run by itself, a variety of comprehensive automated testing is critical. This ensures we can confidently take on the broader issues, safe in the knowledge that we have a reliable base to work from.

Testing must be the first item on this list. A variety of automated testing gives us confidence that changes are safe, and that the code can continue to be maintained in the future.

Data formats must be well defined. While this is useful for ensuring data can be handled in a predictable fashion, it’s also a very clear demonstration of our commitment to data freedom.

APIs for creating or extending importers should be straightforward to hook into.

Performance Isn’t an Optional Extra

With sites constantly growing in size (and with the export files potentially gaining a heap of extra data), we need to care about the performance of the importers.

Luckily, there’s already been some substantial work done on this front.

There are other groups in the WordPress world who’ve made performance improvements in their own tools: gathering all of that experience is a relatively quick way to bring in production-tested improvements.

The WXR Format

It’s worth talking about the WXR format itself, and determining whether it’s the best option for handling exports into the future. XML-based formats are largely viewed as a relic of days gone past, so (if we were to completely ignore backwards compatibility for a moment) is there a modern data format that would work better?

The short answer… kind of. 🙂

XML is actually well suited to this use case, and (particularly when looking at performance improvements) is the only data format for which PHP comes with a built-in streaming parser.

That said, WXR is basically an extension of the RSS format: as we add more data to the file that clearly doesn’t belong in RSS, there is likely an argument for defining an entirely WordPress-focused schema.

Alternative Formats

It’s important to consider what the priorities are for our export format, which will help guide any decision we make. So, I’d like to suggest the following priorities (in approximate priority order):

  • PHP Support: The format should be natively supported in PHP, though it is still workable if we need to ship an additional library.
  • Performant: Particularly when looking at very large exports, it should be processed as quickly as possible, using minimal RAM.
  • Supports Binary Files: The first comments on my previous post asked about media support; we clearly should be treating it as a first-class citizen.
  • Standards Based: Is the format based on a documented standard? (Another way to ask this: are there multiple different implementations of the format? Do those implementations all function the same?)
  • Backward Compatible: Can the format be used by existing tools with no changes, or minimal changes?
  • Self Descriptive: Does the format include information about what data you’re currently looking at, or do you need to refer to a schema?
  • Human Readable: Can the file be opened and read in a text editor?

Given these priorities, what are some options?

WXR (XML-based)

Either the RSS-based schema that we already use, or a custom-defined XML schema, the arguments for this format are pretty well known.

One argument that hasn’t been well covered is how there’s a definite trade-off when it comes to supporting binary files. Currently, the importer tries to scrape the media file from the original source, which is not particularly reliable. So, if we were to look at including media files in the WXR file, the best option for storing them is to base64 encode them. Unfortunately, that would have a serious effect on performance, as well as readability: adding huge base64 strings would make even the smallest exports impossible to read.

Either way, this option would be mostly backwards compatible, though some tools may require a bit of reworking if we were to substantially change the schema.

WXR (ZIP-based)

To address the issues with media files, an alternative option might be to follow the path that Microsoft Word and OpenOffice use: put the text content in an XML file, put the binary content into folders, and compress the whole thing.

This addresses the performance and binary support problems, but is initially worse for readability: if you don’t know that it’s a ZIP file, you can’t read it in a text editor. Once you unzip it, however, it does become quite readable, and has the same level of backwards compatibility as the XML-based format.

JSON

JSON could work as a replacement for XML in both of the above formats, with one additional caveat: there is no streaming JSON parser built in to PHP. There are 3rd party libraries available, but given the documented differences between JSON parsers, I would be wary about using one library to produce the JSON, and another to parse it.

This format largely wouldn’t be backwards compatible, though tools which rely on the export file being plain text (eg, command line tools to do broad search-and-replaces on the file) can be modified relatively easily.

There are additional subjective arguments (both for and against) the readability of JSON vs XML, but I’m not sure there’s anything to them beyond personal preference.

SQLite

The SQLite team wrote an interesting (indirect) argument on this topic: OpenOffice uses a ZIP-based format for storing documents, and the SQLite team argued that there would be benefits (particularly around performance and reliability) for OpenOffice to switch to SQLite.

The key issues that I see are:

  • SQLite is included in PHP, but not enabled by default on Windows.
  • While the SQLite team have a strong commitment to providing long-term support, SQLite is not a standard, and the only implementation is the one provided by the SQLite team.
  • This option is not backwards compatible at all.

FlatBuffers

FlatBuffers is an interesting comparison, since it’s a data format focussed entirely on speed. The down side of this focus is that it requires a defined schema to read the data. Much like SQLite, the only standard for FlatBuffers is the implementation. Unlike SQLite, FlatBuffers has made no commitments to providing long-term support.

  • WXR (XML-based): works in PHP ✅, performant ⚠, supports binary files ⚠, standards based ✅, backwards compatible ⚠, self descriptive ✅, readable ✅
  • WXR (ZIP-based): works in PHP ✅, performant ✅, supports binary files ✅, standards based ✅, backwards compatible ⚠, self descriptive ✅, readable ⚠ / ❌
  • JSON: works in PHP ⚠, performant ⚠, supports binary files ⚠, standards based ✅, backwards compatible ❌, self descriptive ✅, readable ✅
  • SQLite: works in PHP ⚠, performant ✅, supports binary files ✅, standards based ⚠ / ❌, backwards compatible ❌, self descriptive ✅, readable ❌
  • FlatBuffers: works in PHP ⚠, performant ✅, supports binary files ✅, standards based ❌, backwards compatible ❌, self descriptive ❌, readable ❌

As with any decision, this is a matter of trade-offs. I’m certainly interested in hearing additional perspectives on these options, or thoughts on options that I haven’t considered.

Regardless of which particular format we choose for storing WordPress exports, every format should have (or in the case of FlatBuffers, requires) a schema. We can talk about schemata without going into implementation details, so I’ll be writing about that in the next post.

This post is part of a series, talking about the WordPress Importers, their history, where they are now, and where they could go in the future.

Gary PendergastWordPress Importers: Stating the Problem

It’s time to focus on the WordPress Importers.

I’m not talking about tidying them up, improving performance, or fixing some bugs, though these are certainly things that should happen. Instead, we need to consider their purpose, how they fit as a driver of WordPress’ commitment to Open Source, and how they can be a key element in helping to keep the Internet Open and Free.

The History

The WordPress Importers are arguably the key driver to WordPress’ early success. Before the importer plugins existed (before WordPress even supported plugins!) there were a handful of import-*.php scripts in the wp-admin directory that could be used to import blogs from other blogging platforms. When other platforms fell out of favour, WordPress already had an importer ready for people to move their site over. One of the most notable instances was in 2004, when Movable Type changed their license and prices, suddenly requiring personal blog authors to pay for something that had previously been free. WordPress was fortunate enough to be in the right place at the right time: many of WordPress’ earliest users came from Movable Type.

As time went on, WordPress became well known in its own right. Growth relied less on people wanting to switch from another provider, and more on people choosing to start their site with WordPress. For practical reasons, the importers were moved out of WordPress Core, and into their own plugins. Since then, they’ve largely been in maintenance mode: bugs are fixed when they come up, but since export formats rarely change, they’ve just continued to work for all these years.

An unfortunate side effect of this, however, is that new importers are rarely written. While a new breed of services have sprung up over the years, the WordPress importers haven’t kept up.

The New Services

There are many new CMS services that have cropped up in recent years, and we don’t have importers for any of them. WordPress.com has a few extra ones written, but they’ve been built on the WordPress.com infrastructure out of necessity.

You see, we’ve always assumed that other CMSes will provide some sort of export file that we can use to import into WordPress. That isn’t always the case, however. Some services (notably, Wix and GoDaddy Website Builder) deliberately don’t allow you to export your own content. Other services provide incomplete or fragmented exports, needlessly forcing stress upon site owners who want to use their own content outside of that service.

To work around this, WordPress.com has implemented importers that effectively scrape the site: while this has worked to some degree, it does require regular maintenance, and the importer has to do a lot of guessing about how the content should be transformed. This is clearly not a solution that would be maintainable as a plugin.

Problem Number 4

Some services work against their customers, and actively prevent site owners from controlling their own content.

This strikes at the heart of the WordPress Bill of Rights. WordPress is built with fundamental freedoms in mind: all of those freedoms point to owning your content, and being able to make use of it in any form you like. When a CMS actively works against providing such freedom to their community, I would argue that we have an obligation to help that community out.

A Variety of Content

It’s worth discussing how, when starting a modern CMS service, the bar for success is very high. You can’t get away with just providing a basic CMS: you need to provide all the options. Blogs, eCommerce, mailing lists, forums, themes, polls, statistics, contact forms, integrations, embeds, the list goes on. The closest comparison to modern CMS services is… the entire WordPress ecosystem: built on WordPress core, but with the myriad of plugins and themes available, along with the variety of services offered by a huge array of companies.

So, when we talk about the importers, we need to consider how they’ll be used.

Problem Number 3

To import from a modern CMS service into WordPress, your importer needs to map from service features to WordPress plugins.

Getting Our Own House In Order

Some of these problems don’t just apply to new services, however.

Out of the box, WordPress exports to WXR (WordPress eXtended RSS) files: an XML file that contains the content of the site. Back when WXR was first created, this was all you really needed, but much like the rest of the WordPress importers, it hasn’t kept up with the times. A modern WordPress site isn’t just the sum of its content: a WordPress site has plugins and themes. It has various options configured, it has huge quantities of media, it has masses of text content, far more than the first WordPress sites ever had.

Problem Number 2

WXR doesn’t contain a full export of a WordPress site.

In my view, WXR is a solid format for handling exports. An XML-based system is quite capable of containing all forms of content, so it’s reasonable that we could expand the WXR format to contain the entire site.

Built for the Future

If there’s one thing we can learn from the history of the WordPress importers, it’s that maintenance will potentially be sporadic. Importers are unlikely to receive the same attention that the broader WordPress Core project does, owners may come and go. An importer will get attention if it breaks, of course, but it otherwise may go months or years without changing.

Problem Number 1

We can’t depend on regular importer maintenance in the future.

It’s quite possible to build code that will be running in 10+ years: we see examples all across the WordPress ecosystem. Doing it in a reliable fashion needs to be a deliberate choice, however.

What’s Next?

Having worked our way down from the larger philosophical reasons for the importers, to some of the more technically-oriented implementation problems; I’d like to work our way back out again, focussing on each problem individually. In the following posts, I’ll start laying out how I think we can bring our importers up to speed, prepare them for the future, and make them available for everyone.

This post is part of a series, talking about the WordPress Importers, their history, where they are now, and where they could go in the future.

,

Linux AustraliaCouncil Meeting Tuesday 12th January 2021 – Minutes

1. Meeting overview and key information

Present

Sae Ra Germaine

Jonathan Woithe

Julien Goodwin

Russell Stuart

Lisa Sands

Joel Addison

 

Apologies 

Benno Rice

 

Meeting opened at 1931 AEDT by Sae Ra and quorum was achieved.

Minutes taken by Julien.

2. Log of correspondence

  • From Anchor to council@ on 16 Dec 2020: opensource.org.au is due for renewal by 15 Feb 2021.
  • From Big Orange Heart via web form: request for sponsorship of WordFest (https://wordfest.live) on 22 Jan 2021.
    [Response has been sent indicating Council in caretaker mode, suggesting a longer lead time in future and outlining LA’s funding priorities.]
  • From Anchor to council@ on 27 Dec 2020: lca2018.org requires renewal by 26 Feb 2021.
    [As per meeting on 15 Dec 2020, this will be allowed to expire if the few remaining services dependent on the domain have been moved by 26 Feb 2021.]
  • From <a member> via website form on 27 Dec 2020: will LA write a submission into the Productivity Commission “Right to Repair” enquiry? A response was sent:
    • The tight timing (closing date in early Feb) and existing commitments of the Council members will almost certainly prevent this happening unless a member steps up to do it.
    • If a member wished LA to sign a submission they prepared and it was consistent with LA’s values, Council would consider signing it.

<The member> is writing a submission.  He has subsequently indicated he won’t request LA sign it as it contains a lot of personal perspective and he doesn’t feel he can speak for others. He may share it on linux-aus for comments.

  • From Binh Hguyen on Grants list, asking about out-of-cycle funding for a project idea (a public information aggregator). Response sent indicating Council is in caretaker mode and suggesting that they wait for the 2021 Grant program if it eventuates.

 

3. Items for discussion

  • AGM –  Are we ready, who is doing what etc. 
    • Roles required: 
      • Chat Monitor
      • Votes/Poll Monitor
      • Hands Up – Reaction Monitor
      • Timer
  • Bringing together the nomination sheets, these need to be collated, distributed and a seconder needs to be found for each of the nominations.
  • Annual Report – what is missing what do we need to send out etc.
    • Hoping for reports from web team, pycon.
    • Once those are received the report can be sent out.
  • Rusty Wrench
    • Decision is to award to <see announcement>, Jon or one of the other formers will award
    • AI: Sae Ra to liaise with Jon
  • JW: Russell Coker has set up a replacement for planet.linux.org.au at https://planet.luv.asn.au/. It has been suggested that planet.linux.org.au either point to this URL or contain a link to it.
    • AI: Jonathan & Julien to action link on current static page
  • JW: <A member> suggests via council@ on 5 Jan 2021 that the “Should a digest be dispatched daily when the size threshold isn’t reached” setting for la-announce be set to “Yes”. Due to the low traffic volume on this list, users with digest mode set may only receive messages after a delay of many months.
    • All agree, however it turns out the setting has already been enabled.

4. Items for noting

  • None

5. Other business

  • None

6. In camera

  • No items were discussed in camera

Meeting closed at 2024

The post Council Meeting Tuesday 12th January 2021 – Minutes appeared first on Linux Australia.

,

Jan SchmidtRift CV1 – Adventures in Kalman filtering Part 2

In the last post I had started implementing an Unscented Kalman Filter for position and orientation tracking in OpenHMD. Over the Christmas break, I continued that work.

A Quick Recap

When reading below, keep in mind that the goal of the filtering code I’m writing is to combine 2 sources of information for tracking the headset and controllers.

The first piece of information is acceleration and rotation data from the IMU on each device, and the second is observations of the device position and orientation from 1 or more camera sensors.

The IMU motion data drifts quickly (at least for position tracking) and can’t tell which way the device is facing in yaw (it can detect gravity and hence pitch/roll, but not the heading).

The camera observations can tell exactly where each device is, but arrive at a much lower rate (52Hz vs 500/1000Hz) and can take a long time (hundreds of milliseconds) to analyse when acquiring or re-acquiring a lock on the tracked device(s).

The goal is to acquire tracking lock, then use the motion data to predict the motion closely enough that we always hit the ‘fast path’ of vision analysis. The key here is closely enough – the more closely the filter can track and predict the motion of devices between camera frames, the better.

Integration in OpenHMD

When I wrote the last post, I had the filter running as a standalone application, processing motion trace data collected by instrumenting a running OpenHMD app and moving my headset and controllers around. That’s a really good way to work, because it lets me run modifications on the same data set and see what changed.

However, the motion traces were captured using the current fusion/prediction code, which frequently loses tracking lock when the devices move – leading to big gaps in the camera observations and more interpolation for the filter.

By integrating the Kalman filter into OpenHMD, the predictions are improved, leading to generally much better results. Here’s one trace of me moving the headset around reasonably vigorously with no tracking loss at all.

Headset motion capture trace

If it worked this well all the time, I’d be ecstatic! The predicted position matched the observed position closely enough for every frame for the computer vision to match poses and track perfectly. Unfortunately, this doesn’t happen every time yet, and definitely not with the controllers – although I think the latter largely comes down to the current computer vision having more trouble matching controller poses. They have fewer LEDs to match against compared to the headset, and the LEDs are generally more side-on to a front-facing camera.

Taking a closer look at a portion of that trace, the drift between camera frames when the position is interpolated using the IMU readings is clear.

Headset motion capture – zoomed in view

This is really good. Most of the time, the drift between frames is within 1-2mm. The computer vision can only match the pose of the devices to within a pixel or two – so the observed jitter can also come from the pose extraction, not the filtering.

The worst tracking is again on the Z axis – distance from the camera in this case. Again, that makes sense – with a single camera matching LED blobs, distance is the most uncertain part of the extracted pose.

Losing Track

The trace above is good – the computer vision spots the headset and then the filtering + computer vision track it at all times. That isn’t always the case – the prediction goes wrong, or the computer vision fails to match (it’s definitely still far from perfect). When that happens, it needs to do a full pose search to reacquire the device, and there’s a big gap until the next pose report is available.

That looks more like this:

Headset motion capture trace with tracking errors

This trace has 2 kinds of errors – gaps in the observed position timeline during full pose searches and erroneous position reports where the computer vision matched things incorrectly.

Fixing the errors in position reports will require improving the computer vision algorithm and would fix most of the plot above. Outlier rejection is one approach to investigate on that front.

Latency Compensation

There is inherent delay involved in processing of the camera observations. Every 19.2ms, the headset emits a radio signal that triggers each camera to capture a frame. At the same time, the headset and controller IR LEDs light up brightly to create the light constellation being tracked. After the frame is captured, it is delivered over USB over the next 18ms or so and then submitted for vision analysis. In the fast case where we’re already tracking the device, the computer vision is complete in a millisecond or so. In the slow case, it’s much longer.

Overall, that means that there’s at least a 20ms offset between when the devices are observed and when the position information is available for use. In the plot above, this delay is ignored and position reports are fed into the filter when they are available. In the worst case, that means the filter is being told where the headset was hundreds of milliseconds earlier.

To compensate for that delay, I implemented a mechanism in the filter where it keeps extra position and orientation entries in the state that can be used to retroactively apply the position observations.

The way that works is to make a prediction of the position and orientation of the device at the moment the camera frame is captured and copy that prediction into the extra state variable. After that, it continues integrating IMU data as it becomes available while keeping the auxiliary state constant.

When the camera frame analysis is complete, that delayed measurement is matched against the stored position and orientation prediction in the state and the error used to correct the overall filter. The cool thing is that in the intervening time, the filter covariance matrix has been building up the right correction terms to adjust the current position and orientation.
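
Here’s a minimal sketch of that idea in one dimension, using a plain linear Kalman filter rather than the UKF (the matrices, noise values and measurement below are made up purely for illustration, and the code has nothing to do with the OpenHMD implementation): the position is copied into a “lag” entry of the state when the camera shutter fires, prediction continues at IMU rate, and the late camera measurement is applied against the lagged entry, with the cross-covariance carrying the correction back to the live position.

import numpy as np

dt = 0.001                                  # 1000 Hz IMU-rate prediction steps
F = np.array([[1.0, dt, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])             # the lag slot does not evolve
Q = np.diag([1e-6, 1e-4, 0.0])              # process noise (none on the lag slot)
H = np.array([[0.0, 0.0, 1.0]])             # the camera measures the *lagged* position
R = np.array([[1e-4]])                      # camera measurement noise

def predict(x, P):
    return F @ x, F @ P @ F.T + Q

def capture_slot(x, P):
    # camera shutter fires: freeze a copy of the current position into the slot
    C = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [1.0, 0.0, 0.0]])
    return C @ x, C @ P @ C.T

def delayed_update(x, P, z):
    # slow vision result arrives: correct against the frozen copy; the
    # cross-covariance built up since capture also pulls the live state into line
    y = z - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ y, (np.eye(3) - K @ H) @ P

x, P = np.zeros((3, 1)), np.eye(3) * 1e-2   # state: [position, velocity, lagged position]
x, P = capture_slot(x, P)
for _ in range(100):                        # ~100 ms of IMU-only prediction
    x, P = predict(x, P)
x, P = delayed_update(x, P, np.array([[0.05]]))   # vision says the device was at 0.05 m
print(x.ravel())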

Here’s a good example of the difference:

Before: Position filtering with no latency compensation
After: Latency-compensated position reports

Notice how most of the disconnected segments have now slotted back into position in the timeline. The ones that haven’t can either be attributed to incorrect pose extraction in the computer vision, or to not having enough auxiliary state slots for all the concurrent frames.

At any given moment, there can be a camera frame being analysed, one arriving over USB, and one awaiting “long term” analysis. The filter needs to track an auxiliary state variable for each frame that we expect to get pose information from later, so I implemented a slot allocation system and multiple slots.

The downside is that each slot adds 6 variables (3 position and 3 orientation) to the covariance matrix on top of the 18 base variables. Because the covariance matrix is square, the size grows quadratically with new variables. 5 new slots means 30 new variables – leading to a 48 x 48 covariance matrix instead of 18 x 18. That is a 7-fold increase in the size of the matrix (48 x 48 = 2304 vs 18 x 18 = 324) and unfortunately about a 10x slow-down in the filter run-time.

At that point, even after some optimisation and vectorisation on the matrix operations, the filter can only run about 3x real-time, which is too slow. Using fewer slots is quicker, but allows for fewer outstanding frames. With 3 slots, the slow-down is only about 2x.
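
For reference, a throwaway calculation that just reproduces those matrix sizes (18 base state variables plus 6 per slot):

base = 18
for slots in (3, 5):
    n = base + 6 * slots
    print(f"{slots} slots -> {n} x {n} covariance = {n * n} entries")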

There are some other possible approaches to this problem:

  • Running the filtering delayed, only integrating IMU reports once the camera report is available. This has the disadvantage of not reporting the most up-to-date estimate of the user pose, which isn’t great for an interactive VR system.
  • Keeping around IMU reports and rewinding / replaying the filter for late camera observations. This limits the overall increase in filter CPU usage to double (since we at most replay every observation twice), but potentially with large bursts when hundreds of IMU readings need replaying.
  • It might be possible to only keep 2 “full” delayed measurement slots with both position and orientation, and to keep some position-only slots for others. The orientation of the headset tends to drift much more slowly than position does, so when there’s a big gap in the tracking it would be more important to be able to correct the position estimate. Orientation is likely to still be close to correct.
  • Further optimisation in the filter implementation. I was hoping to keep everything dependency-free, so the filter implementation uses my own naive 2D matrix code, which only implements the features needed for the filter. A more sophisticated matrix library might perform better – but it’s hard to say without doing some testing on that front.

Controllers

So far in this post, I’ve only talked about the headset tracking and not mentioned controllers. The controllers are considerably harder to track right now, but most of the blame for that is in the computer vision part. Each controller has fewer LEDs than the headset, fewer are visible at any given moment, and they often aren’t pointing at the camera front-on.

Oculus Camera view of headset and left controller.

This screenshot is a prime example. The controller is the cluster of lights at the top of the image, and the headset is lower left. The computer vision has gotten confused and thinks the controller is the ring of random blue crosses near the headset. It corrected itself a moment later, but those false readings make life very hard for the filtering.

Position tracking of left controller with lots of tracking loss.

Here’s a typical example of the controller tracking right now. There are some very promising portions of good tracking, but they are interspersed with bursts of tracking losses, and wild drifting from the computer vision giving wrong poses – leading to the filter predicting incorrect acceleration and hence cascaded tracking losses. Particularly (again) on the Z axis.

Timing Improvements

One of the problems I was looking at in my last post is variability in the arrival timing of the various USB streams (Headset reports, Controller reports, camera frames). I improved things in OpenHMD on that front, to use timestamps from the devices everywhere (removing USB timing jitter from the inter-sample time).

There are still potential problems in when IMU reports from controllers get updated in the filters vs the camera frames. That can be on the order of 2-4ms jitter. Time will tell how big a problem that will be – after the other bigger tracking problems are resolved.

Sponsorships

All the work that I’m doing implementing this positional tracking is a combination of my free time, hours contributed by my employer Centricular and contributions from people via Github Sponsorships. If you’d like to help me spend more hours on this and fewer on other paying work, I appreciate any contributions immensely!

Next Steps

The next things on my todo list are:

  • Integrate the delayed-observation processing into OpenHMD (at the moment it is only in my standalone simulator).
  • Improve the filter code structure – this is my first Kalman filter and there are some implementation decisions I’d like to revisit.
  • Publish the UKF branch for other people to try.
  • Circle back to the computer vision and look at ways to improve the pose extraction and better reject outlying / erroneous poses, especially for the controllers.
  • Think more about how to best handle / schedule analysis of frames from multiple cameras. At the moment each camera operates as a separate entity, capturing frames and analysing them in threads without considering what is happening in other cameras. That means any camera that can’t see a particular device starts doing full pose searches – which might be unnecessary if another camera still has a good view of the device. Coordinating those analyses across cameras could yield better CPU consumption, and let the filter retain fewer delayed observation slots.

,

Tim SerongScope Creep

On December 22, I decided to brew an oatmeal stout (5kg Gladfield ale malt, 250g dark chocolate malt, 250g light chocolate malt, 250g dark crystal malt, 500g rolled oats, 150g rice hulls to stop the mash sticking, 25g Pride of Ringwood hops, Safale US-05 yeast). This all takes a good few hours to do the mash and the boil and everything, so while that was underway I thought it’d be a good opportunity to remove a crappy old cupboard from the laundry, so I could put our nice Miele upright freezer in there, where it’d be closer to the kitchen (the freezer is presently in a room at the other end of the house).

The cupboard was reasonably easy to rip out, but behind it was a mouldy and unexpectedly bright yellow wall with an ugly gap at the top where whoever installed it had removed the existing cornice.

Underneath the bottom half of the cupboard, I discovered not the cork tiles which cover the rest of the floor, but a layer of horrific faux-tile linoleum. Plus, more mould. No way was I going to put the freezer on top of that.

So, up came the floor covering, back to nice hardwood boards.

Of course, the sink had to come out too, to remove the flooring from under its cabinet, and that meant pulling the splashback tiles (they had ugly screw holes in them anyway from a shelf that had been bracketed up on top of them previously).

Removing the tiles meant replacing a couple of sections of wall.

Also, we still needed to be able to use the washing machine through all this, so I knocked up a temporary sink support.

New cornice went in.

The rest of the plastering was completed and a ceiling fan installed.

Waterproofing membrane was applied where new tiles will go around a new sink.

I removed the hideous old aluminium backed weather stripping from around the exterior door and plastered up the exposed groove.

We still need to paint everything, get the new sink installed, do the tiling work and install new taps.

As for the oatmeal stout, I bottled that on January 2. From a sample taken at the time, it should be excellent, but right now still needs to carbonate and mature.

Stewart SmithPhotos from Taiwan

A few years ago we went to Taiwan. I managed to capture some random bits of the city on film (and also some shots on my then phone, a Google Pixel). I find the different style of art on the streets around the world to be fascinating, and Taiwan had some good examples.

I’ve really enjoyed shooting Kodak E100VS film over the years, and some of my last rolls were shot in Taiwan. It’s a film that unfortunately is not made anymore, but at least we have a new Ektachrome to have fun with now.

Words for our time: “Where there is democracy, equality and freedom can exist; without democracy, equality and freedom are merely empty words”.

This is, of course, only a small number of the total photos I took there. I’d really recommend a trip to Taiwan, and I look forward to going back there some day.

,

Simon LyallAudiobooks – December 2020

The Perils of Perception: Why We’re Wrong About Nearly Everything by Bobby Duffy

Lots of examples of how people are wrong, usually about crime rates or levels of immigration. Divided into topics, with some comments on why and how to fix it. 3/5

The Knowledge: How to Rebuild our World from Scratch by Lewis Dartnell

A how-to on rebooting civilization following a worldwide disaster. The tone is addressed to a present-day person rather than someone from the future, which makes it more readable. 4/5

The Story of Silver: How the White Metal Shaped America and the Modern World by William L. Silber

Almost solely devoted to America, it covers major events around the metal, including its demonetization, government and private price manipulation, and speculation, including the Hunt Brothers. 3/5

The First Four Years by Laura Ingalls Wilder

About half the length of the other books in the series and published posthumously. Laura and Almanzo try to make a success of farming for 4 years. Things don’t go well. The book is a bit more adult than some of the others. 3/5

Casino Royale by Ian Fleming

Interesting how close it is to the 2006 Movie. Also since it is set in ~1951, World War 2 looms large in many places & most characters are veterans. Very good and fairly quick read. 4/5

A Bridge too far: The Classic History of the Greatest Battle of World War II by Cornelius Ryan

An account of the failed airborne operation. Mostly day-by-day, with sources including interviews with participants. A little confusing without maps. 4/5

The Bomb: Presidents, Generals, and the Secret History of Nuclear War by Fred Kaplan

“The definitive history of American policy on nuclear war”. Lots of “War Plans” and “Targeting Policy” with back and forth between service factions. 3/5

The Sirens of Mars: Searching for Life on Another World by Sarah Stewart Johnson

“Combines elements of memoir from Johnson with the history and science of attempts to discover life on Mars”. I liked this book a lot, very nicely written and inspiring. 4/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


,

Simon LyallDonations 2020

Each year I do the majority of my Charity donations in early December (just after my birthday) spread over a few days (so as not to get my credit card suspended).

I also blog about it to hopefully inspire others. See: 2019, 2018, 2017, 2016, 2015

All amounts this year are in $US unless otherwise stated

My main donation was $750 to Givewell (to allocate to projects as they prioritize). Once again I’m happy that Givewell make efficient use of money donated. I decided this year to give a higher proportion of my giving to them than last year.

Software and Internet Infrastructure Projects

€20 to Syncthing which I’ve started to use instead of Dropbox.

$50 each to the Software Freedom Conservancy and Software in the Public Interest. Money not attached to any specific project.

$51 to the Internet Archive

$25 to Let’s Encrypt

Advocacy Organisations

$50 to the Electronic Frontier Foundation

Others including content creators

I donated $103 to Signum University to cover Corey Olsen’s Exploring the Lord of the Rings series plus other stuff I listen to that they put out.

I paid $100 to be a supporter of NZ News site The Spinoff

I also supported a number of creators on Patreon:


,

Jan SchmidtRift CV1 – Adventures in Kalman filtering

In my last post I wrote about changes in my OpenHMD positional tracking branch to split analysis of the tracking frames from the camera sensors across multiple threads. In the 2 months since then, the only real change in the repository was to add some filtering to the pose search that rejects bad poses by checking if they align with the gravity vector observed by the IMU. That is in itself a nice improvement, but there is other work I’ve been doing that isn’t published yet.

The remaining big challenge (I think) to a usable positional tracking solution is fusing together the motion information that comes from the inertial tracking sensors (IMU) in the devices (headset, controllers) with the observations that come from the camera sensors (video frames). There are some high level goals for that fusion, and lots of fiddly details about why it’s hard.

At the high level, the IMUs provide partial information about the motion of each device at a high rate, while the cameras provide observations about the actual position in the room – but at a lower rate, and with sometimes large delays.

In the Oculus CV1, the IMU provides Accelerometer and Gyroscope readings at 1000Hz (500Hz for controllers), and from those it’s possible to compute the orientation of the device relative to the Earth (but not the compass direction it’s facing), and also to integrate acceleration readings to get velocity and position – but the position tracking from an IMU is only useful in the short term (a few seconds) as it drifts rapidly due to that double integration.

The accelerometers measure (surprise) the acceleration of the device, but are always also sensing the Earth’s gravity field. If a device is at rest, it will ideally report 9.81 m/s², give or take noise and bias errors. When the device is in motion, the acceleration measured is the sum of the gravity field, bias errors and actual linear acceleration. To interpolate the position with any accuracy at all, you need to separate those 3 components with tight tolerance.
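
As a toy illustration of how unforgiving that separation is (the numbers below are made up and have nothing to do with the real Rift IMU), a bias of just 0.02 m/s² left uncorrected double-integrates into roughly 9 cm of position error within 3 seconds:

dt = 0.001                 # 1000 Hz samples
g = 9.81                   # gravity magnitude (m/s^2), assumed perfectly known
bias = 0.02                # a small accelerometer bias we failed to estimate

vel = 0.0
pos = 0.0
for _ in range(3000):      # 3 seconds of a device sitting perfectly still
    measured = g + bias    # accelerometer reading: gravity + bias (+ zero linear acceleration)
    linear = measured - g  # subtract our gravity estimate; the bias remains
    vel += linear * dt
    pos += vel * dt

print(f"position error after 3 s: {pos * 100:.1f} cm")   # ~9 cm from a tiny bias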

That’s about the point where the position observations from the cameras come into play. You can use those snapshots of the device position to determine the real direction that the devices are facing, and to correct for any errors in the tracked position and device orientation from the IMU integration – by teasing out the bias errors and gravity offset.

The current code uses some simple hacks to do the positional tracking – using the existing OpenHMD 3DOF complementary filter to compute the orientation, and some hacks to update the position when a camera finds the pose of a device.

The simple hacks work surprisingly well when devices don’t move too fast. The reason is (as previously discussed) that the video analysis takes a variable amount of time – if we can predict where a device is with a high accuracy and maintain “tracking lock”, then the video analysis is fast and runs in a few milliseconds. If tracking lock is lost, then a full search is needed to recover the tracking, and that can take hundreds of milliseconds to complete… by which time the device has likely moved a long way and requires another full pose search, which takes hundreds of milliseconds…

So, the goal of my current development is to write a single unified fusion filter that combines IMU and camera observations to better track and predict the motion of devices between camera frames. Better motion prediction means hitting the ‘fast analysis’ path more often, which leads to more frequent corrections of the unknowns in the IMU data, and (circularly) better motion predictions.

To do that, I am working on an Unscented Kalman Filter that tracks the position, velocity, acceleration, orientation and IMU accelerometer and gyroscope biases – with promising initial results.

Graph of position error (m) between predicted position and position from camera observations
Graph of orientation error (degrees) between predicted orientation and camera observed pose.

In the above graphs, the filter is predicting the position of the headset at each camera frame to within 1cm most of the time and the pose to within a few degrees, but with some significant spikes that still need fixing. The explanation for the spikes lies in the data sets that I’m testing against, and points to the next work that needs doing.

To develop the filter, I’ve modified OpenHMD to record traces as I move devices around. It saves out a JSON file for each device with a log of each IMU reading and each camera frame. The idea is to have a baseline data set that can be used to test each change in the filter – but there is a catch. The current data was captured using the upstream positional tracking code – complete with tracking losses and long analysis delays.

The spikes in the filter graph correspond with when the OpenHMD traces have big delays between when a camera frame was captured and when the analysis completes.

Delay (ms) between camera frame and analysis results.

What this means is that when the filter makes bad predictions, it’s because it’s trying to predict the position of the device at the time the sensor result became available, instead of when the camera frame was captured – hundreds of milliseconds earlier.

So, my next step is to integrate the Kalman filter code into OpenHMD itself, and hopefully capture a new set of motion data with fewer tracking losses to prove the filter’s accuracy more clearly.

Second – I need to extend the filter to compensate for that delay between when a camera frame is captured and when the results are available for use, by using an augmented state matrix and lagged covariances. More on that next time.

To finish up, here’s a taste of another challenge hidden in the data – variability in the arrival time of IMU updates. The IMU updates at 1000Hz – ideally we’d receive those IMU updates 1 per millisecond, but transfer across the USB and variability in scheduling on the host computer make that much noisier. Sometimes further apart, sometimes bunched together – and in one part there’s a 1.2 second gap.

IMU reading timing variability (nanoseconds)

,

Tim SerongI Have No Idea How To Debug This

On my desktop system, I’m running XFCE on openSUSE Tumbleweed. When I leave my desk, I hit the “lock screen” button, the screen goes black, and the monitors go into standby. So far so good. When I come back and mash the keyboard, everything lights up again, the screens go white, and it says:

blank: Shows nothing but a black screen
Name: tserong@HOSTNAME
Password:
Enter password to unlock; select icon to lock

So I type my password, hit ENTER, and I’m back in action. So far so good again. Except… Several times recently, when I’ve come back and mashed the keyboard, the white overlay is gone. I can see all my open windows, my mail client, web browser, terminals, everything, but the screen is still locked. If I type my password and hit ENTER, it unlocks and I can interact again, but this is where it gets really weird. All the windows have moved down a bit on the screen. For example, a terminal that was previously neatly positioned towards the bottom of the screen is now partially off the screen. So “something” crashed – whatever overlay the lock thingy put there is gone? And somehow this affected the position of all my application windows? What in the name of all that is good and holy is going on here?

Update 2020-12-21: I’ve opened boo#1180241 to track this.

,

Linux AustraliaCouncil Meeting Tuesday 15th December 2020 – Minutes

1. Meeting overview and key information

Present

Sae Ra Germaine

Jonathan Woithe

Julien Goodwin

Russell Stuart

Lisa Sands

Joel Addison

Benno Rice

Apologies 

None

 

Meeting opened at 1930 AEDT by Sae Ra and quorum was achieved.

Minutes taken by Julien.

2. Log of correspondence

  • From Anchor to council@ on 2 Dec 2020: lca2018.org domain renewal is due by 1 Mar 2021.
    • Still a few services that need to move off.
    • AI: Joel to deal with.
  • From ASIC to council@ on 8 Dec 2020: “Open Source Australia” business registration due by 24 Nov 2020. This was dealt with on the day; an acknowledgement of renewal was received.
    • (Already done)

3. Items for discussion

  • Rusty Wrench
    • AI: Julien to send out to previous winners, without a suggestion from council
  • Code of Conduct wording clarification (background sent to council@)
    • Also, the github copy turns out to be out of date; PyCon desires this refresh to happen early enough to be in place for the next PyCon.
    • Change will be announced.
    • AI: Jonathan to write up an announcement for policies and linux-aus.
  • YouTube Partner Program
    • Setup more formal escalation paths for LCA (and other LA-affiliated events) in future
    • Register for YouTube Partner Programme to get additional support avenues.
    • Need to create an AdSense account for this
    • Do not need to enable monetisation now or future, but do need to decide whether existing videos should have monetisation enabled when we join the program.
    • Sae Ra moves motion that we join the partner program, but do not enable monetisation for existing videos.
      • Passed, one abstention.
    • AI: Joel to register, and update Ryan.

4. Items for noting

  • Election/AGM announcement sent.
    • Need some volunteers for AGM meeting wrangling.
    • AGM is set for: 11am-midday on Friday the 15th of January (AEDT)
      • 8am Perth
    • AI: Julien to send call for AGM items ASAP.
  • LCA update <details redacted>

5. Other business

  • None

6. In camera

  • No items were discussed in camera

Meeting closed at 2020

The post Council Meeting Tuesday 15th December 2020 – Minutes appeared first on Linux Australia.

Stewart SmithTwo Photos from Healseville Sanctuary

If you’re near Melbourne, you should go to Healseville Sanctuary and enjoy the Australian native animals. I’ve been a number of times over the years, and here’s a couple of photos from a (relatively, as in, the last couple of years) trip.

Leah trying to photograph a much too close bird
Koalas seem to always look like they’ve just woken up. I’m pretty convinced this one just had.

Stewart SmithPhotos from Adelaide

Some shots on Kodak Portra 400 from Adelaide. These would have been shot with my Nikon F80 35mm body, I think all with the 50mm lens. These are all pre-pandemic, and I haven’t gone and looked up when exactly. I’m just catching up on scanning some negatives.

,

Linux AustraliaCouncil Meeting Tuesday 1st December 2020 – Minutes

1. Meeting overview and key information

Present

Sae Ra Germaine

Jonathan Woithe

Julien Goodwin

Russell Stuart

Lisa Sands

Joel Addison

Benno Rice

Apologies

None

 

Meeting opened at 1930 AEDT by Sae Ra and quorum was achieved.

Minutes taken by Julien.

2. Event Review

Drupal

Admin Team

Pycon

LCA 2020

LCA 2021

LCA 2022

3. Log of correspondence

  • 23 Nov 2020: Mailchimp policy update notification sent to council@.
    • N/A as the account has been closed
    • Data export saved into our Google Drive
  • From: Xero; Date: Mon 30 Nov 2020, Subject: Xero pricing changes, Summary: Price is going up by $2/mo.

4. Items for discussion

  • AGM timing
    • We’ll continue as planned, Julien & Sae Ra to ensure the announcements go out.
  • Do we have a meeting on the 29th of December, if not 12th Jan is normal (just before AGM)
    • No to the end of December, yes to the January
  • LCA YouTube account (Joel)
    • Setup more formal escalation paths for LCA (and other LA-affiliated events) in future
    • Register for YouTube Partner Programme to get additional support avenues.
    • Need to create an AdSense account for this
    • Do not need to enable monetisation now or future, but do need to decide whether existing videos should have monetisation enabled when we join the program.
    • AI: Sae Ra will move this on list so people have time to review

5. Items for noting

  • Rusty wrench nom period ongoing
    • 2 proper nominations received
    • 1 resubmission of a nomination from last year requested
    • 1 apparently coming
  • Jonathan reached out to <a member> re Code of Conduct concerns, haven’t gotten a response
  • Grant application for Software Freedom Day, no response from them, so grant has lapsed.
  • <our contact on the CovidSafe analysis team> has yet to provide information about the FOI costs associated with his group’s work on the COVIDsafe app.

6. Other business 

  • None

7. In camera

  • No items were discussed in camera

2038 AEDT close

The post Council Meeting Tuesday 1st December 2020 – Minutes appeared first on Linux Australia.

,

Stewart SmithWhy you should use `nproc` and not grep /proc/cpuinfo

There’s something really quite subtle about how the nproc utility from GNU coreutils works. If you look at the man page, it’s even the very first sentence:

Print the number of processing units available to the current process, which may be less than the number of online processors.

So, what does that actually mean? Well, just because the computer some code is running on has a certain number of CPUs (and here I mean “number of hardware threads”) doesn’t necessarily mean that you can spawn a process that uses that many. What’s a simple example? Containers! Did you know that when you invoke docker to run a container, you can easily limit how much CPU the container can use? In this case, we’re looking at the --cpuset-cpus parameter, as the --cpus one works differently.

$ nproc
8

$ docker run --cpuset-cpus=0-1 --rm=true -it  amazonlinux:2
bash-4.2# nproc
2
bash-4.2# exit

$ docker run --cpuset-cpus=0-2 --rm=true -it  amazonlinux:2
bash-4.2# nproc
3

As you can see, nproc here gets the right bit of information, so if you’re wanting to do a calculation such as “Please use up to the maximum available CPUs” as a parameter to the configuration of a piece of software (such as how many threads to run), you get the right number.

But what if you use some of the other common methods?

$ /usr/bin/lscpu -p | grep -c "^[0-9]"
8
$ grep -c 'processor' /proc/cpuinfo 
8

$ docker run --cpuset-cpus=0-1 --rm=true -it  amazonlinux:2
bash-4.2# yum install -y /usr/bin/lscpu
......
bash-4.2# /usr/bin/lscpu -p | grep -c "^[0-9]"
8
bash-4.2# grep -c 'processor' /proc/cpuinfo 
8
bash-4.2# nproc
2

In this case, if you base your number of threads off grepping lscpu, you take another dependency (on the util-linux package), which isn’t needed. You also get the wrong answer, as you do by grepping /proc/cpuinfo. So, what this will end up doing is just increase the number of context switches, possibly also adding a performance degradation. It’s not just in docker containers where this could be an issue, of course: you can use the same mechanism that docker uses anywhere you want to control the resources of a process.

Another subtle thing to watch out for is differences in /proc/cpuinfo content depending on CPU architecture. You may not think it’s an issue today, but who wants to needlessly debug something?
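
The same trap exists inside programs, not just shell scripts. For example, in Python (Linux-only; a quick illustration, unrelated to the container examples above):

import os

# Counts all online CPUs – the same answer as grepping /proc/cpuinfo
print(os.cpu_count())

# Counts the CPUs this process is actually allowed to run on – the same answer as nproc
print(len(os.sched_getaffinity(0)))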

tl;dr: for determining “how many processes to run”: use nproc, don’t grep lscpu or /proc/cpuinfo

,

Stewart SmithPhotos from Tasmania (2017)

On the random old photos train, there’s some from spending time in Tasmania post linux.conf.au 2017 in Hobart.

All of these are Kodak E100VS film, which was no doubt a bit out of date by the time I shot it (and when they stopped making Ektachrome for a while). It was a nice surprise to be reminded of a truly wonderful Tassie trip, taken with friends, and after the excellent linux.conf.au.

Linux AustraliaCouncil Meeting Tuesday 17th November 2020 – Minutes

1. Meeting overview and key information

Present

Sae Ra Germaine

Jonathan Woithe

Julien Goodwin

Russell Stuart

Lisa Sands

Benno Rice

Joel Addison

Apologies 

None

 

Meeting opened at 1932 AEDT by Sae Ra and quorum was achieved.

Minutes taken by Julien.

2. Log of correspondence

  • None

3. Items for discussion

  • Rusty Wrench timing.
    • Call for nominations draft looks good, (AI) Julien to send ASAP
  • Vala Tech Camp diversity scholarship
    • See mail to council@ 2020-11-09 from Sae Ra, “[LACTTE] For Next Meeting – VALA Tech Camp Diversity Scholarship”
    • Motion by Julien for Linux Australia to sponsor VALA Tech Camp A$2,750, seconded by Jonathan.
    • Passed, one abstention.

4. Items for noting

  • LCA21 <details redacted>

5. Other business

  • Quick discussion re covidsafe research and freedom of information requests, nothing for now, possibly next time
  • Moving the returning officer video into a doc would be nice, but no short term volunteers. Jonathan may look into it early 2021.

6. In camera

  • No items were discussed in camera

 

Meeting closed at 2006

The post Council Meeting Tuesday 17th November 2020 – Minutes appeared first on Linux Australia.

,

Glen TurnerBlocking a USB device

udev can be used to block a USB device (or even an entire class of devices, such as USB storage). Add a file /etc/udev/rules.d/99-local-blacklist.rules containing:

SUBSYSTEM=="usb", ATTRS{idVendor}=="0123", ATTRS{idProduct}=="4567", ATTR{authorized}="0"



,

Linux AustraliaCouncil Meeting Tuesday 3rd November 2020 – Minutes

1. Meeting overview and key information

Present

Sae Ra Germaine

Jonathan Woithe

Julien Goodwin

Russell Stuart

Lisa Sands

Joel Addison

Benno Rice

Apologies

None

 

Meeting opened at 1930 AEDT by Sae Ra and quorum was achieved.

Minutes taken by Julien.

2. Event Review

Drupal

Admin Team

Pycon

LCA 2020

LCA 2021

LCA 2022

3. Log of correspondence

  • From: ASIC; Date: Sun, 25 Oct 2020 19:16:44 +1100; Subject: Renewal: OPEN SOURCE AUSTRALIA
    • MOTION: Russell Stuart moves we pay ASIC AUD$87 to renew “OPEN SOURCE AUSTRALIA” for 3 years.
    • Seconder: Jonathan
    • Outcome: Passed
  • From Google to council@ on 27 Oct 2020. Use of Google App Engine beyond 31 Jan 2021 requires payment information be added to the linked account before 31 Jan 2021. Affected project is “sydney-linux-users-group-hr”.
    • Current SLUG site is hosted on LA infra, we’ll leave it and see if anything breaks.
  • From MailChimp to council@ on 28 Oct 2020. Policies have been updated (Standard Terms of Use, Data Processing Addendum).
    • AI: Sae Ra to close account, was used in migration onto CiviCRM
  • From AgileWare; Subject: New Invoice, due 30/11/2020; Date: Sat, 31 Oct 2020 10:00:27 +1100.  Summary: $330 renewal for 6 months hosting.
    • MOTION: Sae Ra moves that LA pays AgileWare AUD$330 for 6 months of web site hosting in advance, and up to AUD$3000 for support renewal.
    • Seconder: Russell
    • Outcome:  Passed

4. Items for discussion

  • None

5. Items for noting

  • Stewart Smith has agreed to be returning officer.
  • Sae Ra needs photo & bio for council members for annual report.
  • Audit need bank statements, some are only just through as of this meeting.
  • Approached by Vala Tech Camp to sponsor for next year.

6. Other business 

  • Call for nominations for Rusty Wrench
    • Announce out by late November, 2-3 week nomination period, close mid-December
    • AI: Julien to update draft

7. In camera

  • No items were discussed in camera

2011 AEDT close

The post Council Meeting Tuesday 3rd November 2020 – Minutes appeared first on Linux Australia.

,

Linux AustraliaCouncil Meeting Tuesday 20th October 2020 – Minutes

1. Meeting overview and key information

Present

Sae Ra Germaine

Jonathan Woithe

Julien Goodwin

Russell Stuart

Lisa Sands

Joel Addison

Apologies 

Benno Rice

 

Meeting opened at 1930 AEDT by Sae Ra and quorum was achieved.

Minutes taken by Julien.

2. Log of correspondence

  • 1 Oct 2020 to @council: Thanks received from NetThing for LA’s support of their 2020 event.

3. Items for discussion

  • Grant application received from Pauline Clague: Online indigenous women’s possum skin cloak making workshop. Application received on 25 Sep 2020, community consultation closed Fri 9 Oct 2020. To be considered by Council on 20 Oct 2020.
    • MOTION BY Sae Ra That Linux Australia Accepts the Grant Proposal Online indigenous women’s possum skin cloak making workshop submitted by Pauline Clague.
    • Seconded: Julien
    • Motion failed.
    • AI: Jonathan to follow up
  • AGM Discussion
    • Agreement on the 15th.
    • Suggestion for Stewart Smith for returning officer
      • AI: Julien to approach

4. Items for noting

  • LCA 2021 <details redacted>
  • Jonathan talked to the Rockhampton Art Gallery folk about their proposal, resources created will be open, may come back.

5. Other business 

  • AI: Julien to provide redacted minutes for audit
  • AI: Julien to create static copy of planet site so admin team can just switch off

6. In camera

  • No items were discussed in camera

 

Meeting closed at 2000

The post Council Meeting Tuesday 20th October 2020 – Minutes appeared first on Linux Australia.

,

Jan SchmidtRift CV1 – multi-threaded tracking

This video shows the completion of work to split the tracking code into 3 threads – video capture, fast analysis and long analysis.

If the projected pose of an object doesn’t line up with the LEDs where we expect it to be, the frame is sent off for more expensive analysis in another thread. That way, it doesn’t block tracking of other objects – the fast analysis thread can continue with the next frame.

As a new blob is detected in a video frame, it is assigned an ID, and tracked between frames using motion flow. When the analysis results are available at some point in the future, the ID lets us find blobs that still exist in that most recent video frame. If the blobs are still unknowns in the new frame, the code labels them with the LED ID it found – and then hopefully in the next frame, the fast analysis is locked onto the object again.
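
As a rough sketch of that fast-path / long-analysis split (plain Python threads and queues; pose_matches() and full_search() are placeholders standing in for the real computer vision, not OpenHMD functions):

import queue
import threading

frames = queue.Queue()          # frames arriving from the capture thread
long_analysis = queue.Queue()   # frames whose quick check failed

def pose_matches(frame):
    return frame % 3 != 0       # placeholder: does the projected pose line up with the LEDs?

def full_search(frame):
    return f"recovered pose for frame {frame}"   # placeholder: expensive full pose search

def fast_worker():
    while True:
        frame = frames.get()
        if frame is None:
            long_analysis.put(None)
            return
        if pose_matches(frame):
            print(f"frame {frame}: fast path, tracking lock kept")
        else:
            long_analysis.put(frame)   # defer the expensive work, move on to the next frame

def long_worker():
    while True:
        frame = long_analysis.get()
        if frame is None:
            return
        print(f"frame {frame}: {full_search(frame)}")

workers = [threading.Thread(target=fast_worker), threading.Thread(target=long_worker)]
for w in workers:
    w.start()
for i in range(10):
    frames.put(i)
frames.put(None)                # shut everything down
for w in workers:
    w.join()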

There are some obvious next things to work on:

  • It’s hard to decide what constitutes a ‘good’ pose match, especially around partially visible LEDs at the edges. More experimentation and refinement needed
  • The IMU dead-reckoning between frames is bad – the accelerometer biases especially for the controllers tends to make them zoom off very quickly and lose tracking. More filtering, bias extraction and investigation should improve that, and help with staying locked onto fast-moving objects.
  • The code that decides whether an LED is expected to be visible in a given pose can use some improving.
  • Often the orientation of a device is good, but the position is wrong – a matching mode that only searches for translational matches could be good.
  • Taking the gravity vector of the device into account can help reject invalid poses, as could some tests against plausible location based on human movements limits.

Code is at https://github.com/thaytan/OpenHMD/tree/rift-correspondence-search

,

Linux AustraliaSaying Farewell to Planet Linux Australia

Planet Linux Australia (planet.linux.org.au) was started more than 15 years ago by Michael Davies. In the time since (and particularly before the rise of social media), it has provided a valuable service by encouraging the sharing of information and opinions within our Open Source community. However, due to the many diverse communication options now available over the internet, sites such as Planet Linux Australia are no longer used as heavily as they once were. With many other channels now available, the resources required to maintain Planet Linux Australia are becoming difficult to justify.

With this in mind and following the recommendation of Michael Davies, the Linux Australia Council has decided that it is time to close Planet Linux Australia. Linux Australia would like to express its profound appreciation for the work Michael and others have done to initiate and maintain this service. Our community has greatly benefited from this service over the years.

The post Saying Farewell to Planet Linux Australia appeared first on Linux Australia.

,

Jan SchmidtRift CV1 update

This is another in my series of updates on developing positional tracking for the Oculus Rift CV1 in OpenHMD

In the last post I ended with a TODO list. Since then I’ve crossed off a few things from that, and fixed a handful of very important bugs that were messing things up. I took last week off work, which gave me some extra hacking hours and enthusiasm too, and really helped push things forward.

Here’s the updated list:

  • The full model search for re-acquiring lock when we start, or when we lose tracking takes a long time. More work will mean avoiding that expensive path as much as possible.
  • Multiple cameras interfere with each other.
    • Capturing frames from all cameras and analysing them happens on a single thread, and any delay in processing causes USB packets to be missed.
    • I plan to split this into 1 thread per camera doing capture and analysis of the ‘quick’ case with good tracking lock, and a 2nd thread that does the more expensive analysis when it’s needed. Partially Fixed
  • At the moment the full model search also happens on the video capture thread, stalling all video input for hundreds of milliseconds – by which time any fast motion means the devices are no longer where we expect them to be.
    • This means that by the next frame, it has often lost tracking again, requiring a new full search… making it late for the next frame, etc.
    • The latency of position observations after a full model search is not accounted for at all in the current fusion algorithm, leading to incorrect reporting. Partially Fixed
  • More validation is needed on the camera pose transformations. For the controllers, the results are definitely wrong – I suspect because the controller LED models are supplied (in the firmware) in a different orientation to the HMD and I used the HMD as the primary test. Much Improved
  • Need to take the position and orientation of the IMU within each device into account. This information is in the firmware information but ignored right now. Fixed
  • Filtering! This is a big ticket item. The quality of the tracking depends on many pieces – how well the pose of devices is extracted from the computer vision and how quickly, and then very much on how well the information from the device IMU is combined with those observations. I have read so many papers on this topic, and started work on a complex Kalman filter for it.
  • Improve the model to LED matching. I’ve done quite a bit of work on refining the model matching algorithm, and it works very well for the HMD. It struggles more with the controllers, where there are fewer LEDs and the 2 controllers are harder to disambiguate. I have some things to try out for improving that – using the IMU orientation information to disambiguate controllers, and using better models for what size/brightness we expect an LED to be for a given pose.
  • Initial calibration / setup. Rather than assuming the position of the headset when it is first sighted, I’d like to have a room calibration step and a calibration file that remembers the position of the cameras.
  • Detecting when cameras have been moved. When cameras observe the same device simultaneously (or nearly so), it should be possible to detect if cameras are giving inconsistent information and do some correction.
  • hot-plug detection of cameras and re-starting them when they go offline or encounter spurious USB protocol errors. The latter happens often enough to be annoying during testing.
  • Other things I can’t think of right now.

As you can see, a few of the top-level items have been fixed, or mostly so. I split the computer vision for the tracking into several threads:

  • 1 thread shared between all sensors to capture USB packets and assemble them into frames
  • 1 thread per sensor to analyse the frame and update poses

The goal with that initial split was to prevent the processing of multiple sensors from interfering with each other, but I found that it also has a strong benefit even with a single sensor. I realised something in the last week that I probably should have noted earlier: The Rift sensors capture a video frame every 19.2ms, but that frame then takes a full 17ms to deliver across the USB – which means that when everything was in one thread, even with 1 sensor there was only about 2.2ms for the full analysis to take place, or else we’d miss a packet of the next frame and have to throw it away. With the analysis now happening in a separate thread and a ping-pong double buffer in place, the analysis can take quite a bit longer without losing any video frames.
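
The ping-pong double buffer is roughly this shape (a simplified Python sketch of the idea, not the OpenHMD code):

import threading

class PingPongFrames:
    """The capture thread fills one buffer while the analysis thread reads the other."""

    def __init__(self):
        self._buffers = [bytearray(), bytearray()]
        self._write = 0                    # buffer currently being filled
        self._cond = threading.Condition()
        self._frame_ready = False

    def add_packet(self, data, end_of_frame=False):
        # called from the USB capture thread
        self._buffers[self._write].extend(data)
        if end_of_frame:
            with self._cond:
                self._write ^= 1           # start filling the other buffer
                self._buffers[self._write].clear()
                self._frame_ready = True
                self._cond.notify()

    def wait_for_frame(self):
        # called from the analysis thread; may take most of a frame interval
        with self._cond:
            while not self._frame_ready:
                self._cond.wait()
            self._frame_ready = False
            return bytes(self._buffers[self._write ^ 1])   # the completed frame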

I plan to add a 2nd per-sensor thread that will divide the analysis further. The current thread will do only fast pass validation of any existing tracking lock, and will defer any longer term analysis to the other thread. That means that if we have a good lock on the HMD, but can’t (for example) find one of the controllers, searching for the controller will be deferred and the fast pass thread will move onto the next frame and keep tracking lock on the headset.

I fixed some bugs in the calculations that move between frames of reference – converting to/from the global position and orientation in the world to the position and orientation relative to each camera sensor when predicting what the appearance of the LEDs should be. I also added in the IMU offset and orientation of the LED models from the firmware, to make the predictions more accurate when devices move in the time between camera exposures.

Yaw Correction: when a device is observed by a sensor, the orientation is incorporated into what the IMU is measuring. The IMU can sense gravity and knows which way is up or down, but not which way is forward. The observation from the camera now corrects for that yaw drift, to keep things pointing the way you expect them to.

Some other bits:

  • Fixing numerical overflow issues in the OpenHMD maths routines
  • Capturing the IMU orientation and prediction that most closely corresponds to the moment each camera image is recorded, instead of when the camera image finishes transferring to the PC (which is 17ms later)
  • Improving the annotated debug view, to help understand what’s happening in the tracking computer vision steps
  • A 1st order estimate of device velocity to help improve the next predicted position

I posted a longer form video walkthrough of the current code in action, and discussing some of the remaining shortcomings.

As previously, the code is available at https://github.com/thaytan/OpenHMD/tree/rift-correspondence-search

Linux AustraliaCouncil Meeting Tuesday 6th October 2020 – Minutes

1. Meeting overview and key information

Present

Sae Ra Germaine

Jonathan Woithe

Julien Goodwin

Russell Stuart

Lisa Sands

Benno Rice

Apologies

Joel Addison

 

Meeting opened at 1931 AEDT by Sae Ra and quorum was achieved.

Minutes taken by Julien.

2. Event Review

Drupal

Admin Team

Pycon

LCA 2020

LCA 2021

LCA 2022

3. Log of correspondence

  • NSW Government Small Business Survey: reminder received via council@ on 23 Sep 2020. Original message received on 21 Sep 2020.
    • We’re not going to complete this survey.
  • Ampache grant progress report 2: received via council@ on 23 Sep 2020. 
    • Not fully final, but we do expect one more.
  • Grant application received from Pauline Clague: Online indigenous women’s possum skin cloak making workshop. Application received on 25 Sep 2020, community consultation closes Fri 9 Oct 2020. To be considered by Council on 20 Oct 2020.

4. Items for discussion

  • AGM
    • AI: Sae Ra will set up time with Julien to plan AGM, possibly next week
    • Annual report has been started
      • Will need photo / bio from people
      • AI: Julien to get minutes posted

5. Items for noting

  • newCardigan AGM (glam tech group), running using our Zoom account
  • Netthing happened
    • We got praised!
  • Software Freedom Day

6. Other business 

  • None

7. In camera

  • One item was discussed in camera.

2042 AEDT close

The post Council Meeting Tuesday 6th October 2020 – Minutes appeared first on Linux Australia.

,

Gary PendergastMore than 280 characters

It’s hard to be nuanced in 280 characters.

The Twitter character limit is a major factor of what can make it so much fun to use: you can read, publish, and interact, in extremely short, digestible chunks. But it doesn’t fit every topic, every time. Sometimes you want to talk about complex topics, having honest, thoughtful discussions. In an environment that encourages hot takes, however, it’s often easier to just avoid having those discussions. I can’t blame people for doing that, either: I find myself taking extended breaks from Twitter, as it can easily become overwhelming.

For me, the exception is Twitter threads.

Twitter threads encourage nuance and creativity.

Creative masterpieces like this Choose Your Own Adventure are not just possible, they rely on Twitter threads being the way they are.

Publishing a short essay about your experiences in your job can bring attention to inequality.

And Tumblr screenshot threads are always fun to read, even when they take a turn for the epic (over 4000 tweets in this thread, and it isn’t slowing down!)

Everyone can think of threads that they’ve loved reading.

My point is, threads are wildly underused on Twitter. I think a big part of that is the UI for writing threads: while it’s suited to writing a thread as a series of related tweet-sized chunks, it doesn’t lend itself to writing, revising, and editing anything more complex.

To help make this easier, I’ve been working on a tool that will help you publish an entire post to Twitter from your WordPress site, as a thread. It takes care of transforming your post into Twitter-friendly content, you can just… write. 🙂

It doesn’t just handle the tweet embeds from earlier in the thread: it also handles uploading and attaching any images and videos you’ve included in your post.

All sorts of embeds work, too. 😉

It’ll be coming in Jetpack 9.0 (due out October 6), but you can try it now in the latest Jetpack Beta! Check it out and tell me what you think. 🙂

This might not fix all of Twitter’s problems, but I hope it’ll help you enjoy reading and writing on Twitter a little more. 💖

,

Glen TurnerConverting MPEG-TS to, well, MPEG

Digital TV uses MPEG Transport Stream, which is a container for video designed for lossy transmission, such as radio. To save CPU cycles, Personal Video Recorders often save the MPEG-TS stream directly to disk. The more usual MPEG is technically MPEG Program Stream, which is designed for lossless transmission, such as storage on a disk.

Since these are both container formats, it should be possible to losslessly and quickly re-code from MPEG-TS to MPEG-PS.

ffmpeg -ss "${STARTTIME}" -t "${DURATION}" -i "${FILENAME}" -ignore_unknown -map 0 -map -0:2 -c copy "${FILENAME}.mpeg"



,

Chris NeugebauerTalk Notes: Practicality Beats Purity: The Zen Of Python’s Escape Hatch?

I gave the talk Practicality Beats Purity: The Zen of Python’s Escape Hatch as part of PyConline AU 2020, the very online replacement for PyCon AU this year. In that talk, I included a few interesting links and code samples which you may be interested in:

@apply

def apply(transform):

    def __decorator__(using_this):
        return transform(using_this)

    return __decorator__


numbers = [1, 2, 3, 4, 5]

@apply(lambda f: list(map(f, numbers)))
def squares(i):
  return i * i

print(list(squares))

# prints: [1, 4, 9, 16, 25]

Init.java

public class Init {
  public static void main(String[] args) {
    System.out.println("Hello, World!");
  }
}

@switch and @case

__NOT_A_MATCHER__ = object()
__MATCHER_SORT_KEY__ = 0

def switch(cls):

    inst = cls()
    methods = []

    for attr in dir(inst):
        method = getattr(inst, attr)
        matcher = getattr(method, "__matcher__", __NOT_A_MATCHER__)

        if matcher == __NOT_A_MATCHER__:
            continue

        methods.append(method)

    methods.sort(key = lambda i: i.__matcher_sort_key__)

    for method in methods:
        matches = method.__matcher__()
        if matches:
            return method()

    raise ValueError("No matcher matched")

def case(matcher):

    def __decorator__(f):
        global __MATCHER_SORT_KEY__

        f.__matcher__ = matcher
        f.__matcher_sort_key__ = __MATCHER_SORT_KEY__
        __MATCHER_SORT_KEY__ += 1
        return f

    return __decorator__



if __name__ == "__main__":
    for i in range(100):

        @switch
        class FizzBuzz:

            @case(lambda: i % 15 == 0)
            def fizzbuzz(self):
                return "fizzbuzz"

            @case(lambda: i % 3 == 0)
            def fizz(self):
                return "fizz"

            @case(lambda: i % 5 == 0)
            def buzz(self):
                return "buzz"

            @case(lambda: True)
            def default(self):
                return "-"

        print(f"{i} {FizzBuzz}")

,

Craig SandersFuck Grey Text

fuck grey text on white backgrounds
fuck grey text on black backgrounds
fuck thin, spindly fonts
fuck 10px text
fuck any size of anything in px
fuck font-weight 300
fuck unreadable web pages
fuck themes that implement this unreadable idiocy
fuck sites that don’t work without javascript
fuck reactjs and everything like it

thank fuck for Stylus. and uBlock Origin. and uMatrix.

Fuck Grey Text is a post from: Errata

,

Rusty Russell57 Varieties of Pyrite: Exchanges Are Now The Enemy of Bitcoin

TL;DR: exchanges are casinos and don’t want to onboard anyone into bitcoin. Avoid.

There’s a classic scam in the “crypto” space: advertize Bitcoin to get people in, then sell suckers something else entirely. Over the last few years, this bait-and-switch has become the core competency of “bitcoin” exchanges.

I recently visited the homepage of Australian exchange btcmarkets.net: what a mess. There was a list of dozens of identical-looking “cryptos”, with bitcoin second after something called “XRP”; seems like it was sorted by volume?

Incentives have driven exchanges to become casinos, and they’re doing exactly what you’d expect unregulated casinos to do. This is no place you ever want to send anyone.

Incentives For Exchanges

Exchanges make money on trading, not on buying and holding. Despite the fact that bitcoin is the only real attempt to create an open source money, scams with no future are given false equivalence, because more assets means more trading. Worse than that, they are paid directly to list new scams (the crappier, the more money they can charge!) and have recently taken the logical step of introducing and promoting their own crapcoins directly.

It’s like a gold dealer who also sells 57 varieties of pyrite, which give more margin than selling actual gold.

For a long time, I thought exchanges were merely incompetent. Most can’t even give out fresh addresses for deposits, batch their outgoing transactions, pay competent fee rates, perform RBF or use segwit.

But I misunderstood: they don’t want to sell bitcoin. They use bitcoin to get you in the door, but they want you to gamble. This matters: you’ll find subtle and not-so-subtle blockers to simply buying bitcoin on an exchange. If you send a friend off to buy their first bitcoin, they’re likely to come back with something else. That’s no accident.

Looking Deeper, It Gets Worse.

Regrettably, looking harder at specific exchanges makes the picture even bleaker.

Consider Binance: this mainland China backed exchange pretending to be a Hong Kong exchange appeared out of nowhere with fake volume and demonstrated the gullibility of the entire industry by being treated as if it were a respected member. They lost at least 40,000 bitcoin in a known hack, and they also lost all the personal information people sent them to KYC. They aggressively market their own coin. But basically, they’re just MtGox without Mark Karpelès’ PHP skills or moral scruples and much better marketing.

Coinbase is more interesting: an MBA-run “bitcoin” company which really dislikes bitcoin. They got where they are by spending big on regulations compliance in the US so they could operate in (almost?) every US state. (They don’t do much to dispel the wide belief that this regulation protects their users, when in practice it seems only USD deposits have any guarantee). Their natural interest is in increasing regulation to maintain that moat, and their biggest problem is Bitcoin.

They have much more affinity for the centralized coins (Ethereum) where they can have influence and control. The anarchic nature of a genuine open source community (not to mention the developers’ oft-stated aim to improve privacy over time) is not culturally compatible with a top-down company run by the Big Dog. It’s a running joke that their CEO can’t say the word “Bitcoin”, but their recent “what will happen to cryptocurrencies in the 2020s” article is breathtaking in its boldness: innovation is mainly happening on altcoins, and they’re going to overtake bitcoin any day now. Those scaling problems which the Bitcoin developers say they don’t know how to solve? This non-technical CEO knows better.

So, don’t send anyone to an exchange, especially not a “market leading” one. Find some service that actually wants to sell them bitcoin, like CashApp or Swan Bitcoin.

,

Matt PalmerPrivate Key Redaction: UR DOIN IT RONG

Because posting private keys on the Internet is a bad idea, some people like to “redact” their private keys, so that it looks kinda-sorta like a private key, but it isn’t actually giving away anything secret. Unfortunately, due to the way that private keys are represented, it is easy to “redact” a key in such a way that it doesn’t actually redact anything at all. RSA private keys are particularly bad at this, but the problem can (potentially) apply to other keys as well.

I’ll show you a bit of “Inside Baseball” with key formats, and then demonstrate the practical implications. Finally, we’ll go through a practical worked example from an actual not-really-redacted key I recently stumbled across in my travels.

The Private Lives of Private Keys

Here is what a typical private key looks like, when you come across it:

-----BEGIN RSA PRIVATE KEY-----
MGICAQACEQCxjdTmecltJEz2PLMpS4BXAgMBAAECEDKtuwD17gpagnASq1zQTYEC
CQDVTYVsjjF7IQIJANUYZsIjRsR3AgkAkahDUXL0RSECCB78r2SnsJC9AghaOK3F
sKoELg==
-----END RSA PRIVATE KEY-----

Obviously, there’s some hidden meaning in there – computers don’t encrypt things by shouting “BEGIN RSA PRIVATE KEY!”, after all. What is between the BEGIN/END lines above is, in fact, a base64-encoded DER format ASN.1 structure representing a PKCS#1 private key.

In simple terms, it’s a list of numbers – very important numbers. The list of numbers is, in order:

  • A version number (0);
  • The “public modulus”, commonly referred to as “n”;
  • The “public exponent”, or “e” (which is almost always 65,537, for various unimportant reasons);
  • The “private exponent”, or “d”;
  • The two “private primes”, or “p” and “q”;
  • Two exponents, which are known as “dmp1” and “dmq1”; and
  • A coefficient, known as “iqmp”.

Why Is This a Problem?

The thing is, only three of those numbers are actually required in a private key. The rest, whilst useful to allow the RSA encryption and decryption to be more efficient, aren’t necessary. The three absolutely required values are e, p, and q.

Of the other numbers, most of them are at least about the same size as each of p and q. So of the total data in an RSA key, less than a quarter of the data is required. Let me show you with the above “toy” key, by breaking it down piece by piece1:

  • MGI – DER for “this is a sequence”
  • CAQ – version (0)
  • CxjdTmecltJEz2PLMpS4BX – n
  • AgMBAA – e
  • ECEDKtuwD17gpagnASq1zQTY – d
  • ECCQDVTYVsjjF7IQ – p
  • IJANUYZsIjRsR3 – q
  • AgkAkahDUXL0RS – dmp1
  • ECCB78r2SnsJC9 – dmq1
  • AghaOK3FsKoELg== – iqmp

Remember that in order to reconstruct all of these values, all I need are e, p, and q – and e is pretty much always 65,537. So I could “redact” almost all of this key, and still give all the important, private bits of this key. Let me show you:

-----BEGIN RSA PRIVATE KEY-----
..............................................................EC
CQDVTYVsjjF7IQIJANUYZsIjRsR3....................................
........
-----END RSA PRIVATE KEY-----

Now, I doubt that anyone is going to redact a key precisely like this… but then again, this isn’t a “typical” RSA key. They usually look a lot more like this:

-----BEGIN RSA PRIVATE KEY-----
MIIEogIBAAKCAQEAu6Inch7+mWtKn+leB9uCG3MaJIxRyvC/5KTz2fR+h+GOhqj4
SZJobiVB4FrE5FgC7AnlH6qeRi9MI0s6dt5UWZ5oNIeWSaOOeNO+EJDUkSVf67wj
SNGXlSjGAkPZ0nRJiDjhuPvQmdW53hOaBLk5udxPEQbenpXAzbLJ7wH5ouLQ3nQw
HwpwDNQhF6zRO8WoscpDVThOAM+s4PS7EiK8ZR4hu2toon8Ynadlm95V45wR0VlW
zywgbkZCKa1IMrDCscB6CglQ10M3Xzya3iTzDtQxYMVqhDrA7uBYRxA0y1sER+Rb
yhEh03xz3AWemJVLCQuU06r+FABXJuY/QuAVvQIDAQABAoIBAFqwWVhzWqNUlFEO
PoCVvCEAVRZtK+tmyZj9kU87ORz8DCNR8A+/T/JM17ZUqO2lDGSBs9jGYpGRsr8s
USm69BIM2ljpX95fyzDjRu5C0jsFUYNi/7rmctmJR4s4uENcKV5J/++k5oI0Jw4L
c1ntHNWUgjK8m0UTJIlHbQq0bbAoFEcfdZxd3W+SzRG3jND3gifqKxBG04YDwloy
tu+bPV2jEih6p8tykew5OJwtJ3XsSZnqJMwcvDciVbwYNiJ6pUvGq6Z9kumOavm9
XU26m4cWipuK0URWbHWQA7SjbktqEpxsFrn5bYhJ9qXgLUh/I1+WhB2GEf3hQF5A
pDTN4oECgYEA7Kp6lE7ugFBDC09sKAhoQWrVSiFpZG4Z1gsL9z5YmZU/vZf0Su0n
9J2/k5B1GghvSwkTqpDZLXgNz8eIX0WCsS1xpzOuORSNvS1DWuzyATIG2cExuRiB
jYWIJUeCpa5p2PdlZmBrnD/hJ4oNk4oAVpf+HisfDSN7HBpN+TJfcAUCgYEAyvY7
Y4hQfHIdcfF3A9eeCGazIYbwVyfoGu70S/BZb2NoNEPymqsz7NOfwZQkL4O7R3Wl
Rm0vrWT8T5ykEUgT+2ruZVXYSQCKUOl18acbAy0eZ81wGBljZc9VWBrP1rHviVWd
OVDRZNjz6nd6ZMrJvxRa24TvxZbJMmO1cgSW1FkCgYAoWBd1WM9HiGclcnCZknVT
UYbykCeLO0mkN1Xe2/32kH7BLzox26PIC2wxF5seyPlP7Ugw92hOW/zewsD4nLze
v0R0oFa+3EYdTa4BvgqzMXgBfvGfABJ1saG32SzoWYcpuWLLxPwTMsCLIPmXgRr1
qAtl0SwF7Vp7O/C23mNukQKBgB89DOEB7xloWv3Zo27U9f7nB7UmVsGjY8cZdkJl
6O4LB9PbjXCe3ywZWmJqEbO6e83A3sJbNdZjT65VNq9uP50X1T+FmfeKfL99X2jl
RnQTsrVZWmJrLfBSnBkmb0zlMDAcHEnhFYmHFuvEnfL7f1fIoz9cU6c+0RLPY/L7
n9dpAoGAXih17mcmtnV+Ce+lBWzGWw9P4kVDSIxzGxd8gprrGKLa3Q9VuOrLdt58
++UzNUaBN6VYAe4jgxGfZfh+IaSlMouwOjDgE/qzgY8QsjBubzmABR/KWCYiRqkj
qpWCgo1FC1Gn94gh/+dW2Q8+NjYtXWNqQcjRP4AKTBnPktEvdMA=
-----END RSA PRIVATE KEY-----

People typically redact keys by deleting whole lines, and usually replacing them with [...] and the like. But only about 345 of those 1588 characters (excluding the header and footer) are required to construct the entire key. You can redact about 4/5ths of that giant blob of stuff, and your private parts (or at least, those of your key) are still left uncomfortably exposed.

But Wait! There’s More!

Remember how I said that everything in the key other than e, p, and q could be derived from those three numbers? Let’s talk about one of those numbers: n.

This is known as the “public modulus” (because, along with e, it is also present in the public key). It is very easy to calculate: n = p * q. It is also very early in the key (the second number, in fact).

Since n = p * q, it follows that q = n / p. Thus, as long as the key is intact up to p, you can derive q by simple division.
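To make that concrete, here’s a minimal Python sketch of the arithmetic (toy textbook primes only – nothing from a real key, and not the gem used later in this post) showing that e, p, and q are enough to rebuild every number in the list above, and that q falls straight out of n and p:

# Toy values only: the classic textbook primes, far too small for real use.
p, q, e = 61, 53, 17

n = p * q                # the "public modulus"
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)      # the "private exponent" (modular inverse; Python 3.8+)
dmp1 = d % (p - 1)       # first CRT exponent
dmq1 = d % (q - 1)       # second CRT exponent
iqmp = pow(q, -1, p)     # CRT coefficient

# And going the other way: as long as the key is intact up to p, q is simple division.
assert q == n // p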

Real World Redaction

At this point, I’d like to introduce an acquaintance of mine: Mr. Johan Finn. He is the proud owner of the GitHub repo johanfinn/scripts. For a while, his repo contained a script containing a poorly-redacted private key. He has since deleted it by making a new commit, but of course, because git never really deletes anything, it’s still available.

Of course, Mr. Finn may delete the repo, or force-push a new history without that commit, so here is the redacted private key, with a bit of the surrounding shell script, for our illustrative pleasure:

#Add private key to .ssh folder
cd /home/johan/.ssh/
echo  "-----BEGIN RSA PRIVATE KEY-----
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
ÄÄÄÄÄÄÄÄÄÄÄÄÄÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅ
MIIJKgIBAAKCAgEAxEVih1JGb8gu/Fm4AZh+ZwJw/pjzzliWrg4mICFt1g7SmIE2
TCQMKABdwd11wOFKCPc/UzRH/fHuQcvWrpbOSdqev/zKff9iedKw/YygkMeIRaXB
fYELqvUAOJ8PPfDm70st9GJRhjGgo5+L3cJB2gfgeiDNHzaFvapRSU0oMGQX+kI9
ezsjDAn+0Pp+r3h/u1QpLSH4moRFGF4omNydI+3iTGB98/EzuNhRBHRNq4oBV5SG
Pq/A1bem2ninnoEaQ+OPESxYzDz3Jy9jV0W/6LvtJ844m+XX69H5fqq5dy55z6DW
sGKn78ULPVZPsYH5Y7C+CM6GAn4nYCpau0t52sqsY5epXdeYx4Dc+Wm0CjXrUDEe
Egl4loPKDxJkQqQ/MQiz6Le/UK9vEmnWn1TRXK3ekzNV4NgDfJANBQobOpwt8WVB
rbsC0ON7n680RQnl7PltK9P1AQW5vHsahkoixk/BhcwhkrkZGyDIl9g8Q/Euyoq3
eivKPLz7/rhDE7C1BzFy7v8AjC3w7i9QeHcWOZFAXo5hiDasIAkljDOsdfD4tP5/
wSO6E6pjL3kJ+RH2FCHd7ciQb+IcuXbku64ln8gab4p8jLa/mcMI+V3eWYnZ82Yu
axsa85hAe4wb60cp/rCJo7ihhDTTvGooqtTisOv2nSvCYpcW9qbL6cGjAXECAwEA
AQKCAgEAjz6wnWDP5Y9ts2FrqUZ5ooamnzpUXlpLhrbu3m5ncl4ZF5LfH+QDN0Kl
KvONmHsUhJynC/vROybSJBU4Fu4bms1DJY3C39h/L7g00qhLG7901pgWMpn3QQtU
4P49qpBii20MGhuTsmQQALtV4kB/vTgYfinoawpo67cdYmk8lqzGzzB/HKxZdNTq
s+zOfxRr7PWMo9LyVRuKLjGyYXZJ/coFaobWBi8Y96Rw5NZZRYQQXLIalC/Dhndm
AHckpstEtx2i8f6yxEUOgPvV/gD7Akn92RpqOGW0g/kYpXjGqZQy9PVHGy61sInY
HSkcOspIkJiS6WyJY9JcvJPM6ns4b84GE9qoUlWVF3RWJk1dqYCw5hz4U8LFyxsF
R6WhYiImvjxBLpab55rSqbGkzjI2z+ucDZyl1gqIv9U6qceVsgRyuqdfVN4deU22
LzO5IEDhnGdFqg9KQY7u8zm686Ejs64T1sh0y4GOmGsSg+P6nsqkdlXH8C+Cf03F
lqPFg8WQC7ojl/S8dPmkT5tcJh3BPwIWuvbtVjFOGQc8x0lb+NwK8h2Nsn6LNazS
0H90adh/IyYX4sBMokrpxAi+gMAWiyJHIHLeH2itNKtAQd3qQowbrWNswJSgJzsT
JuJ7uqRKAFkE6nCeAkuj/6KHHMPsfCAffVdyGaWqhoxmPOrnVgECggEBAOrCCwiC
XxwUgjOfOKx68siFJLfHf4vPo42LZOkAQq5aUmcWHbJVXmoxLYSczyAROopY0wd6
Dx8rqnpO7OtZsdJMeBSHbMVKoBZ77hiCQlrljcj12moFaEAButLCdZFsZW4zF/sx
kWIAaPH9vc4MvHHyvyNoB3yQRdevu57X7xGf9UxWuPil/jvdbt9toaraUT6rUBWU
GYPNKaLFsQzKsFWAzp5RGpASkhuiBJ0Qx3cfLyirjrKqTipe3o3gh/5RSHQ6VAhz
gdUG7WszNWk8FDCL6RTWzPOrbUyJo/wz1kblsL3vhV7ldEKFHeEjsDGroW2VUFlS
asAHNvM4/uYcOSECggEBANYH0427qZtLVuL97htXW9kCAT75xbMwgRskAH4nJDlZ
IggDErmzBhtrHgR+9X09iL47jr7dUcrVNPHzK/WXALFSKzXhkG/yAgmt3r14WgJ6
5y7010LlPFrzaNEyO/S4ISuBLt4cinjJsrFpoo0WI8jXeM5ddG6ncxdurKXMymY7
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::.::
:::::::::::::::::::::::::::.::::::::::::::::::::::::::::::::::::
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLlL
ÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖ
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
ÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅ
YYYYYYYYYYYYYYYYYYYYYyYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
gff0GJCOMZ65pMSy3A3cSAtjlKnb4fWzuHD5CFbusN4WhCT/tNxGNSpzvxd8GIDs
nY7exs9L230oCCpedVgcbayHCbkChEfoPzL1e1jXjgCwCTgt8GjeEFqc1gXNEaUn
O8AJ4VlR8fRszHm6yR0ZUBdY7UJddxQiYOzt0S1RLlECggEAbdcs4mZdqf3OjejJ
06oTPs9NRtAJVZlppSi7pmmAyaNpOuKWMoLPElDAQ3Q7VX26LlExLCZoPOVpdqDH
KbdmBEfTR4e11Pn9vYdu9/i6o10U4hpmf4TYKlqk10g1Sj21l8JATj/7Diey8scO
sAI1iftSg3aBSj8W7rxCxSezrENzuqw5D95a/he1cMUTB6XuravqZK5O4eR0vrxR
AvMzXk5OXrUEALUvt84u6m6XZZ0pq5XZxq74s8p/x1JvTwcpJ3jDKNEixlHfdHEZ
ZIu/xpcwD5gRfVGQamdcWvzGHZYLBFO1y5kAtL8kI9tW7WaouWVLmv99AyxdAaCB
Y5mBAQKCAQEAzU7AnorPzYndlOzkxRFtp6MGsvRBsvvqPLCyUFEXrHNV872O7tdO
GmsMZl+q+TJXw7O54FjJJvqSSS1sk68AGRirHop7VQce8U36BmI2ZX6j2SVAgIkI
9m3btCCt5rfiCatn2+Qg6HECmrCsHw6H0RbwaXS4RZUXD/k4X+sslBitOb7K+Y+N
Bacq6QxxjlIqQdKKPs4P2PNHEAey+kEJJGEQ7bTkNxCZ21kgi1Sc5L8U/IGy0BMC
PvJxssLdaWILyp3Ws8Q4RAoC5c0ZP0W2j+5NSbi3jsDFi0Y6/2GRdY1HAZX4twem
Q0NCedq1JNatP1gsb6bcnVHFDEGsj/35oQKCAQEAgmWMuSrojR/fjJzvke6Wvbox
FRnPk+6YRzuYhAP/YPxSRYyB5at++5Q1qr7QWn7NFozFIVFFT8CBU36ktWQ39MGm
cJ5SGyN9nAbbuWA6e+/u059R7QL+6f64xHRAGyLT3gOb1G0N6h7VqFT25q5Tq0rc
Lf/CvLKoudjv+sQ5GKBPT18+zxmwJ8YUWAsXUyrqoFWY/Tvo5yLxaC0W2gh3+Ppi
EDqe4RRJ3VKuKfZxHn5VLxgtBFN96Gy0+Htm5tiMKOZMYAkHiL+vrVZAX0hIEuRZ
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
-----END RSA PRIVATE KEY-----" >> id_rsa

Now, if you try to reconstruct this key by removing the “obvious” garbage lines (the ones that are all repeated characters, some of which aren’t even valid base64 characters), it still isn’t a key – at least, openssl pkey doesn’t want anything to do with it. The key is very much still in there, though, as we shall soon see.

Using a gem I wrote and a quick bit of Ruby, we can extract a complete private key. The irb session looks something like this:

>> require "derparse"
>> b64 = <<EOF
MIIJKgIBAAKCAgEAxEVih1JGb8gu/Fm4AZh+ZwJw/pjzzliWrg4mICFt1g7SmIE2
TCQMKABdwd11wOFKCPc/UzRH/fHuQcvWrpbOSdqev/zKff9iedKw/YygkMeIRaXB
fYELqvUAOJ8PPfDm70st9GJRhjGgo5+L3cJB2gfgeiDNHzaFvapRSU0oMGQX+kI9
ezsjDAn+0Pp+r3h/u1QpLSH4moRFGF4omNydI+3iTGB98/EzuNhRBHRNq4oBV5SG
Pq/A1bem2ninnoEaQ+OPESxYzDz3Jy9jV0W/6LvtJ844m+XX69H5fqq5dy55z6DW
sGKn78ULPVZPsYH5Y7C+CM6GAn4nYCpau0t52sqsY5epXdeYx4Dc+Wm0CjXrUDEe
Egl4loPKDxJkQqQ/MQiz6Le/UK9vEmnWn1TRXK3ekzNV4NgDfJANBQobOpwt8WVB
rbsC0ON7n680RQnl7PltK9P1AQW5vHsahkoixk/BhcwhkrkZGyDIl9g8Q/Euyoq3
eivKPLz7/rhDE7C1BzFy7v8AjC3w7i9QeHcWOZFAXo5hiDasIAkljDOsdfD4tP5/
wSO6E6pjL3kJ+RH2FCHd7ciQb+IcuXbku64ln8gab4p8jLa/mcMI+V3eWYnZ82Yu
axsa85hAe4wb60cp/rCJo7ihhDTTvGooqtTisOv2nSvCYpcW9qbL6cGjAXECAwEA
AQKCAgEAjz6wnWDP5Y9ts2FrqUZ5ooamnzpUXlpLhrbu3m5ncl4ZF5LfH+QDN0Kl
KvONmHsUhJynC/vROybSJBU4Fu4bms1DJY3C39h/L7g00qhLG7901pgWMpn3QQtU
4P49qpBii20MGhuTsmQQALtV4kB/vTgYfinoawpo67cdYmk8lqzGzzB/HKxZdNTq
s+zOfxRr7PWMo9LyVRuKLjGyYXZJ/coFaobWBi8Y96Rw5NZZRYQQXLIalC/Dhndm
AHckpstEtx2i8f6yxEUOgPvV/gD7Akn92RpqOGW0g/kYpXjGqZQy9PVHGy61sInY
HSkcOspIkJiS6WyJY9JcvJPM6ns4b84GE9qoUlWVF3RWJk1dqYCw5hz4U8LFyxsF
R6WhYiImvjxBLpab55rSqbGkzjI2z+ucDZyl1gqIv9U6qceVsgRyuqdfVN4deU22
LzO5IEDhnGdFqg9KQY7u8zm686Ejs64T1sh0y4GOmGsSg+P6nsqkdlXH8C+Cf03F
lqPFg8WQC7ojl/S8dPmkT5tcJh3BPwIWuvbtVjFOGQc8x0lb+NwK8h2Nsn6LNazS
0H90adh/IyYX4sBMokrpxAi+gMAWiyJHIHLeH2itNKtAQd3qQowbrWNswJSgJzsT
JuJ7uqRKAFkE6nCeAkuj/6KHHMPsfCAffVdyGaWqhoxmPOrnVgECggEBAOrCCwiC
XxwUgjOfOKx68siFJLfHf4vPo42LZOkAQq5aUmcWHbJVXmoxLYSczyAROopY0wd6
Dx8rqnpO7OtZsdJMeBSHbMVKoBZ77hiCQlrljcj12moFaEAButLCdZFsZW4zF/sx
kWIAaPH9vc4MvHHyvyNoB3yQRdevu57X7xGf9UxWuPil/jvdbt9toaraUT6rUBWU
GYPNKaLFsQzKsFWAzp5RGpASkhuiBJ0Qx3cfLyirjrKqTipe3o3gh/5RSHQ6VAhz
gdUG7WszNWk8FDCL6RTWzPOrbUyJo/wz1kblsL3vhV7ldEKFHeEjsDGroW2VUFlS
asAHNvM4/uYcOSECggEBANYH0427qZtLVuL97htXW9kCAT75xbMwgRskAH4nJDlZ
IggDErmzBhtrHgR+9X09iL47jr7dUcrVNPHzK/WXALFSKzXhkG/yAgmt3r14WgJ6
5y7010LlPFrzaNEyO/S4ISuBLt4cinjJsrFpoo0WI8jXeM5ddG6ncxdurKXMymY7
EOF
>> b64 += <<EOF
gff0GJCOMZ65pMSy3A3cSAtjlKnb4fWzuHD5CFbusN4WhCT/tNxGNSpzvxd8GIDs
nY7exs9L230oCCpedVgcbayHCbkChEfoPzL1e1jXjgCwCTgt8GjeEFqc1gXNEaUn
O8AJ4VlR8fRszHm6yR0ZUBdY7UJddxQiYOzt0S1RLlECggEAbdcs4mZdqf3OjejJ
06oTPs9NRtAJVZlppSi7pmmAyaNpOuKWMoLPElDAQ3Q7VX26LlExLCZoPOVpdqDH
KbdmBEfTR4e11Pn9vYdu9/i6o10U4hpmf4TYKlqk10g1Sj21l8JATj/7Diey8scO
sAI1iftSg3aBSj8W7rxCxSezrENzuqw5D95a/he1cMUTB6XuravqZK5O4eR0vrxR
AvMzXk5OXrUEALUvt84u6m6XZZ0pq5XZxq74s8p/x1JvTwcpJ3jDKNEixlHfdHEZ
ZIu/xpcwD5gRfVGQamdcWvzGHZYLBFO1y5kAtL8kI9tW7WaouWVLmv99AyxdAaCB
Y5mBAQKCAQEAzU7AnorPzYndlOzkxRFtp6MGsvRBsvvqPLCyUFEXrHNV872O7tdO
GmsMZl+q+TJXw7O54FjJJvqSSS1sk68AGRirHop7VQce8U36BmI2ZX6j2SVAgIkI
9m3btCCt5rfiCatn2+Qg6HECmrCsHw6H0RbwaXS4RZUXD/k4X+sslBitOb7K+Y+N
Bacq6QxxjlIqQdKKPs4P2PNHEAey+kEJJGEQ7bTkNxCZ21kgi1Sc5L8U/IGy0BMC
PvJxssLdaWILyp3Ws8Q4RAoC5c0ZP0W2j+5NSbi3jsDFi0Y6/2GRdY1HAZX4twem
Q0NCedq1JNatP1gsb6bcnVHFDEGsj/35oQKCAQEAgmWMuSrojR/fjJzvke6Wvbox
FRnPk+6YRzuYhAP/YPxSRYyB5at++5Q1qr7QWn7NFozFIVFFT8CBU36ktWQ39MGm
cJ5SGyN9nAbbuWA6e+/u059R7QL+6f64xHRAGyLT3gOb1G0N6h7VqFT25q5Tq0rc
Lf/CvLKoudjv+sQ5GKBPT18+zxmwJ8YUWAsXUyrqoFWY/Tvo5yLxaC0W2gh3+Ppi
EDqe4RRJ3VKuKfZxHn5VLxgtBFN96Gy0+Htm5tiMKOZMYAkHiL+vrVZAX0hIEuRZ
EOF
>> der = b64.unpack("m").first
>> c = DerParse.new(der).first_node.first_child
>> version = c.value
=> 0
>> c = c.next_node
>> n = c.value
=> 80071596234464993385068908004931... # (etc)
>> c = c.next_node
>> e = c.value
=> 65537
>> c = c.next_node
>> d = c.value
=> 58438813486895877116761996105770... # (etc)
>> c = c.next_node
>> p = c.value
=> 29635449580247160226960937109864... # (etc)
>> c = c.next_node
>> q = c.value
=> 27018856595256414771163410576410... # (etc)

What I’ve done, in case you don’t speak Ruby, is take the two “chunks” of plausible-looking base64 data, chuck them together into a variable named b64, unbase64 it into a variable named der, pass that into a new DerParse instance, and then walk the DER value tree until I got all the values I need.

Interestingly, the q value actually traverses the “split” in the two chunks, which means that there’s always the possibility that there are lines missing from the key. However, since p and q are supposed to be prime, we can “sanity check” them to see if corruption is likely to have occurred:

>> require "openssl"
>> OpenSSL::BN.new(p).prime?
=> true
>> OpenSSL::BN.new(q).prime?
=> true

Excellent! The chances of a corrupted file producing valid-but-incorrect prime numbers isn’t huge, so we can be fairly confident that we’ve got the “real” p and q. Now, with the help of another one of my creations we can use e, p, and q to create a fully-operational battle key:

>> require "openssl/pkey/rsa"
>> k = OpenSSL::PKey::RSA.from_factors(p, q, e)
=> #<OpenSSL::PKey::RSA:0x0000559d5903cd38>
>> k.valid?
=> true
>> k.verify(OpenSSL::Digest::SHA256.new, k.sign(OpenSSL::Digest::SHA256.new, "bob"), "bob")
=> true

… and there you have it. One fairly redacted-looking private key brought back to life by maths and far too much free time.

Sorry Mr. Finn, I hope you’re not still using that key on anything Internet-facing.

What About Other Key Types?

EC keys are very different beasts, but they have much the same problems as RSA keys. A typical EC key contains both private and public data, and the public portion is twice the size – so only about 1/3 of the data in the key is private material. It is quite plausible that you can “redact” an EC key and leave all the actually private bits exposed.

What Do We Do About It?

In short: don’t ever try and redact real private keys. For documentation purposes, just put “KEY GOES HERE” in the appropriate spot, or something like that. Store your secrets somewhere that isn’t a public (or even private!) git repo.

Generating a “dummy” private key and sticking it in there isn’t a great idea, for different reasons: people have this odd habit of reusing “demo” keys in real life. There’s no need to encourage that sort of thing.


  1. Technically the pieces aren’t 100% aligned with the underlying DER, because of how base64 works. I felt it was easier to understand if I stuck to chopping up the base64, rather than decoding into DER and then chopping up the DER. 

,

Jonathan Adamczewskif32, u32, and const

Some time ago, I wrote “floats, bits, and constant expressions” about converting a floating point number into its representative ones and zeros as a C++ constant expression – constructing the IEEE 754 representation without being able to examine the bits directly.

I’ve been playing around with Rust recently, and rewrote that conversion code as a bit of a learning exercise for myself, with a thoroughly contrived set of constraints: using integer and single-precision floating point math, at compile time, without unsafe blocks, while using as few unstable features as possible.

I’ve included the listing below, for your bemusement and/or head-shaking, and you can play with the code in the Rust Playground and rust.godbolt.org

// Jonathan Adamczewski 2020-05-12
//
// Constructing the bit-representation of an IEEE 754 single precision floating 
// point number, using integer and single-precision floating point math, at 
// compile time, in rust, without unsafe blocks, while using as few unstable 
// features as I can.
//
// or "What if this silly C++ thing https://brnz.org/hbr/?p=1518 but in Rust?"


// Q. Why? What is this good for?
// A. To the best of my knowledge, this code serves no useful purpose. 
//    But I did learn a thing or two while writing it :)


// This is needed to be able to perform floating point operations in a const 
// function:
#![feature(const_fn)]


// bits_transmute(): Returns the bits representing a floating point value, by
//                   way of std::mem::transmute()
//
// For completeness (and validation), and to make it clear the fundamentally 
// unnecessary nature of the exercise :D - here's a short, straightforward, 
// library-based version. But it needs the const_transmute flag and an unsafe 
// block.
#![feature(const_transmute)]
const fn bits_transmute(f: f32) -> u32 {
  unsafe { std::mem::transmute::<f32, u32>(f) }
}



// get_if_u32(predicate:bool, if_true: u32, if_false: u32):
//   Returns if_true if predicate is true, else if_false
//
// If and match are not able to be used in const functions (at least, not 
// without #![feature(const_if_match)] - so here's a branch-free select function
// for u32s
const fn get_if_u32(predicate: bool, if_true: u32, if_false: u32) -> u32 {
  let pred_mask = (-1 * (predicate as i32)) as u32;
  let true_val = if_true & pred_mask;
  let false_val = if_false & !pred_mask;
  true_val | false_val
}

// get_if_f32(predicate, if_true, if_false):
//   Returns if_true if predicate is true, else if_false
//
// A branch-free select function for f32s.
// 
// If either if_true or if_false is NaN or an infinity, the result will be NaN,
// which is not ideal. I don't know of a better way to implement this function
// within the arbitrary limitations of this silly little side quest.
const fn get_if_f32(predicate: bool, if_true: f32, if_false: f32) -> f32 {
  // can't convert bool to f32 - but can convert bool to i32 to f32
  let pred_sel = (predicate as i32) as f32;
  let pred_not_sel = ((!predicate) as i32) as f32;
  let true_val = if_true * pred_sel;
  let false_val = if_false * pred_not_sel;
  true_val + false_val
}


// bits(): Returns the bits representing a floating point value.
const fn bits(f: f32) -> u32 {
  // the result value, initialized to a NaN value that will otherwise not be
  // produced by this function.
  let mut r = 0xffff_ffff;

  // These floating point operations (and others) cause the following error:
  //     only int, `bool` and `char` operations are stable in const fn
  // hence #![feature(const_fn)] at the top of the file
  
  // Identify special cases
  let is_zero    = f == 0_f32;
  let is_inf     = f == f32::INFINITY;
  let is_neg_inf = f == f32::NEG_INFINITY;
  let is_nan     = f != f;

  // Writing this as !(is_zero || is_inf || ...) causes the following error:
  //     Loops and conditional expressions are not stable in const fn
  // so instead write this as type conversions, and bitwise operations
  //
  // "normalish" here means that f is a normal or subnormal value
  let is_normalish = 0 == ((is_zero as u32) | (is_inf as u32) | 
                        (is_neg_inf as u32) | (is_nan as u32));

  // set the result value for each of the special cases
  r = get_if_u32(is_zero,    0,           r); // if (is_zero)    { r = 0; }
  r = get_if_u32(is_inf,     0x7f80_0000, r); // if (is_inf)     { r = 0x7f80_0000; }
  r = get_if_u32(is_neg_inf, 0xff80_0000, r); // if (is_neg_inf) { r = 0xff80_0000; }
  r = get_if_u32(is_nan,     0x7fc0_0000, r); // if (is_nan)     { r = 0x7fc0_0000; }
 
  // It was tempting at this point to try setting f to a "normalish" placeholder 
  // value so that special cases do not have to be handled in the code that 
  // follows, like so:
  // f = get_if_f32(is_normal, f, 1_f32);
  //
  // Unfortunately, get_if_f32() returns NaN if either input is NaN or infinite.
  // Instead of switching the value, we work around the non-normalish cases 
  // later.
  //
  // (This whole function is branch-free, so all of it is executed regardless of 
  // the input value)

  // extract the sign bit
  let sign_bit  = get_if_u32(f < 0_f32,  1, 0);

  // compute the absolute value of f
  let mut abs_f = get_if_f32(f < 0_f32, -f, f);

  
  // This part is a little complicated. The algorithm is functionally the same 
  // as the C++ version linked from the top of the file.
  // 
  // Because of the various contrived constraints on this problem, we compute 
  // the exponent and significand, rather than extract the bits directly.
  //
  // The idea is this:
  // Every finite single precision float point number can be represented as a
  // series of (at most) 24 significant digits as a 128.149 fixed point number 
  // (128: 126 exponent values >= 0, plus one for the implicit leading 1, plus 
  // one more so that the decimal point falls on a power-of-two boundary :)
  // 149: 126 negative exponent values, plus 23 for the bits of precision in the 
  // significand.)
  //
  // If we are able to scale the number such that all of the precision bits fall 
  // in the upper-most 64 bits of that fixed-point representation (while 
  // tracking our effective manipulation of the exponent), we can then 
  // predictably and simply scale that computed value back to a range that can 
  // be converted safely to a u64, count the leading zeros to determine the 
  // exact exponent, and then shift the result into position for the final u32 
  // representation.
  
  // Start with the largest possible exponent - subsequent steps will reduce 
  // this number as appropriate
  let mut exponent: u32 = 254;
  {
    // Hex float literals are really nice. I miss them.

    // The threshold is 2^87 (think: 64+23 bits) to ensure that the number will 
    // be large enough that, when scaled down by 2^64, all the precision will 
    // fit nicely in a u64
    const THRESHOLD: f32 = 154742504910672534362390528_f32; // 0x1p87f == 2^87

    // The scaling factor is 2^41 (think: 64-23 bits) to ensure that a number 
    // between 2^87 and 2^64 will not overflow in a single scaling step.
    const SCALE_UP: f32 = 2199023255552_f32; // 0x1p41f == 2^41

    // Because loops are not available (no #![feature(const_loops)], and 'if' is
    // not available (no #![feature(const_if_match)]), perform repeated branch-
    // free conditional multiplication of abs_f.

    // use a macro, because why not :D It's the most compact, simplest option I 
    // could find.
    macro_rules! maybe_scale {
      () => {{
        // care is needed: if abs_f is above the threshold, multiplying by 2^41 
        // will cause it to overflow (INFINITY) which will cause get_if_f32() to
        // return NaN, which will destroy the value in abs_f. So compute a safe 
        // scaling factor for each iteration.
        //
        // Roughly equivalent to :
        // if (abs_f < THRESHOLD) {
        //   exponent -= 41;
        //   abs_f += SCALE_UP;
        // }
        let scale = get_if_f32(abs_f < THRESHOLD, SCALE_UP,      1_f32);    
        exponent  = get_if_u32(abs_f < THRESHOLD, exponent - 41, exponent); 
        abs_f     = get_if_f32(abs_f < THRESHOLD, abs_f * scale, abs_f);
      }}
    }
    // 41 bits per iteration means up to 246 bits shifted.
    // Even the smallest subnormal value will end up in the desired range.
    maybe_scale!();  maybe_scale!();  maybe_scale!();
    maybe_scale!();  maybe_scale!();  maybe_scale!();
  }

  // Now that we know that abs_f is in the desired range (2^87 <= abs_f < 2^128)
  // scale it down to be in the range (2^23 <= _ < 2^64), and convert without 
  // loss of precision to u64.
  const INV_2_64: f32 = 5.42101086242752217003726400434970855712890625e-20_f32; // 0x1p-64f == 2^-64
  let a = (abs_f * INV_2_64) as u64;

  // Count the leading zeros.
  // (C++ doesn't provide a compile-time constant function for this. It's nice 
  // that rust does :)
  let mut lz = a.leading_zeros();

  // if the number isn't normalish, lz is meaningless: we stomp it with 
  // something that will not cause problems in the computation that follows - 
  // the result of which is meaningless, and will be ignored in the end for 
  // non-normalish values.
  lz = get_if_u32(!is_normalish, 0, lz); // if (!is_normalish) { lz = 0; }

  {
    // This step accounts for subnormal numbers, where there are more leading 
    // zeros than can be accounted for in a valid exponent value, and leading 
    // zeros that must remain in the final significand.
    //
    // If lz < exponent, reduce exponent to its final correct value - lz will be
    // used to remove all of the leading zeros.
    //
    // Otherwise, clamp exponent to zero, and adjust lz to ensure that the 
    // correct number of bits will remain (after multiplying by 2^41 six times - 
    // 2^246 - there are 7 leading zeros ahead of the original subnormal's
    // computed significand of 0.sss...)
    // 
    // The following is roughly equivalent to:
    // if (lz < exponent) {
    //   exponent = exponent - lz;
    // } else {
    //   exponent = 0;
    //   lz = 7;
    // }

    // we're about to mess with lz and exponent - compute and store the relative 
    // value of the two
    let lz_is_less_than_exponent = lz < exponent;

    lz       = get_if_u32(!lz_is_less_than_exponent, 7,             lz);
    exponent = get_if_u32( lz_is_less_than_exponent, exponent - lz, 0);
  }

  // compute the final significand.
  // + 1 shifts away a leading 1-bit for normal, and 0-bit for subnormal values
  // Shifts are done in u64 (that leading bit is shifted into the void), then
  // the resulting bits are shifted back to their final resting place.
  let significand = ((a << (lz + 1)) >> (64 - 23)) as u32;

  // combine the bits
  let computed_bits = (sign_bit << 31) | (exponent << 23) | significand;

  // return the normalish result, or the non-normalish result, as appropriate
  get_if_u32(is_normalish, computed_bits, r)
}


// Compile-time validation - able to be examined in rust.godbolt.org output
pub static BITS_BIGNUM: u32 = bits(std::f32::MAX);
pub static TBITS_BIGNUM: u32 = bits_transmute(std::f32::MAX);
pub static BITS_LOWER_THAN_MIN: u32 = bits(7.0064923217e-46_f32);
pub static TBITS_LOWER_THAN_MIN: u32 = bits_transmute(7.0064923217e-46_f32);
pub static BITS_ZERO: u32 = bits(0.0f32);
pub static TBITS_ZERO: u32 = bits_transmute(0.0f32);
pub static BITS_ONE: u32 = bits(1.0f32);
pub static TBITS_ONE: u32 = bits_transmute(1.0f32);
pub static BITS_NEG_ONE: u32 = bits(-1.0f32);
pub static TBITS_NEG_ONE: u32 = bits_transmute(-1.0f32);
pub static BITS_INF: u32 = bits(std::f32::INFINITY);
pub static TBITS_INF: u32 = bits_transmute(std::f32::INFINITY);
pub static BITS_NEG_INF: u32 = bits(std::f32::NEG_INFINITY);
pub static TBITS_NEG_INF: u32 = bits_transmute(std::f32::NEG_INFINITY);
pub static BITS_NAN: u32 = bits(std::f32::NAN);
pub static TBITS_NAN: u32 = bits_transmute(std::f32::NAN);
pub static BITS_COMPUTED_NAN: u32 = bits(std::f32::INFINITY/std::f32::INFINITY);
pub static TBITS_COMPUTED_NAN: u32 = bits_transmute(std::f32::INFINITY/std::f32::INFINITY);


// Run-time validation of many more values
fn main() {
  let end: usize = 0xffff_ffff;
  let count = 9_876_543; // number of values to test
  let step = end / count;
  for u in (0..=end).step_by(step) {
      let v = u as u32;
      
      // reference
      let f = unsafe { std::mem::transmute::<u32, f32>(v) };
      
      // compute
      let c = bits(f);

      // validation
      if c != v && 
         !(f.is_nan() && c == 0x7fc0_0000) && // nans
         !(v == 0x8000_0000 && c == 0) { // negative 0
          println!("{:x?} {:x?}", v, c); 
      }
  }
}

,

Chris NeugebauerReflecting on 10 years of not having to update WordPress

Over the weekend, the boredom of COVID-19 isolation motivated me to move my personal website from WordPress on a self-managed 10-year-old virtual private server to a generated static site on a static site hosting platform with a content delivery network.

This decision was overdue. WordPress never fit my brain particularly well, and it was definitely getting to a point where I wasn’t updating my website at all (my last post was two weeks before I moved from Hobart; I’ve been living in Petaluma for more than three years now).

Settling on a website framework wasn’t a terribly difficult choice (I chose Jekyll; everyone else seems to be using it), and I’ve had friends who’ve had success moving their blogs over. The difficulty I ended up facing was that the standard exporter everyone uses to move from WordPress to Jekyll does not expect Debian’s package layout.

Backing up a bit: I made a choice, 10 years ago, to deploy WordPress on a machine that I ran myself, using the Debian system wordpress package, a simple aptitude install wordpress away. That decision was not particularly consequential then, but it chewed up 3 hours of my time on Saturday.

Why? The exporter plugin assumes that it will be able to find all of the standard WordPress files in the usual WordPress places, and when it couldn’t find them, it broke in unexpected ways. And why couldn’t it find them?

Debian makes packaging choices that prioritise all the software on a system living side-by-side with minimal difficulty. It sets strict permissions. It separates application code from configuration from user data (which in the case of WordPress, includes plugins), in a way that is consistent between applications. This choice makes it easy for Debian admins to understand how to find bits of an application. It also minimises the chance of one PHP application clobbering another.

10 years later, the install that I had set up was still working, having survived 3-4 Debian versions, and so 3-4 new WordPress versions. I don’t recall the last time I had to think about keeping my WordPress instance secure and updated. That’s quite a good run. I’ve had a working website despite not caring about keeping it updated for at least three years.

The same decisions that meant I spent 3 hours on Saturday doing a simple WordPress export saved me a bunch of time that I didn’t incrementally spend over the course of a decade. Am I even? I have no idea.

Anyway, the least I can do is provide some help to people who might run into this same problem, so here’s a 5-step howto.

How to migrate a Debian WordPress site to Jekyll

Should you find the Jekyll exporter not working on your Debian WordPress install:

  1. Use the standard WordPress export to export an XML feed of your site.
  2. Spin up a new instance of WordPress (using WordPress.com, or on a new Virtual Private Server, whatever, really).
  3. Import the exported XML feed.
  4. Install the Jekyll exporter plugin.
  5. Follow the documentation and receive a Jekyll export of your site.

Basically, the plugin works with a stock WordPress install. If you don’t have one of those, it’s easy to move it over.

,

Gary PendergastInstall the COVIDSafe app

I can’t think of a more unequivocal title than that. 🙂

The Australian government doesn’t have a good track record of either launching publicly visible software projects, or respecting privacy, so I’ve naturally been sceptical of the contact tracing app since it was announced. The good news is, while it has some relatively minor problems, it appears to be a solid first version.

Privacy

While the source code is yet to be released, the Android version has already been decompiled, and public analysis is showing that it only collects necessary information, and only uploads contact information to the government servers when you press the button to upload (you should only press that button if you actually get COVID-19, and are asked to upload it by your doctor).

The legislation around the app is also clear that the data you upload can only be accessed by state health officials. Commonwealth departments have no access, nor do non-health departments (eg, law enforcement, intelligence).

Technical

It does what it’s supposed to do, and hasn’t been found to open you up to risks by installing it. There are a lot of people digging into it, so I would expect any significant issues to be found, reported, and fixed quite quickly.

Some parts of it are a bit rushed, and the way it scans for contacts could be more battery efficient (that should hopefully be fixed in the coming weeks when Google and Apple release updates that these contact tracing apps can use).

If it produces useful data, however, I’m willing to put up with some quirks. 🙂

Usefulness

I’m obviously not an epidemiologist, but those I’ve seen talk about it say that yes, the data this app produces will be useful for augmenting the existing contact tracing efforts. There were some concerns that it could produce a lot of junk data that wastes time, but I trust the expert contact tracing teams to filter and prioritise the data they get from it.

Install it!

The COVIDSafe site has links to the app in Apple’s App Store, as well as Google’s Play Store. Setting it up takes a few minutes, and then you’re done!

,

Andrew RuthvenInstall Fedora CoreOS using FAI

I've spent the last couple of days trying to deploy Fedora CoreOS to some physical hardware/bare metal for a colleague using the official PXE installer from Fedora CoreOS. It wasn't very pleasant, and just wouldn't work reliably.

Maybe my expectations were too high, in that I thought I could use Ignition to prepare more of the system for me, as my colleague has been able to do bare metal installs correctly. I just tried to use Ignition as documented.

A few interesting aspects I encountered:

  1. The PXE installer for it has a 618MB initrd file. This takes quite a while to transfer via tftp!
  2. It can't build software RAID for the main install device (and the developers have no intention of adding this), and it seems very finicky to build other RAID sets for other partitions.
  3. And, well, I just kept having problems where the built systems would hang during boot for no obvious reason.
  4. The time to do an installation was incredibly long.
  5. The initrd image is really just running coreos-installer against the nominated device.

During the night I got fed up with that process and wrote a Fully Automatic Installer (FAI) profile that'd install CoreOS instead. I can now use setup-storage from FAI using its standard disk_config files. This allows me to build complicated disk configurations with software RAID and LVM easily.

A big bonus is that a rebuild is a lot faster: timed from typing reboot to a fresh login prompt, it takes 10 minutes - and this is on physical hardware, so that includes BIOS POST and RAID controller set up, twice each.

I thought this might be of interest to other people, so the FAI profile I developed for this is located here: https://github.com/catalyst-cloud/fai-profile-fedora-coreos

FAI was initially developed to deploy Debian systems; it has since been extended to install a number of other operating systems. I think this is a good example of how easy it is to deploy non-Debian-derived operating systems using FAI without having to modify FAI itself.

,

Gary PendergastBebo, Betty, and Jaco

Wait, wasn’t WordPress 5.4 just released?

It absolutely was, and congratulations to everyone involved! Inspired by the fine work done to get another release out, I finally completed the last step of co-leading WordPress 5.0, 5.1, and 5.2 (Bebo, Betty, and Jaco, respectively).

My study now has a bit more jazz in it. 🙂

,

Robert CollinsStrength training from home

For the last year I’ve been incrementally moving away from lifting static weights and towards body weight based exercises, or callisthenics. I’ve been doing this for a number of reasons, including better avoidance of injury (if I collapse, the entire stack is dynamic, if a bar held above my head drops on me, most of the weight is just dead weight – ouch), accessibility during travel – most hotel gyms are very poor, and functional relevance – I literally never need to put 100 kg on my back, but I do climb stairs, for instance.

Covid-19 shutting down the gym where I train is a mild inconvenience for me as a result, because even though I don’t do it, I am able to do nearly all my workouts entirely from home. And I thought a post about this approach might be of interest to other folk newly separated from their training facilities.

I’ve gotten most of my information from a few different youtube channels:

There are many more channels out there, and I encourage you to go and look and read and find out what works for you. Those 5 are my greatest hits, if you will. I’ve bought the FitnessFAQs exercise programs to help me with my my training, and they are indeed very effective.

While you don’t need a gymnasium, you do need some equipment, particularly if you can’t go and use a local park. Exactly what you need will depend on what you choose to do – for instance, doing dips on the edge of a chair can avoid needing any equipment, but doing them with some portable parallel bars can be much easier. Similarly, doing pull ups on the edge of a door frame is doable, but doing them with a pull-up bar is much nicer on your fingers.

Depending on your existing strength you may not need bands, but I certainly did. Buying rings is optional – I love them, but they aren’t needed to have a good solid workout.

I bought parallettes for working on the planche. Parallel bars for dips and rows. A pull-up bar for pull-ups and chin-ups, though with the rings you can add flys, rows, face-pulls, unstable push-ups and more. The rings. And a set of 3 bands that combine for 7 different support amounts.

In terms of routine, I do an upper/lower split, with 3 days on upper body, one day off, one day on lower, and the weekends off entirely. I was doing 2 days on lower body, but found I was over-training with Aikido later that same day.

On upper body days I’ll do (roughly) chin ups or pull ups, push ups, rows, dips, hollow body and arch body holds, handstands and some grip work. Today, as I write this on Sunday evening, 2 days after my last training day on Friday, I can still feel my lats and biceps from training Friday afternoon. Zero issue keeping the intensity up.

For lower body, I’ll do pistol squats, nordic drops, quad extensions, wall sits, single leg calf raises, bent leg calf raises. Again, zero issues hitting enough intensity to achieve growth / strength increases. The only issue at home is having a stable enough step to get a good heel drop for the calf raises.

If you haven’t done bodyweight training at all before, when starting, don’t assume it will be easy – even if you’re a gym junkie, our bodies are surprisingly heavy, and there’s a lot of resistance just moving them around.

Good luck, train well!

OpenSTEMOnline Teaching

The OpenSTEM® materials are ideally suited to online teaching. In these times of new challenges and requirements, there are a lot of technological possibilities. Schools and teachers are increasingly being asked to deliver material online to students. Our materials can assist with that process, especially for Humanities and Science subjects from Prep/Kindy/Foundation to Year 6. […]

The post Online Teaching first appeared on OpenSTEM Pty Ltd.

Brendan ScottCovid 19 Numbers – lag

Recording some thoughts about Covid 19 numbers.

Today’s figures

The Government says:

“As at 6.30am on 22 March 2020, there have been 1,098 confirmed cases of COVID-19 in Australia”.

The reference is https://www.health.gov.au/news/health-alerts/novel-coronavirus-2019-ncov-health-alert/coronavirus-covid-19-current-situation-and-case-numbers. However, that page is updated daily (ish), so don’t expect it to be the same if you check the reference.

Estimating Lag

If a person tests positive to the virus today, that means they were infected at some time in the past. So, what is the lag between infection and a positive test result?

Incubation Lag – about 5 days

When you are infected you don’t show symptoms immediately. Rather, there’s an incubation period before symptoms become apparent.  The time between being infected and developing symptoms varies from person to person, but most of the time a person shows symptoms after about 5 days (I recall seeing somewhere that 1 in a 1000 cases will develop symptoms after 14 days).

Presentation Lag – about 2 days

I think it’s fair to also assume that people are not presenting at testing immediately they become ill. It is probably taking them a couple of days from developing symptoms to actually get to the doctor – I read a story somewhere (have since lost the reference) about a young man who went to a party, then felt bad for days but didn’t go for a test until someone else from the party had returned a positive test.  Let’s assume there’s a mix of worried well and stoic types and call it 2 days from becoming symptomatic to seeking a test.

Referral Lag – about a day

Assuming that a GP is available straight away and recommends a test immediately, logistically there will still be most of a day taken up between deciding to see a doctor and having a test carried out.

Testing lag – about 2 days

The graph of infections “epi graph” today looks like this:

[Graph: new and cumulative COVID-19 cases in Australia by notification date, 22 March 2020]

One thing you notice about the graph is that the new cases bars seem to increase for a couple of days, then decrease – so about 100 new cases in the last 24 hours, but almost 200 in the 24 hours before that. From the graph, the last 3 “dips” have been today (Sunday), last Thursday and last Sunday.  This seems to be happening every 3 to 4 days. I initially thought that the dips might mean fewer (or more) people presenting over weekends, but the period is inconsistent with that. I suspect, instead, that this actually means that testing is being batched.

That would mean that neither the peaks nor troughs is representative of infection surges/retreats, but is simply reflecting when tests are being processed. This seems to be a 4 day cycle, so, on average it seems that it would be about 2 days between having the test conducted and receiving a result. So a confirmed case count published today is actually showing confirmed cases as at about 2 days earlier.

Total lag

From the date someone is infected to the time that they receive a positive confirmation is about:

lag = time for symptoms to show + time to seek a test + referral time + time for the test to return a result

So, the published figures on confirmed infections are probably lagging actual infections in the community by about 10 days (5+2+1+2).

If there’s about a 10 day lag between infection and confirmation, then what a figure published today says is that about a week and a half ago there were about this many cases in the community.  So, the 22 March figure of 1098 infections is actually really a 12 March figure.
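Or, written out as a trivial Python sketch (just the estimates above, added together):

incubation_lag = 5    # infection to symptoms
presentation_lag = 2  # symptoms to seeking a test
referral_lag = 1      # seeing a doctor to being tested
testing_lag = 2       # test to published result

total_lag = incubation_lag + presentation_lag + referral_lag + testing_lag
print(total_lag)  # 10 days, so the 22 March figures are really ~12 March figures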

What the lag means for Physical (ie Social) Distancing

The main thing that the lag means is that if we were able to wave a magic wand today and stop all further infections, we would continue to record new infections for about 10 days (and the tail for longer). In practical terms, implementing physical distancing measures will not show any effect on new cases for about a week and a half. That’s because today there are infected people who are yet to be tested.

The silver lining to that is that the physical distancing measures that have been gaining prominence since 15 March should start to show up in the daily case numbers from the middle of the coming week, possibly offset by overseas entrants rushing to make the 20 March entry deadline.

Estimating Actual Infections as at Today

How many people are infected, but unconfirmed as at today? To estimate actual infections you’d need to have some idea of the rate at which infections are increasing. For example, if infections increased by 10% per day for 10 days, then you’d multiply the most recent figure by 1.1 raised to the power of 10 (ie about 2.5).  Unfortunately, the daily rate of increase (see table on the wiki page) has varied a fair bit (from 20% to 27%) over the most recent 10 days of data (that is, over the 10 days prior to 12 March, since the 22 March figures roughly correspond to 12 March infections) and there’s no guarantee that since that time the daily increase in infections will have remained stable, particularly in light of the implementation of physical distancing measures. At 23.5% per day, the factor is about 8.
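To illustrate that calculation in a couple of lines of Python (using the example rates above, not measured figures):

# Compound a steady daily increase over the ~10 day lag to get the factor
# you'd multiply the latest confirmed count by.
lag_days = 10

def growth_factor(daily_rate, days=lag_days):
    return (1 + daily_rate) ** days

print(round(growth_factor(0.10), 1))   # ~2.6  (the "about 2.5" example)
print(round(growth_factor(0.235), 1))  # ~8.3  (the "about 8" at 23.5% per day)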

There aren’t any reliable figures we can use to estimate the rate of infection during the current lag period (ie from 12 March to 22 March). This is because the vast majority of cases have not been from unexplained community transmission. Most of the cases are from people who have been overseas in the previous fortnight and they’re the cohort that has been most significantly impacted by recent physical distancing measures. From 15 March, they have been required to self isolate and from 20 March most of their entry into the country has stopped.  So I’d expect a surge in numbers up to about 30 March – ie reflecting infections in the cohort of people rushing to get into the country before the borders closed followed by a flattening. With the lag factor above, you’ll need to wait until 1 April or thereabouts to know for sure.

Note:

This post is just about accounting for the time lag between becoming infected and receiving a positive test result. It assumes, for example, that everyone who is infected seeks a test, and that everyone who is infected and seeks a test is, in fact, tested. As at today, neither of these things is true.

,

OpenSTEMCOVID-19 (of course)

We thought it timely to review a few facts and observations, relying on published medical papers (or those submitted for peer review) and reliable sources.

The post COVID-19 (of course) first appeared on OpenSTEM Pty Ltd.

,

Clinton Roylca2020 ReWatch 2020-02-02

As I was an organiser of the conference this year, I didn’t get to see many talks. Fortunately many of the talks were recorded, so I get to watch the conference well after the fact.

Conference Opening

That white balance on the lectern slides is indeed bad; I really should get around to adding this as a suggestion on the logos documentation. (With some help, I put up all the lectern covers; it was therapeutic and rush free.)

I actually think there was a lot of information in this introduction. Perhaps too much?

OpenZFS and Linux

A nice update on where zfs is these days.

Dev/Ops relationships, status: It’s Complicated

A bit of  a war story about production systems, leading to a moment of empathy.

Samba 2020: Why are we still in the 1980s for authentication?

There are a lot of old security standards that are showing their age, and a lot of modern security standards, but which to choose?

Tyranny of the Clock

A very interesting problem solving adventure, with a few nuggets of interesting information about tools and techniques.

Configuration Is (riskier than?) Code

Because configuration files are parsed by a program, and the program changes how it runs depending on the contents of that configuration file, every program that parses configuration files is basically an interpreter, and thus every configuration file is basically a program. So, configuration is code, and we should be treating configuration like we do code, e.g. revision control, commenting, testing, review.

Easy Geo-Redundant Handover + Failover with MARS + systemd

Using a local process organiser to handle a cluster: interesting, but not something I’d really promote. Not the best video cutting in this video; lots of time with the speaker pointing to his slides offscreen.

 

,

sthbrx - a POWER technical bloglinux.conf.au 2020 recap

It's that time of year again. Most of OzLabs headed up to the Gold Coast for linux.conf.au 2020.

linux.conf.au is one of the longest-running community-led Linux and Free Software events in the world, and attracts a crowd from Australia, New Zealand and much further afield. OzLabbers have been involved in LCA since the very beginning, and this year was no exception, with me running the Kernel Miniconf and several others speaking.

The list below contains some of our highlights that we think you should check out. These are just a few of the talks that we managed to make it to - there's plenty more worthwhile stuff on the linux.conf.au YouTube channel.

We'll see you all at LCA2021 right here in Canberra...

Keynotes

A couple of the keynotes really stood out:

Sean is a forensic structural engineer who shows us a variety of examples, from structural collapses and firefighting disasters, where trained professionals were blinded by their expertise and couldn't bring themselves to do things that were obvious.

There's nothing quite like cryptography proofs presented to a keynote audience at 9:30 in the morning. Vanessa goes over the issues with electronic voting systems in Australia, and especially internet voting as used in NSW, including flaws in their implementation of cryptographic algorithms. There continues to be no good way to do internet voting, but with developments in methodologies like risk-limiting audits there may be reasonably safe ways to do in-person electronic voting.

OpenPOWER

There was an OpenISA miniconf, co-organised by none other than Hugh Blemings of the OpenPOWER Foundation.

Anton (on Mikey's behalf) introduces the Power OpenISA and the Microwatt FPGA core which has been released to go with it.

Anton live demos Microwatt in simulation, and also tries to synthesise it for his FPGA but runs out of time...

Paul presents an in-depth overview of the design of the Microwatt core.

Kernel

There were quite a few kernel talks, both in the Kernel Miniconf and throughout the main conference. These are just some of them:

There have been many cases where we've introduced a syscall only to find out later on that we need to add some new parameters - how do we make our syscalls extensible so we can add new parameters later on without needing to define a whole new syscall, while maintaining both forward and backward compatibility? It turns out it's pretty simple but needs a few more kernel helpers.

There are a bunch of tools out there which you can use to make your kernel hacking experience much more pleasant. You should use them.

Among other security issues with container runtimes, using procfs to setup security controls during the startup of a container is fraught with hilarious problems, because procfs and the Linux filesystem API aren't really designed to do this safely, and also have a bunch of amusing bugs.

Control Flow Integrity is a technique for restricting exploit techniques that hijack a program's control flow (e.g. by overwriting a return address on the stack (ROP), or overwriting a function pointer that's used in an indirect jump). Kees goes through the current state of CFI supporting features in hardware and what is currently available to enable CFI in the kernel.

Linux has supported huge pages for many years, which has significantly improved CPU performance. However, the huge page mechanism was driven by hardware advancements and is somewhat inflexible, and it's just as important to consider software overhead. Matthew has been working on supporting more flexible "large pages" in the page cache to do just that.

Spoiler: the magical fantasy land is a trap.

Community

Lots of community and ethics discussion this year - one talk which stood out to me:

Bradley and Karen argue that while open source has "won", software freedom has regressed in recent years, and present their vision for what modern, pragmatic Free Software activism should look like.

Other

Among the variety of other technical talks at LCA...

Quantum compilers are not really like regular classical compilers (indeed, they're really closer to FPGA synthesis tools). Matthew talks through how quantum compilers map a program onto IBM's quantum hardware and the types of optimisations they apply.

Clevis and Tang provide an implementation of "network bound encryption", allowing you to magically decrypt your secrets when you are on a secure network with access to the appropriate Tang servers. This talk outlines use cases and provides a demonstration.

Christoph discusses how to deal with the hardware and software limitations that make it difficult to capture traffic at wire speed on fast fibre networks.

,

Robert Collins2019 in the rearview

2019 was a very busy year for us. I hadn’t realised how busy it was until I sat down to write this post. There’s also some moderately heavy stuff in here – if you have topics that trigger you, perhaps make sure you have spoons before reading.

We had all the usual stuff. Movies – my top two were Alita and Abominable though the Laundromat and Ford v Ferrari were both excellent and moving pieces. I introduced Cynthia to Teppanyaki and she fell in love with having egg roll thrown at her face hole.

When Cynthia started school we dropped gymnastics due to the time overload – we wanted some downtime for her to process after school, and with violin having started that year she was just looking so tired after a full day of school we felt it was best not to have anything on. Then last year we added in a specific learning tutor to help with the things that she approaches differently to the other kids in her class, giving 2 days a week of extra curricular activity after we moved swimming to the weekends.

At the end of last year she was finally chipper and with it most days after school, and she had been begging to get into more stuff, so we all got together and negotiated drama class and Aikido.

The drama school we picked, HSPA, is pretty amazing. Cynthia adored her first teacher there, and while upset at a change when they rearranged classes slightly, is again fully engaged and thrilled with her time there. Part of the class is putting on a full scale production – they did a version of the Happy Prince near the end of term 3 – and every student gets a part, with the ability for the older students to audition for more parts. On the other hand she tells me tonight that she wants to quit. So shrug, who knows :).

I last did martial arts when I took Aikido with sensei Darren Friend at Aikido Yoshinkai NSW back in Sydney, in the late 2000’s. And there was quite a bit less of me then. Cynthia had been begging to take a martial art for about 4 years, and we’d said that when she was old enough, we’d sign her up, so this year we both signed up for Aikido at the Rangiora Aikido Dojo. The Rangiora dojo is part of the NZ organisation Aikido Shinryukan, which is part of the larger Aikikai style – quite different from, yet the same as, the Yoshinkai Aikido that I had been learning. There have been quite a few moments where I have had to go back to something core – such as my stance – and unlearn it, to learn the Aikikai technique. Cynthia has found the group learning dynamic a bit challenging – she finds the explanations (needed when there are twenty kids of a range of ages and a range of experience, from new intakes each term through to ones that have been doing it for 5 or so years) get boring, and I can see her just switch off. Then she misses the actual new bit of information she didn’t have previously :(. Which then frustrates her. But she absolutely loves doing it, and she’s made a couple of friends there (everyone is positive and friendly, but there are some girls that like to play with her after the kids’ lesson). I have gotten over the body disconnect and awkwardness, things are starting to flow, and I’m starting to be able to reason about things without just freezing in overload all the time, so that’s not bad after a year. However, the extra weight is making my forward rolls super super awkward. I can backward roll easily, with moderately good form; forward rolls, though – my upper body strength is far from what’s needed to support my weight through the start of the roll – my arm just collapses – so I’m in a sort of limbo: if I get the moment just right I can just start the contact on the shoulder, but if I get the moment slightly wrong, it hurts quite badly. And since I don’t want large scale injuries, doing the higher rolls is very unnerving for me. I suspect it’s 90% psychological, but am not sure how to get from where I am to having confidence in my technique, other than rinse-and-repeat. My hip isn’t affecting training much, and sensei Chris seems to genuinely like training with Cynthia and me, which is very nice: we feel welcomed and included in the community.

Speaking of my hip – earlier this year something ripped cartilage in my right hip – ended up having to have an MRI scan – and those machines sound exactly like a dot matrix printer – to diagnose it. Interestingly, having the MRI improved my symptoms, but we are sadly in hurry-up-and-wait mode. Before the MRI, I’d wake up at night with some soreness, my right knee bent, foot on the bed, then sleepily let my leg collapse sideways to the right – and suddenly be awake in screaming agony as the joint opened up with every nerve at its disposal. When the MRI was done, they pumped the joint full of local anaesthetic for two purposes – one is to get a clean read on the joint, and the second is so that they can distinguish between referred surrounding pain, vs pain from the joint itself. It is to be expected with a joint issue that the local will make things feel better (duh), for up to a day or so while the local dissipates. The expression on the specialist’s face when I told him that I had had a permanent improvement trackable to the MRI date was priceless. Now, when I wake up with joint pain, and my leg sleepily falls back to the side, it’s only mildly uncomfortable, and I readjust without being brought to screaming awakeness. Similarly, early in Aikido training many activities would trigger pain, and now there’s only a couple of things that do. In another 12 or so months if the joint hasn’t fully healed, I’ll need to investigate options such as stem cells (which the specialist was negative about) or steroids (which he was more negative about) or surgery (which he was even more negative about). My theory about the improvement is that the cartilage that was ripped was sitting badly and the inflation for the MRI allowed it to settle back into the appropriate place (and perhaps start healing better). I’m told that reducing inflammation systemically is a good option. Turmeric time.

Sadly Cynthia has had some issues at school – she doesn’t fit the average mould, and while widespread bullying doesn’t seem to be a thing, there is enough of it, and she receives enough of it, that it’s impacted her happiness more than a little – this blows up in school and at home as well. We’ve been trying a few things to improve this – helping her understand why folk behave badly, what to do in the moment (e.g. this video), but also that anything that goes beyond speech is assault and she needs to report that to us or teachers no matter what.

We’ve also had some remarkably awful interactions with another family at the school. We thought we had a friendly relationship, but I managed to trigger a complete meltdown of the relationship – not by doing anything objectively wrong, but because we had (unknown to me) different folkways, and some perfectly routine and normal behaviour turned out to be stressful and upsetting to them, and then they didn’t discuss it with us at all until it had brewed up in their heads into a big mess… and it’s still not resolved (and may not ever be: they are avoiding us both).

I weighed in at 110kg this morning. Jan the 4th 2019 I was 130.7kg. Feb 1 2018 I was 115.2kg. This year I peaked at 135.4kg, and got down to 108.7kg before Christmas food set in. That’s pretty happy making all things considered. Last year I was diagnosed with Coitus headaches and, though I didn’t know it, the medicine I was put on has a known side effect of weight gain. And it did – I had put it down to ongoing failure to manage my diet properly, but once my weight loss doctor gave me an alternative prescription for the headaches, I was able to start losing weight immediately. Sadly, though the weight gain through 2018 was effortless, losing the weight through 2019 was not. Doable, but not effortless. I saw a neurologist for the headaches when they recurred in 2019, and got a much more informative readout on them, how to treat them and so on – basically the headaches can be thought of as an instability in the system, and the medicine’s goal is to stabilise things; once stable for a decent period, we can attempt to remove the crutch. Often that’s successful, sometimes not, sometimes it’s successful on a second or third try. Sometimes you’re stuck with it forever. I’ve been eating a keto / LCHF diet – not super strict keto, though Jonie would like me to be on that; I don’t have the will power most of the time – there’s a local truck stop that sells killer hotdogs. And I simply adore them.

I started this year working for one of the largest companies on the planet – VMware. I left there in February and wrote a separate post about that. I followed that job with nearly the polar opposite – a startup working on a blockchain content distribution system. I wrote about that too. Changing jobs is hard in lots of ways – for instance I usually make friendships at my jobs, and those suffer some when you disappear to a new context – not everyone makes connections with you outside of the job context. Then there’s the somewhat non-rational emotional impact of not being in paid employment. The puritans have a lot to answer for. I’m there again, looking for work (and hey, if you’re going to be at Linux.conf.au (Gold Coast Australia January 13-17) I’ll be giving a presentation about some of the interesting things I got up to in the last job interregnum I had).

My feet have been giving me trouble for a couple of years now. My podiatrist is reasonably happy with my progress – and I can certainly walk further than I could – I even did some running earlier in the year, until I got shin splints. However, I seem to have hyper-sensitive soles, so she can’t correct my pronation until we fix that, which at least for now means a 5 minute session where I touch my feet, someone else does, then something smooth, then something rough – called “sensory massage”.

In 2017 and 2018 I injured myself at the gym, and in 2019 I wanted to avoid that, so I sought out ways to reduce injury. Moving away from machines was a big part of that; more focus on technique another part. But perhaps the largest part was moving from lifting dead weight to focusing on body weight exercises – callisthenics. This shifts from a dead weight that you have to control when things go wrong, to an active weight that can help deal with whatever has happened. So far at least, this has been pretty successful – although I’ve had minor issues – I managed to inflame the fatty pad the olecranon displaces when your elbow locks out – I’m nearly entirely transitioned to a weights-free program – hand stands, pistol squats, push ups, dead hangs and so on. My upper body strength needs to come along some before we can really go places though… and we’re probably going to max out the hamstring curl machine (at least for regular two-leg curls) before my core is strong enough to do a Nordic drop.

Lynne has been worried about injuring herself with weight lifting at the gym for some time now, but recently saw my physio – Ben Cameron at Pegasus PhysioSouth – who is excellent, and he suggested that she could have less chronic back pain if she took weights back up again. She’s recently told me that I’m allowed one ‘told you so’ about this, since she found herself in a spot where previously she would have put herself in a poor lifting position, but the weight training gave her a better option and she intuitively used it, avoiding pain. So that’s a good thing – complicated because of her body’s complicated history, but an excellent trainer and physio team are making progress.

Earlier this year she had a hell of a fright, with a regular eye checkup getting referred into a ‘you are going blind; maybe tomorrow, maybe within 10 years’ nightmare scenario. Fortunately a second opinion got a specialist who probably knows the same amount but was willing to communicate it with actual words… Lynne has a condition which diabetes (type I or II) can affect, and she has a vein that can alter state somewhat arbitrarily but will probably only degrade slowly, particularly if Lynne’s diet is managed as she has been doing.

Diet wise, Lynne also has been losing some weight but this is complicated by her chronic idiopathic pancreatitis. That’s code for ‘it keeps happening and we don’t know why’ pancreatitis. We’ve consulted a specialist in the North Island who comes highly recommended by Lynne’s GP, who said that rapid weight loss is a little known but possible cause of pancreatitis – and that fits the timelines involved. So Lynne needs to lose weight to manage the onset of type II diabetes. But not too fast, to avoid pancreatitis, which will hasten the onset of type II diabetes. Aiee. Slow but steady – she’s working with the same doctor I am for that, and a similar diet, though lower on the fats as she has no gall… bladder.

In April our kitchen waste pipe started chronically blocking, and investigation with a drain robot revealed a slump in the pipe. Ground penetrating radar revealed an anomaly under the garage… and this escalated. We’re going to have to move out of the house for a week while half the house’s carpets are lifted, grout is pumped into the foundations to tighten it all back up again – and hopefully they don’t over pump it – and then it all gets replaced. Oh, and it looks like the drive will be replaced again, to fix the slumped pipe permanently. It will be lovely when done but right now we’re facing a wall of disruption and argh.

Around September I think, we managed to have a gas poisoning scare – our gas hob was left on and triggered a fireball which fortunately only scared Lynne rather than flambéing her face. We did however not know how much exposure we’d had to the LPG, nor to partially combusted gas – which produces toxic CO as a by-product – so there was a trip into the hospital for observation with Cynthia, with Lynne opting out. Lynne and Cynthia had had plenty of the basic symptoms – headaches, dizziness and so on at the time, but after waiting for 2 hours in the ER queue that had faded. Le sigh. The hospital, bless their cotton socks, don’t have the necessary equipment to diagnose CO poisoning without a pretty invasive blood test, but still took Cynthia’s vitals using methods (manual observation and an infra-red reader) that are confounded by the carboxyhemoglobin that forms from the CO that has been inhaled. Pretty unimpressed – our GP was livid. (This is one recommended protocol). Oh, and our gas hob, when we got it checked out – as we were not sure if we had left it on, or it had been misbehaving – turned out to have never been safe, got decertified and the pipe cut at the regulator. So we’re cooking on a portable induction hob for now.

When we moved to Rangiora I was travelling a lot more, Christchurch itself had poorer air quality than Rangiora, and our financial base was a lot smaller. Now, Rangiora’s population has gone up nearly double (13k to 19k conservatively – and that’s ignoring the surrounds that use Rangiora as a base), we have more to work with, the air situation in Christchurch has improved massively, and even a busy year’s travel is less than I was doing before Cynthia came along. We’re looking at moving – we’re not sure where yet; maybe more country, maybe more city.

One lovely bright spot over the last few years has been reconnecting with friends from school, largely on Facebook – some of whom I had forgotten that I knew back at school – I had a little clique but was not very aware of the wider school population in hindsight (this was more than a little embarrassing to me, as I didn’t want to blurt out “who are you?!”) – and others whom I had not :). Some of these reconnections are just light touch person-X exists and cares somewhat – and that’s cool. One in particular has grown into a deeper friendship than we had back as schoolkids, and I am happy and grateful that that has happened.

Our cats are fat and happy. Well mostly. Baggy is fat and stressed and spraying his displeasure everywhere whenever the stress gets too much :(. Cynthia calls him Mr Widdlepants. The rest of the time he cuddles and purrs and is generally happy with life. Dibbler and Kitten-of-the-wild are relatively fine with everything.

Cynthia’s violin is coming along well. She did a small performance for her classroom (with her teacher) and wowed them. I’ve been inspired to start practising trumpet again. After 27 years of decay my skills are decidedly rusty, but they are coming along. Finding arrangements for violin + trumpet is a bit challenging, and my sight-reading-with-transposition struggles to cope, but we make do. Lynne is muttering about getting a clarinet or drum-kit and joining in.

So, 2019. Whew. I hope yours was less stressful and had as many or more bright points than ours. Onwards to 2020.

,

BlueHackersBlueHackers crowd-funding free psychology services at LCA and other conferences

BlueHackers has in the past arranged for a free counsellor/psychologist at several conferences (LCA, OSDC). Given the popularity and great reception of this service, we want to make this a regular thing and try to get this service available at every conference possible – well, at least Australian open source and related events.

Right now we’re trying to arrange for the service to be available at LCA2020 at the Gold Coast; we have excellent local psychologists already, and the LCA organisers are working on some of the logistical aspects.

Meanwhile, we need to get the funds organised. Fortunately this has never been a problem with BlueHackers – people know this is important stuff. We can make a real difference.

Unfortunately BlueHackers hasn’t yet completed its transition from OSDClub project to Linux Australia subcommittee, so this fundraiser is running in my personal name. Well, you know who I (Arjen) am, so I hope you’re all ok with that.

We have a little over a week until LCA2020 starts, let’s make this happen! Thanks. You can donate via MyCause.

,

Robert CollinsA Cachecash retrospective

In June 2019 I started a new role as a software engineer at a startup called Cachecash. Today is probably the last day of payroll there, and as is my usual practice, I’m going to reflect back on my time there. Less commonly, I’m going to do so in public, as we’re about to open the code (yay), and it’s not a mega-corporation with everything shuttered up (also yay).

Framing

This is intended to be a blameless reflection on what has transpired. Blameless doesn’t mean inaccurate; but it means placing the focus on the process and system, not on the particular actor that happened to be wearing the hat at the time a particular event happened. Sometimes the system is defined by the actors, and in that case – well, I’ll let you draw your own conclusions if you encounter that case.

A retrospective that we can’t learn from is useless. Worse than useless, because it takes time to write and time to read and that time is lost to us forever. So if a thing is a particular way, it is going to get said. Not to be mean, but because false niceness will waste everyone’s time. Mine and my ex-colleagues’, whose time I respect. And yours, if you are still reading this.

What was Cachecash

Cachecash was a startup – still is in a very technical sense, corporation law being what it is. But it is still a couple of code bases – and a nascent open source project (which will hopefully continue) – built to operationalise and productise this research paper that the Cachecash founders wrote.

What it isn’t anymore is a company investing significant amounts of time and money in the form of engineering in making code, to make those code bases better.

Cachecash was also a team of people. That obviously changed over time, but at the time I write this it is:

  • Ghada
  • Justin
  • Kevin
  • Marcus
  • Petar
  • Robert
  • Scott

And we’re all pretty fantastic, if you ask me :).

Technical overview

The CAPNet paper that I linked above doesn’t describe a product. What it describes is a system that permits paying caches (think squid/varnish etc) for transmitting content to clients, while also detecting attempts by such caches to claim payment when they haven’t transmitted, or to collude with a client to pretend to over-transmit and get paid that way. A classic incentives-aligned scheme.

Note that there is no blockchain involved at this layer.

The blockchain was added into this core system as a way to build a federated marketplace – the idea was that the blockchain provided a suitable substrate for negotiating the purchase and sale of contracts that would be audited using the CAPNet accounting system, the payments could be micropayments back onto the blockchain, and so on – we’d avoid the regular financial system, and we wouldn’t be building a fragile central system that would prevent other companies also participating.

Miners would mine coins, publishers would buy coins then place them in escrow as a promise to pay caches to deliver content to clients, and a client would deliver proof of delivery back to the cache which would then claim payment from the publisher.

Technical Challenges

There were a few things that turned up as significant issues. In no particular order:

The protocol

The protocol itself adds additional round trips to multiple peers – in its ‘normal’ configuration the client ends up running gRPC connections (gRPC-Web for browsers) to 5 endpoints (with all the normal windowing concerns, but potentially over QUIC), and then gets chunks of content in batches (concurrently) from 4 of the peers, runs a small crypto brute force operation on the combined result, and then moves onto the next group of content. This should be sounding suspiciously like TCP – it is basically a window management problem, and it has exactly the same performance management problems – fast start, maximum window size, how far to reduce it when problems are suffered. But accentuated: those 4 cache peers can all suffer their own independent noise problems, or be hostile. But they can also suffer correlated problems: they might all be in the same datacentre, or be all run by a hostile actor, or the client might be on a hostile WiFi link, or the client’s OS/browser might be hostile. Let’s just say that there is a long, rich road ahead for optimising this new protocol to make it fast, robust and reliable. Much as we have taken many years to make HTTP into QUIC, drawing upon techniques like forward error correction rather than retries – similar techniques will need to be applied to give this protocol similar performance characteristics. And evolving the protocol while maintaining the security properties is a complicated task, with three actors involved, who may collude in various ways.

An early performance analysis I did on the Go code implementation showed that the brute forcing work was a bottleneck: while the time taken (once optimised) was entirely modest for any small amount of data, the delay added per window element acts as a brake on performance for high capacity, low latency links. For a 1Gbps 25ms RTT link I estimated a need for 8 cores doing crypto brute forcing on the client – roughly 15 MB/s of received content per core.

JS

Cachecash is essentially implementing a new network protocol. There are some great hooks these days in browsers, and one can hook in and provide streams to things like video players to let them get one segment of video. However, for downloading an entire file – for instance, if one is downloading a full video – it is not so easy. This bug, open for 2 years now, tracks the standards-based way to do it. Even the non-standards-based way to do it involves buffering the entire content in memory, oh, and reflecting everything through a static github service worker. (You can of course host such a static page yourself, but then the whole idea of this federated distributed system breaks down a little.)

Our initial JS implementation was getting under 512KBps with all-local servers – part of that was the bandwidth delay product issue mentioned above. Moving to getting chunks of content from each cache concurrently using futures improved that up to 512KBps, but that’s still shocking for a system we want to compete with the likes of YouTube, Cloudflare and Akamai.

One of the hot spots turned out to be calculating SHA-256 values – the CAPNet algorithm calculates thousands (it’s tunable, but 8k in the set I was analysing) of independent SHAs per chunk of received data. This is a problem: in-browser SHA routines – even the recent native hosted ones – are slow per SHA. They are not slow per byte. Most folk want to make a small number of SHA calculations. Maybe thousands in total. Not tens of thousands per MB of data received. So we wrote an implementation of the core crypto routines in Rust WASM, which took our performance locally up to 2MBps in Firefox and 6MBps in Chromium.
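
To make the shape of that workload concrete, here is a minimal Rust sketch using the sha2 crate. The chunk and sub-block sizes are invented for illustration and this is not the actual CAPNet puzzle; the point is simply that the total number of bytes hashed is modest while the number of separate hash invocations is huge, so per-call overhead dominates.

    use sha2::{Digest, Sha256};

    // Hash a received chunk in tiny sub-blocks, one SHA-256 invocation each.
    // With a 128 KiB chunk and 16-byte sub-blocks this is 8192 invocations,
    // yet only ~128 KiB of data actually gets hashed.
    fn per_block_digests(chunk: &[u8], block_size: usize) -> Vec<Vec<u8>> {
        chunk
            .chunks(block_size)
            .map(|block| Sha256::digest(block).to_vec())
            .collect()
    }

    fn main() {
        let chunk = vec![0u8; 128 * 1024]; // hypothetical received chunk
        let digests = per_block_digests(&chunk, 16);
        println!("{} SHA-256 invocations for one chunk", digests.len());
    }

The Rust WASM routines pay far less per invocation than the browser-provided ones, which is consistent with the speedups above.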

It is also possible we’d show up as crypto-JS at that point and be blacklisted as malware!

Blockchain

Having chosen to involve a blockchain in the stack we had to deal with that complexity. We chose to take bitcoin’s good bits and run with those rather than either running a sidechain, trying to fit new transaction types into bitcoin itself, or trying to shoehorn our particular model into e.g. Ethereum. This turned out to be a fairly large amount of work: not the core chain itself – cloning the parts of bitcoin that we wanted was very quick – but then layering on the changes that we needed, to start dealing with escrows and negotiating parameters between components and so forth. And some of the operational challenges below turned up here as well, even just in developer test setups (in particular endpoint discovery).

Operational Challenges

The operational model was pretty interesting. The basic idea was that eventually there would be this big distributed system, a bit-coin like set of miners etc, and we’d be one actor in that ecosystem running some subset of the components, but that until then we’d be running:

  • A centralised ledger
  • Centralised random number generation for the micropayment system
  • Centralised deployment and operations for the cache fleet
  • Software update / vetting for the publisher fleet
  • Software update / publishing for the JS library
  • Some number of seed caches
  • Demo publishers to show things worked
  • Metrics, traces, chain explorer, centralised logging

We had most of this live and running in some fashion for most of the time I was there – we evolved it and improved it a number of times as we iterated on things. Where appropriate we chose open source components like Jaeger, Prometheus and Elasticsearch. We also added policy layers on top of them to provide rate limiting and anti-spoofing facilities. We deployed stuff in AWS, with EKS, and there were glitches and things to work around but generally only a tiny amount of time went into that part of it. I think I spent a day a month on actual operations, or thereabouts.

Other parties were then expected to bring along additional caches to expand the network, additional publishers to expand the content accessible via the network, and clients to use the network.

Ensuring a process run by a third party is network reachable by a browser over HTTPS is a surprisingly non-simple problem. We partly simplified it by mandating that they run a docker container that we supplied, but there’s still the chance that they are running behind a firewall with asymmetric ingress. And after that we still need a domain name for their endpoint. You can give every cache a CNAME in a dedicated subdomain – say using their public key as the subdomain, so that only that cache can issue requests to update their endpoint information in DNS. It is all solvable, but doing it so that the amount of customer interaction and handholding is reduced to the bare minimum is important: a user with a fleet of 1000 machines doesn’t want to talk to us 1000 times, and we don’t want to talk to them either. But this was another bit of this-isn’t-really-distributed-is-it grit in the distributed-ointment.

Adoption Challenges

ISPs with large fleets of machines are in principle happy to sell capacity on them in return for money – yay. But we have no revenue stream at the moment, so they aren’t really incentivised to put effort in; it becomes a matter of principle, not a fiscal “this is 10x better for my business” imperative. And right now, it’s 10x slower than HTTP. Or more.

Content owners with large amounts of content being delivered without a CDN would like a radically cheaper CDN. Except – we’re not actually radically cheaper on a cost structure basis. Current CDNs are expensive for their 2nd and 3rd generation products because no-one else offers what they offer – seamless in-request edge computing. But that ISP that is contributing a cache to the fleet is going to want the cache paid for, and that’s the same cost structure as existing CDNs – who often have a free entry tier. We might have been able to make our network cheaper eventually, but I’m just not sure about the radically cheaper bit.

Content owners who would like a CDN marketplace where the CDN caches are competing with each other – driving costs down – rather than the CDN operators competing – would absolutely love us. But I rather suspect that those owners want more sophisticated offerings. To be clear, I wasn’t on the customer development team, and didn’t get much in the way of customer development briefings. But things like edge computing workers, where completely custom code can run in the CDN network, adjacent to one’s users, are much more powerful offerings than simple static content shipping offerings, and are offered by all major CDNs. These are trusted services – the CAPNet paper doesn’t solve the problem of running edge code and providing proof that it was run. Enarx might go some way, or even a long way, towards running such code in an untrusted context, but providing a proof that it was run – so that running it can become a mining or mining-like operation – is a whole other question. Without such an answer, an edge computing network starts to depend on trusting the caches’ behaviour a lot more all over again – the network has no proof of execution to depend on.

Rapid adjustment – load spikes – is another possible use case, but the use of the blockchain to negotiate escrows actually seemed to work against our ability to offer that. Akamai define a load spike in a time frame faster than many blockchains can decide that a transaction has actually been accepted. Off-chain transactions are of course a known thing in the blockchain space but again that becomes additional engineering.

Our use of a new network protocol – for all that it was layered on standard web technology – made it harder for potential content owners to adopt our technology. Rather than “we have 200 local proxies that will deliver content to your users, just generate a url of the form X.Y.Z”, our solution is “we do not trust the 200 local proxies that we have, so you need to run complicated JS in your browser/phone app etc to verify that the proxies are actually doing their job”. This is better in some ways – precisely because we don’t trust those proxies – but it also increases the runtime cost of using the service, the integration cost of adopting the service, and the complexity of debugging issues receiving content via the service.

What did we learn?

It is said that “A startup is an organization formed to search for a repeatable and scalable business model.” What did we uncover in our search? What can we take away going forward?

In principle we have a classic two sided market – people with excess capacity close to users want to sell it, and people with excess demand for their content want to buy delivery capacity.

The baseline market is saturated. The market as a whole is on its third or perhaps fourth (depending on how you define things) major iteration of functionality.

Content delivery purchasers are ok with trusting their suppliers: any supply chain fraud happening in this space at the moment is so small that no-one I heard of is talking about it.

Some of the things we were doing don’t seem to have been important to the customers we talked to – I don’t have a great read on this, but in particular, the blockchain aspect seems to have been more important to our long term vision than to the 2-sided market place that we perceived. It would be fascinating to me to validate that somehow – would cache capacity suppliers be willing to trust us enough to sell capacity to us with just the auditing mechanism, without the blockchain? Would content providers be happy buying credit from us rather than from a neutral exchange?

What did I learn?

I think in hindsight my startup muscles were atrophied – it had been some years since Canonical and it took a few months to start really thinking lean-startup again on a personal basis. That’s ok, because I was hired to build systems. But it’s not great, because I can do better. So number one: think lean-startup and really step up to help with learning and validation.

I levelled up my Go lang skills. That was really nice – Kevin has deep knowledge there, and though I’ve written Go before I didn’t have a good appreciation for style or aesthetics, or why. I do now. Where before I’d say ‘I’m happy to dive in but it’s not a language I feel I really know’, I am now happy to say that I know Go. More to learn – there always is – but in a good place.

I did a similar thing to my JS skills, but not to the same degree. I climbed fairly deeply into the JS client – which is written in TypeScript – converted its bundling system to webpack to work better with Rust-WASM, and so on. It’s still not my go-to place, but I’m much more comfortable there now.

And of course playing with Rust-WASM was pure delight. Markus and I are both Rust aficionados, and having a genuine reason to write some Rust code for work was just delightful. Finding this bug was just a bonus :).

It was also really really nice being back in a truly individual contributor role for a while. I really enjoyed being able to just fix bugs and get on with things while I got my bearings. I’ve ended up doing a bit more leadership – refining of requirements, translating between idea-and-specification and the like recently – but still about 80% of my time has been sit-down-and-code, and that really is a pleasant holiday.

What am I going to change?

I’m certainly going to get a new job :). If you’re hiring, hit me up. (If you don’t have my details already, linkedin is probably best).

I think the core thing I need to do is more alignment of the day-to-day work I’m doing with the needs of customer development: I don’t want to take on or take over the customer development role – that will often be done best in person with a customer for startups, and I’m happy remote – but the more I can connect what I’m trying to achieve with what will get the customers to pay us, the more successful any business I’m working in will be. This may be a case for non-vanity metrics, or talking more with the customer-development team, or – well, I don’t know exactly what it will look like until I see the context I end up in, but I think more connection will be important.

And I think the second major thing is to find a better balance between individual contribution and leadership. I love individual contribution, it is perhaps the least stressful and most Zen place to be. But it is also the least effective unless the project has exactly one team member. My most impactful and successful roles have been leadership roles, but the pure leadership role with no individual contribution slowly killed me inside. Pure individual contribution has been like I imagine crack to be, and perhaps just as toxic in the long term.

,

sthbrx - a POWER technical blogrfid and hrfid

I was staring at some assembly recently, and not for the first time encountered rfid and hrfid, two instructions that we use when doing things like returning to userspace, returning from OPAL to the kernel, or from a host kernel into a guest.

rfid copies various bits from the register SRR1 (Machine Status Save/Restore Register 1) into the MSR (Machine State Register), and then jumps to an address given in SRR0 (Machine Status Save/Restore Register 0). hrfid does something similar, using HSRR0 and HSRR1 (Hypervisor Machine Status Save/Restore Registers 0/1), and slightly different handling of MSR bits.

The various Save/Restore Registers are used to preserve the state of the CPU before jumping to an interrupt handler, entering the kernel, etc, and are set up as part of instructions like sc (System Call), by the interrupt mechanism, or manually (using instructions like mtsrr1).

Anyway, the way in which rfid and hrfid restore MSR bits is documented somewhat obtusely in the ISA (if you don't believe me, look it up), and I was annoyed by this, so here, have a more useful definition. Leave a comment if I got something wrong.

rfid - Return From Interrupt Doubleword

Machine State Register

Copy all bits (except some reserved bits) from SRR1 into the MSR, with the following exceptions:

  • MSR_3 (HV, Hypervisor State) = MSR_3 & SRR1_3
    [We won't put the thread into hypervisor state if we're not already in hypervisor state]

  • If MSR_29:31 != 0b010 [Transaction State Suspended, TM not available], or SRR1_29:31 != 0b000 [Transaction State Non-transactional, TM not available] then:

    • MSR_29:30 (TS, Transaction State) = SRR1_29:30
    • MSR_31 (TM, Transactional Memory Available) = SRR1_31

    [See the ISA description for explanation on how rfid interacts with TM and resulting interrupts]

  • MSR_48 (EE, External Interrupt Enable) = SRR1_48 | SRR1_49 (PR, Problem State)
    [If going into problem state, external interrupts will be enabled]

  • MSR_51 (ME, Machine Check Interrupt Enable) = (MSR_3 (HV, Hypervisor State) & SRR1_51) | ((! MSR_3) & MSR_51)
    [If we're not already in hypervisor state, we won't alter ME]

  • MSR_58 (IR, Instruction Relocate) = SRR1_58 | SRR1_49 (PR, Problem State)
    [If going into problem state, relocation will be enabled]

  • MSR_59 (DR, Data Relocate) = SRR1_59 | SRR1_49 (PR, Problem State)
    [If going into problem state, relocation will be enabled]

Next Instruction Address

  • NIA = SRR0_0:61 || 0b00
    [Jump to SRR0, set last 2 bits to zero to ensure address is aligned to 4 bytes]

hrfid - Hypervisor Return From Interrupt Doubleword

Machine State Register

Copy all bits (except some reserved bits) from HSRR1 into the MSR, with the following exceptions:

  • If MSR_29:31 != 0b010 [Transaction State Suspended, TM not available], or HSRR1_29:31 != 0b000 [Transaction State Non-transactional, TM not available] then:

    • MSR_29:30 (TS, Transaction State) = HSRR1_29:30
    • MSR_31 (TM, Transactional Memory Available) = HSRR1_31

    [See the ISA description for explanation on how rfid interacts with TM and resulting interrupts]

  • MSR_48 (EE, External Interrupt Enable) = HSRR1_48 | HSRR1_49 (PR, Problem State)
    [If going into problem state, external interrupts will be enabled]

  • MSR_58 (IR, Instruction Relocate) = HSRR1_58 | HSRR1_49 (PR, Problem State)
    [If going into problem state, relocation will be enabled]

  • MSR_59 (DR, Data Relocate) = HSRR1_59 | HSRR1_49 (PR, Problem State)
    [If going into problem state, relocation will be enabled]

Next Instruction Address

  • NIA = HSRR0_0:61 || 0b00
    [Jump to HSRR0, set last 2 bits to zero to ensure address is aligned to 4 bytes]
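
To sanity-check my reading of the rules above, here is a rough sketch of the rfid MSR update written out as ordinary code (Rust purely for illustration; this is my interpretation, not anything lifted from the ISA or the kernel). Bits are numbered from 0 at the most significant end, as the ISA does, and reserved-bit and TM/TS handling is glossed over. The hrfid version is the same shape, reading HSRR0/HSRR1 and skipping the HV and ME special cases.

    // Illustrative only: my reading of the rfid MSR rules above, ignoring
    // reserved bits and the TM/TS special cases. Bit 0 is the MSB, per the ISA.
    fn bit(r: u64, n: u32) -> u64 {
        (r >> (63 - n)) & 1
    }

    fn with_bit(r: u64, n: u32, v: u64) -> u64 {
        let mask = 1u64 << (63 - n);
        (r & !mask) | (v << (63 - n))
    }

    fn rfid_msr(msr: u64, srr1: u64) -> u64 {
        let mut new = srr1; // start by copying SRR1 wholesale
        // HV: rfid can keep us in hypervisor state, but never put us into it
        new = with_bit(new, 3, bit(msr, 3) & bit(srr1, 3));
        // EE: external interrupts forced on when returning to problem state
        new = with_bit(new, 48, bit(srr1, 48) | bit(srr1, 49));
        // ME: only alterable if we are already in hypervisor state
        new = with_bit(
            new,
            51,
            (bit(msr, 3) & bit(srr1, 51)) | ((bit(msr, 3) ^ 1) & bit(msr, 51)),
        );
        // IR/DR: relocation forced on when returning to problem state
        new = with_bit(new, 58, bit(srr1, 58) | bit(srr1, 49));
        new = with_bit(new, 59, bit(srr1, 59) | bit(srr1, 49));
        new
    }

    fn rfid_nia(srr0: u64) -> u64 {
        srr0 & !0b11 // NIA = SRR0 with the low two bits cleared
    }

    fn main() {
        // Toy example: return from hypervisor state to problem state.
        let msr = 1u64 << (63 - 3); // HV currently set
        let srr1 = 1u64 << (63 - 49); // PR set in the saved state
        println!("new MSR = {:#018x}", rfid_msr(msr, srr1));
        println!("NIA     = {:#018x}", rfid_nia(0x0123_4567_89ab_cdef));
    }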

,

Robert CollinsRust and distributions

Daniel wrote a lovely blog post about Rust’s ability to be included in distributions, both as a language that you can get via the distribution, and as the language that components of the distribution are being written in.

I think this is a great goal to raise and I have just a few thoughts and quibbles. First I want to acknowledge and agree with him on the Rust community: it’s so very nice, and he is doing a great thing as rustup lead; I wish I had more time to put in, I have more things I want to contribute to rustup. I’ll try to get back to the meetings soon.

On trust

I completely agree about the need for the crates index improvement: without that we cannot have a mirror network, and that’s a significant issue for offline users and slow-region users.

On curlsh though

It isn’t the worst possible thing: for all that it’s “untrusted bootstrapping”, the actual thing downloaded is https secured etc, and so is the rustup binary itself. Put another way, I think the horror is more perceptual than analyzed risk. Someone that trusts Verisign etc enough to download the Debian installer over it has exactly the same risk as someone trusting Verisign enough to download rustup at that point in time.

Cross-signing that curl|sh with per-distro keys or something seems pretty ridiculous to me, since the root of trust is still that first download; unless you’re wandering up to someone who has bootstrapped their compiler by hand (to avoid reflections-on-trust attacks), to get an installer, to build a system, to then do reproducible builds, to check that other systems are actually safe… aieeee.

I think it’s easier to package the curl|sh shell script in Debian itself perhaps? apt install get-rustup; then if / when rustup becomes packaged the user instructions don’t change but the root of trust would, as get-rustup would be updated to not download rustup, but to trigger a different package install, and so forth.

I don’t think it’s desirable, though, to have distribution forks of the contents that rustup manages – Debian+Redhat+Suse+… builds of nightly rust with all the things failing or not, and so on – I don’t see who that would help. And if we don’t have that then the root of trust would still not be shifted under the GPG keychain – it would still be the HTTPS infrastructure for downloading rust toolchains + the integrity of the rustup toolchain builds themselves. Making rustup, which currently shares that trust, have a different trust root, seems pointless.

On duplication of dependencies

I think Debian needs to become more inclusive here, not Rustup. Debian has spent – pauses, counts – yes, DECADES rejecting multiple entire ecosystems because of a prejudiced view about what the Right Way to manage dependencies is. And they are not right in a universal sense. They were right in an engineering sense: given constraints (builds are expensive, bandwidth is expensive, disk is expensive), they are right. But those are not universal constraints, and seeking to impose those constraints on Java and Node – it’s been an unmitigated disaster. It hasn’t made those upstreams better, or more secure, or systematically fixed problems for users. I have another post on this so rather than repeating I’m going to stop here :).

I think Rust has – like those languages – made the crucial, maintainer- and engineering-efficiency-important choice to embrace enabling incremental change across libraries, with the consequence that dependencies don’t shift atomically. And sure, this is basically incompatible with the Debian packaging world view, which says that point and patch releases of libraries are not distinct packages, and thus the shared libs for these things all coexist in the same file on disk. Boom! Crash!

I assert that it is entirely possible to come up with a reasonable design for managing a repository of software that doesn’t make this conflation, would allow actual point and patch releases to exist as they are for the languages that have this characteristic, and would be amenable to automation, auditing and reporting for security issues. E.g. modernise Debian to cope with this fundamentally different language design decision… which would make Java and Node and Rust work so very much better.

Alternatively, if Debian doesn’t want to make it possible to natively support languages that have made this choice, Debian could:

  • ship static-but-for-system-libs builds
  • not include things written in rust
  • ask things written in rust to converge their dependencies again and again and again (and only update them when the transitive dependencies across the entire distro have converged)

I have a horrible suspicion about which Debian will choose to do :(. The blinkers / echo chamber are so very strong in that community.

For Windows

We got to parity with Linux for IO for non-McAfee users, but I guess there are a lot of them out there; we probably need to keep pushing on tweaking it until it works better for them too; perhaps autodetect McAfee and switch to minimal? I agree that making Windows users – like I am these days – feel tier one would be nice :). Maybe a survey of user experience would be a good starting point.

Shared libraries

Perhaps generating versioned symbols automatically and building many versions of the crate and then munging them together? But I’d also like to point out here again that the whole focus on shared libraries is a bit of a distribution blind spot, and looking at the vast amount of distribution of software occurring in app stores and their model suggests different ways of dealing with these things. See also the fairly specific suggestion I make about the packaging system in Debian that is the root of the problem, in my entirely humble view.

Bonus

John Goerzen posted an entirely different thing recently, but in it he discusses programs that don’t properly honour terminfo. Sadly I happen to know that large chunks of the Rust ecosystem assume that everything is ANSI these days, and it certainly sounds like, at least for John, that isn’t true. So that’s another way in which Rust could be more inclusive – use these things that have been built, rather than being modern and new age and reinventing the 95% match.

,

Simon HormsUpstream Linux Kernel Development

It's All About the Long Term

by Simon Horman

Introduction

My thoughts on the importance of upstream Linux kernel development. An exploration of the motivation for organisations to invest in development for the upstream Linux kernel.

  • Why develop for Linux and why for upstream?
  • Upstreaming models: Upstream first or upstream last
  • Differentiating products and technology
  • Standardisation through collaboration

Why Linux?

Before discussing the merits of upstreaming code to the Linux kernel it is worth examining the motivation for developing for Linux at all. There are surely many reasons; let's dig into a few.

Stand on the shoulders of giants

Modern computers are complex high-performance machines. The development effort required to fully utilise a system grows with the complexity of its hardware. I would argue that we are long past the point where it is practical to create new general purpose kernels to fully utilise modern hardware. Rather, a far more practical approach is to build on the work of others. And Linux is a premier platform for such work.

Linux is pervasive

Linux has become pervasive in various parts of industry, including cloud computing and enterprise ICT. In order to access such markets, solutions are often required to work on Linux. This includes facilities, such as hardware support, provided by the kernel. To meet such customer needs, solutions need to be developed for Linux.

Attract developer talent

By working on the Linux kernel, which is an Open Source project, developers are able to develop portable skills. Each contribution is peer reviewed and accepted on its merit. And each contribution acts as part of an evolving online CV for the developer. This is clearly an attractive mechanism for developers to display their skills, and thus Linux kernel development attracts talent.

Why Upstream?

When working on the Linux kernel the resulting code may be kept out-of-tree or submitted for inclusion in upstream. An attraction of working out-of-tree is that it has a somewhat lower barrier to entry. The process of upstreaming can be time consuming and approaches to problems that work out-of-tree may not be accepted into upstream for a variety of reasons. So there can be a strong temptation to keep code out-of-tree. However, I argue against this approach.

Working with the upstream Linux kernel is, in my opinion, all about planning for the long term. It involves up-front work to enhance the kernel in conjunction with the upstream development community. And doing so is a step towards long term maintainability of code and a sustainable development model.

Technical Debt

If code is worked on out-of-tree for a sustained period of time then the volume and complexity of the code is bound to grow. And each time a part of the upstream kernel on which this code depends changes, the out-of-tree code needs to either be refactored or risk obsolescence. The more time passes the more code there is to refactor; as there is more code, the chance of a dependency changing increases; and as the distance to the original kernel version increases, the complexity of such refactoring is likely to also increase. In short, technical debt mounts over time, requiring ever increasing effort to maintain.

Focus

Another benefit of working with upstream is that in large organisations it can provide focus. Rather than different teams developing different solutions to similar problems for each customer for each hardware generation a single solution can be developed upstream. As customer needs change over time they can be addressed via incremental changes upstream rather than perpetuating an explosion of out-of-tree code. And if the focus is on upstream then discussion around which in-house solution to reuse becomes moot.

Avoiding Fragmentation

Working with upstream also helps to reduce fragmentation in solutions. Users want a consistent experience when configuring hardware from different vendors, running kernels from different distributions and so on. Adding special sauce at this layer only diminishes the user experience in the long term.

Upstreaming Models

When working on the Debian kernel team, many years ago now, I instigated a policy of only accepting changes from the upstream kernel. Prior to this policy Debian had often seen itself as acting as a testing ground for upstream-bound, or in some cases not upstream-bound, features. A key problem with this approach was that once a feature became available in the Debian kernel it was bound to be used. And once it became used it had to be maintained. And if it wasn't upstream then that maintenance would typically fall to the Debian Kernel team whose bandwidth was already consumed packaging the upstream kernel.

Upstream first is the idea that code is developed for the upstream kernel, that it is in the upstream kernel that it is made available, and that consumers of the code obtain it by consuming the upstream kernel. It seeks to place the focus of kernel development where I believe it belongs: in upstream.

The converse of upstream first is upstream last. In this model code is developed out of tree. And when the time is right it is contributed to upstream. A key attraction of this approach is that it can, in the short term at least, lead to a higher velocity of feature development. It may also provide a way to develop ideas that do not yet seem appropriate for upstream. However, it leads to a number of problems.

For one, code that is not developed for upstream is, in my experience, often not suitable for inclusion in upstream. So in this model a typical upstreaming effort involves refactoring or, more often than not, rewriting the code with the out-of-tree version acting as a reference implementation. Clearly duplicated effort. And there are the problems outlined previously with maintaining code out-of-tree. So while upstream-last can be useful it does come at some cost.

Differentiating Products and Technology

An important distinction that can be made when working on the Linux kernel is that between product and technology. On the one hand technology can be raw, incomplete and often only of use as part of a larger whole. On the other hand products are polished, ideally complete, systems that can readily be utilised by users.

When developing kernel code, innovation is occurring: technology is being developed. By participating in upstream development this distinction becomes clearer. The collaboration, often between competitors, in upstream kernel development leads to technology that can be sustained for the long term. Meeting shorter term customer needs by delivering products becomes a distinctly different activity.

Standardisation Through Collaboration

The process of standardisation takes many guises. One obvious form is through standards bodies. In this model the standards body formulates a standard, and possibly a reference implementation, and then it is up to adopters to implement the standard. The Linux kernel implements many such standards but it also serves as a mechanism for a very different form of standardisation: the standard emerges from the implementation. Given the pervasive nature of Linux, its implementation of a feature can become a standard. Thus by collaborating on upstream kernel development one effectively participates in a standardisation process.

Conclusion

This discussion has covered some motivations for developing for the upstream kernel and the upstream first and last models for upstream development. If I could stress one point it is that upstream development is all about building a strong base for long term maintainability. A solid foundation on which to build products that address customer need.

Credits

The ideas presented above reflect those developed in collaboration with other team members and to a greater or lesser extent put into practice while working with those teams in the past and present. These teams include:

  • Netronome
  • Hisao Munakata, Magnus Damm, Paul Mundt and others at Renesas Electronics
  • Debian Kernel Team

,

Tim SerongNetwork Maintenance

To my intense amazement, it seems that NBN Co have finally done sufficient capacity expansion on our local fixed wireless tower to actually resolve the evening congestion issues we’ve been having for the past couple of years. Where previously we’d been getting 22-23Mbps during the day and more like 2-3Mbps (or worse) during the evenings, we’re now back to 22-23Mbps all the time, and the status lights on the NTD remain a pleasing green, rather than alternating between green and amber. This is how things were way back at the start, six years ago.

We received an email from iiNet in early July advising us of the pending improvements. It said:

Your NBN™ Wireless service offers maximum internet speeds of 25Mbps download and 5Mbps upload.

NBN Co have identified that your service is connected to a Wireless cell that is currently experiencing congestion, with estimated typical evening speeds of 3~6 Mbps. This congestion means that activities like browsing, streaming or gaming might have been and could continue to be slower than promised, especially when multiple people or devices are using the internet at the same time.

NBN Co estimates that capacity upgrades to improve the speed congestion will be completed by Dec-19.

At the time we were given the option of moving to a lower speed plan with a $10 refund because we weren’t getting the advertised speed, or to wait it out on our current plan. We chose the latter, because if we’d downgraded, that would have reduced our speed during the day, when everything was otherwise fine.

We did not receive any notification from iiNet of exactly when works would commence, nor was I ever able to find any indication of planned maintenance on iiNet’s status page. Instead, I’ve come to rely on notifications from my neighbour, who’s with activ8me. He receives helpful emails like this:

This is a courtesy email from Activ8me, Letting you know NBN will be performing Fixed Wireless Network capacity work in your area that might affect your connectivity to the internet. This activity is critical to the maintenance and optimisation of the network. The approximate dates of this maintenance/upgrade work will be:

Impacted location: Neika, TAS & Downstream Sites & Upstream Sites
NBN estimates interruption 1 (Listed Below) will occur between:
Start: 24/09/19 7:00AM End: 24/09/19 8:00PM
NBN estimates interruption 2 (Listed Below) will occur between:
Start: 25/09/19 7:00AM End: 25/09/19 8:00PM
NBN estimates interruption 3 (Listed Below) will occur between:
Start: 01/10/19 7:00AM End: 01/10/19 8:00PM
NBN estimates interruption 4 (Listed Below) will occur between:
Start: 02/10/19 7:00AM End: 02/10/19 8:00PM
NBN estimates interruption 5 (Listed Below) will occur between:
Start: 03/10/19 7:00AM End: 03/10/19 8:00PM
NBN estimates interruption 6 (Listed Below) will occur between:
Start: 04/10/19 7:00AM End: 04/10/19 8:00PM
NBN estimates interruption 7 (Listed Below) will occur between:
Start: 05/10/19 7:00AM End: 05/10/19 8:00PM
NBN estimates interruption 8 (Listed Below) will occur between:
Start: 06/10/19 7:00AM End: 06/10/19 8:00PM

Change start
24/09/2019 07:00 Australian Eastern Standard Time

Change end
06/10/2019 20:00 Australian Eastern Daylight Time

This is expected to improve your service with us however, occasional loss of internet connectivity may be experienced during the maintenance/upgrade work.
Please note that the upgrades are performed by NBN Co and Activ8me has no control over them.
Thank you for your understanding in this matter, and your patience for if it does affect your service. We appreciate it.

The astute observer will note that this is pretty close to two weeks of scheduled maintenance. Sure enough, my neighbour and I (and presumably everyone else in the area) enjoyed major outages almost every weekday during that period, which is not ideal when you work from home. But, like I said at the start, they did finally get the job done.

Interestingly, according to activ8me, there is yet more NBN maintenance scheduled from 21 October 07:00 ’til 27 October 21:00, then again from 28 October 07:00 ’til 3 November 21:00 (i.e. another two whole weeks). The only scheduled upgrade I could find listed on iiNet’s status page is CM-177373, starting “in 13 days” with a duration of 6 hours, so possibly not the same thing.

Based on the above, I am convinced that there is some problem with iiNet’s status page not correctly reporting NBN incidents, but of course I have no idea whether this is NBN Co not telling iiNet, iiNet not listening to NBN Co, or if it’s just that the status web page is busted.

,

Gary PendergastTalking with WP&UP

At WordCamp Europe this year, I had the opportunity to chat with the folks at WP&UP, who are doing wonderful work providing mental health support in the WordPress community.

Listen to the podcast, and check out the services that WP&UP provide!

,

sthbrx - a POWER technical blogTEN THOUSAND DISKS

In OpenPOWER land we have a project called op-test-framework which (for all its strengths and weaknesses) allows us to test firmware on a variety of different hardware platforms and even emulators like Qemu.

Qemu is a fantastic tool allowing us to relatively quickly test against an emulated POWER model, and of course is a critical part of KVM virtual machines running natively on POWER hardware. However the default POWER model in Qemu is based on the "pseries" machine type, which models something closer to a virtual machine or a PowerVM partition rather than a "bare metal" machine.

Luckily we have Cédric Le Goater who is developing and maintaining a Qemu "powernv" machine type which more accurately models running directly on an OpenPOWER machine. It's an unwritten rule that if you're using Qemu in op-test, you've compiled this version of Qemu!

Teething Problems

Because the "powernv" type does more accurately model the physical system some extra care needs to be taken when setting it up. In particular at one point we noticed that the pretend CDROM and disk drive we attached to the model were.. not being attached. This commit took care of that; the problem was that the PCI topology defined by the layout required us to be more exact about where PCI devices were to be added. By default only three spare PCI "slots" are available but as the commit says, "This can be expanded by adding bridges"...

More Slots!

Never one to stop at a just-enough solution, I wondered how easy it would be to add an extra PCI bridge or two to give the Qemu model more available slots for PCI devices. It turns out, easy enough once you know the correct invocation. For example, adding a PCI bridge in the first slot of the first default PHB is:

-device pcie-pci-bridge,id=pcie.3,bus=pcie.0,addr=0x0

And inserting a device in that bridge just requires us to specify the bus and slot:

-device virtio-blk-pci,drive=cdrom01,id=virtio02,bus=pcie.3,addr=3

Great! Each bridge provides 31 slots, so now we have plenty of room for extra devices.
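
For the record, the whole thing strung together looks something like the sketch below. This isn't the exact command op-test builds for you - the machine type, memory size and disk image names are just placeholders - but it shows a bridge with a couple of virtio disks hanging off it:

qemu-system-ppc64 -M powernv -m 4G -nographic \
-device pcie-pci-bridge,id=pcie.3,bus=pcie.0,addr=0x0 \
-drive file=disk01.qcow2,if=none,id=disk01 \
-device virtio-blk-pci,drive=disk01,id=virtio01,bus=pcie.3,addr=1 \
-drive file=disk02.qcow2,if=none,id=disk02 \
-device virtio-blk-pci,drive=disk02,id=virtio02,bus=pcie.3,addr=2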

Why Stop There?

We have three free slots, and we don't have a strict requirement on where devices are plugged in, so let's just plug a bridge into each of those slots while we're here:

-device pcie-pci-bridge,id=pcie.3,bus=pcie.0,addr=0x0 \
-device pcie-pci-bridge,id=pcie.4,bus=pcie.1,addr=0x0 \
-device pcie-pci-bridge,id=pcie.5,bus=pcie.2,addr=0x0

What happens if we insert a new PCI bridge into another PCI bridge? Aside from stressing out our PCI developers, a bunch of extra slots! And then we could plug bridges into those bridges and then..


Thus was born "OpTestQemu: Add PCI bridges to support more devices." and the testcase "Petitboot10000Disks". The changes to the Qemu model setup fill up each PCI bridge as long as we have devices to add, but reserve the first slot to add another bridge if we run out of room... and so on..

Officially this is to support adding interesting disk topologies to test Petitboot use cases, stress test device handling, and so on, but while we're here... what happens with 10,000 temporary disks?

======================================================================
ERROR: testListDisks (testcases.Petitboot10000Disks.ConfigEditorTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/sam/git/op-test-framework/testcases/Petitboot10000Disks.py", line 27, in setUp
    self.system.goto_state(OpSystemState.PETITBOOT_SHELL)
  File "/home/sam/git/op-test-framework/common/OpTestSystem.py", line 366, in goto_state
    self.state = self.stateHandlers[self.state](state)
  File "/home/sam/git/op-test-framework/common/OpTestSystem.py", line 695, in run_IPLing
    raise my_exception
UnknownStateTransition: Something happened system state="2" and we transitioned to UNKNOWN state.  Review the following for more details
Message="OpTestSystem in run_IPLing and the Exception=
"filedescriptor out of range in select()"
 caused the system to go to UNKNOWN_BAD and the system will be stopping."

Yeah that's probably to be expected without some more massaging. What about a more modest 512?

I: Resetting PHBs and training links...
[   55.293343496,5] PCI: Probing slots...
[   56.364337089,3] PHB#0000:02:01.0 pci_find_ecap hit a loop !
[   56.364973775,3] PHB#0000:02:01.0 pci_find_ecap hit a loop !
[   57.127964432,3] PHB#0000:03:01.0 pci_find_ecap hit a loop !
[   57.128545637,3] PHB#0000:03:01.0 pci_find_ecap hit a loop !
[   57.395489618,3] PHB#0000:04:01.0 pci_find_ecap hit a loop !
[   57.396048285,3] PHB#0000:04:01.0 pci_find_ecap hit a loop !
[   58.145944205,3] PHB#0000:05:01.0 pci_find_ecap hit a loop !
[   58.146465795,3] PHB#0000:05:01.0 pci_find_ecap hit a loop !
[   58.404954853,3] PHB#0000:06:01.0 pci_find_ecap hit a loop !
[   58.405485438,3] PHB#0000:06:01.0 pci_find_ecap hit a loop !
[   60.178957315,3] PHB#0001:02:01.0 pci_find_ecap hit a loop !
[   60.179524173,3] PHB#0001:02:01.0 pci_find_ecap hit a loop !
[   60.198502097,3] PHB#0001:02:02.0 pci_find_ecap hit a loop !
[   60.198982582,3] PHB#0001:02:02.0 pci_find_ecap hit a loop !
[   60.435096197,3] PHB#0001:03:01.0 pci_find_ecap hit a loop !
[   60.435634380,3] PHB#0001:03:01.0 pci_find_ecap hit a loop !
[   61.171512439,3] PHB#0001:04:01.0 pci_find_ecap hit a loop !
[   61.172029071,3] PHB#0001:04:01.0 pci_find_ecap hit a loop !
[   61.425416049,3] PHB#0001:05:01.0 pci_find_ecap hit a loop !
[   61.425934524,3] PHB#0001:05:01.0 pci_find_ecap hit a loop !
[   62.172664549,3] PHB#0001:06:01.0 pci_find_ecap hit a loop !
[   62.173186458,3] PHB#0001:06:01.0 pci_find_ecap hit a loop !
[   63.434516732,3] PHB#0002:02:01.0 pci_find_ecap hit a loop !
[   63.435062124,3] PHB#0002:02:01.0 pci_find_ecap hit a loop !
[   64.177567772,3] PHB#0002:03:01.0 pci_find_ecap hit a loop !
[   64.178099773,3] PHB#0002:03:01.0 pci_find_ecap hit a loop !
[   64.431763989,3] PHB#0002:04:01.0 pci_find_ecap hit a loop !
[   64.432285000,3] PHB#0002:04:01.0 pci_find_ecap hit a loop !
[   65.180506790,3] PHB#0002:05:01.0 pci_find_ecap hit a loop !
[   65.181049905,3] PHB#0002:05:01.0 pci_find_ecap hit a loop !
[   65.432105600,3] PHB#0002:06:01.0 pci_find_ecap hit a loop !
[   65.432654326,3] PHB#0002:06:01.0 pci_find_ecap hit a loop !

(That isn't good)

[   66.177240655,5] PCI Summary:
[   66.177906083,5] PHB#0000:00:00.0 [ROOT] 1014 03dc R:00 C:060400 B:01..07 
[   66.178760724,5] PHB#0000:01:00.0 [ETOX] 1b36 000e R:00 C:060400 B:02..07 
[   66.179501494,5] PHB#0000:02:01.0 [ETOX] 1b36 000e R:00 C:060400 B:03..07 
[   66.180227773,5] PHB#0000:03:01.0 [ETOX] 1b36 000e R:00 C:060400 B:04..07 
[   66.180953149,5] PHB#0000:04:01.0 [ETOX] 1b36 000e R:00 C:060400 B:05..07 
[   66.181673576,5] PHB#0000:05:01.0 [ETOX] 1b36 000e R:00 C:060400 B:06..07 
[   66.182395253,5] PHB#0000:06:01.0 [ETOX] 1b36 000e R:00 C:060400 B:07..07 
[   66.183207399,5] PHB#0000:07:02.0 [PCID] 1af4 1001 R:00 C:010000 (          scsi) 
[   66.183969138,5] PHB#0000:07:03.0 [PCID] 1af4 1001 R:00 C:010000 (          scsi) 

(a lot more of this)

[   67.055196945,5] PHB#0002:02:1e.0 [PCID] 1af4 1001 R:00 C:010000 (          scsi) 
[   67.055926264,5] PHB#0002:02:1f.0 [PCID] 1af4 1001 R:00 C:010000 (          scsi) 
[   67.094591773,5] INIT: Waiting for kernel...
[   67.095105901,5] INIT: 64-bit LE kernel discovered
[   68.095749915,5] INIT: Starting kernel at 0x20010000, fdt at 0x3075d270 168365 bytes

zImage starting: loaded at 0x0000000020010000 (sp: 0x0000000020d30ee8)
Allocating 0x1dc5098 bytes for kernel...
Decompressing (0x0000000000000000 <- 0x000000002001f000:0x0000000020d2e578)...
Done! Decompressed 0x1c22900 bytes

Linux/PowerPC load: 
Finalizing device tree... flat tree at 0x20d320a0
[   10.120562] watchdog: CPU 0 self-detected hard LOCKUP @ pnv_pci_cfg_write+0x88/0xa4
[   10.120746] watchdog: CPU 0 TB:50402010473, last heartbeat TB:45261673150 (10039ms ago)
[   10.120808] Modules linked in:
[   10.120906] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.0.5-openpower1 #2
[   10.120956] NIP:  c000000000058544 LR: c00000000004d458 CTR: 0000000030052768
[   10.121006] REGS: c0000000fff5bd70 TRAP: 0900   Not tainted  (5.0.5-openpower1)
[   10.121030] MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 48002482  XER: 20000000
[   10.121215] CFAR: c00000000004d454 IRQMASK: 1 
[   10.121260] GPR00: 00000000300051ec c0000000fd7c3130 c000000001bcaf00 0000000000000000 
[   10.121368] GPR04: 0000000048002482 c000000000058544 9000000002009033 0000000031c40060 
[   10.121476] GPR08: 0000000000000000 0000000031c40060 c00000000004d46c 9000000002001003 
[   10.121584] GPR12: 0000000031c40000 c000000001dd0000 c00000000000f560 0000000000000000 
[   10.121692] GPR16: 0000000000000000 0000000000000000 0000000000000001 0000000000000000 
[   10.121800] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[   10.121908] GPR24: 0000000000000005 0000000000000000 0000000000000000 0000000000000104 
[   10.122016] GPR28: 0000000000000002 0000000000000004 0000000000000086 c0000000fd9fba00 
[   10.122150] NIP [c000000000058544] pnv_pci_cfg_write+0x88/0xa4
[   10.122187] LR [c00000000004d458] opal_return+0x14/0x48
[   10.122204] Call Trace:
[   10.122251] [c0000000fd7c3130] [c000000000058544] pnv_pci_cfg_write+0x88/0xa4 (unreliable)
[   10.122332] [c0000000fd7c3150] [c0000000000585d0] pnv_pci_write_config+0x70/0x9c
[   10.122398] [c0000000fd7c31a0] [c000000000234fec] pci_bus_write_config_word+0x74/0x98
[   10.122458] [c0000000fd7c31f0] [c00000000023764c] __pci_read_base+0x88/0x3a4
[   10.122518] [c0000000fd7c32c0] [c000000000237a18] pci_read_bases+0xb0/0xc8
[   10.122605] [c0000000fd7c3300] [c0000000002384bc] pci_setup_device+0x4f8/0x5b0
[   10.122670] [c0000000fd7c33a0] [c000000000238d9c] pci_scan_single_device+0x9c/0xd4
[   10.122729] [c0000000fd7c33f0] [c000000000238e2c] pci_scan_slot+0x58/0xf4
[   10.122796] [c0000000fd7c3430] [c000000000239eb8] pci_scan_child_bus_extend+0x40/0x2a8
[   10.122861] [c0000000fd7c34a0] [c000000000239e34] pci_scan_bridge_extend+0x4d4/0x504
[   10.122928] [c0000000fd7c3580] [c00000000023a0f8] pci_scan_child_bus_extend+0x280/0x2a8
[   10.122993] [c0000000fd7c35f0] [c000000000239e34] pci_scan_bridge_extend+0x4d4/0x504
[   10.123059] [c0000000fd7c36d0] [c00000000023a0f8] pci_scan_child_bus_extend+0x280/0x2a8
[   10.123124] [c0000000fd7c3740] [c000000000239e34] pci_scan_bridge_extend+0x4d4/0x504
[   10.123191] [c0000000fd7c3820] [c00000000023a0f8] pci_scan_child_bus_extend+0x280/0x2a8
[   10.123256] [c0000000fd7c3890] [c000000000239b5c] pci_scan_bridge_extend+0x1fc/0x504
[   10.123322] [c0000000fd7c3970] [c00000000023a064] pci_scan_child_bus_extend+0x1ec/0x2a8
[   10.123388] [c0000000fd7c39e0] [c000000000239b5c] pci_scan_bridge_extend+0x1fc/0x504
[   10.123454] [c0000000fd7c3ac0] [c00000000023a064] pci_scan_child_bus_extend+0x1ec/0x2a8
[   10.123516] [c0000000fd7c3b30] [c000000000030dcc] pcibios_scan_phb+0x134/0x1f4
[   10.123574] [c0000000fd7c3bd0] [c00000000100a800] pcibios_init+0x9c/0xbc
[   10.123635] [c0000000fd7c3c50] [c00000000000f398] do_one_initcall+0x80/0x15c
[   10.123698] [c0000000fd7c3d10] [c000000001000e94] kernel_init_freeable+0x248/0x24c
[   10.123756] [c0000000fd7c3db0] [c00000000000f574] kernel_init+0x1c/0x150
[   10.123820] [c0000000fd7c3e20] [c00000000000b72c] ret_from_kernel_thread+0x5c/0x70
[   10.123854] Instruction dump:
[   10.123885] 7d054378 4bff56f5 60000000 38600000 38210020 e8010010 7c0803a6 4e800020 
[   10.124022] e86a0018 54c6043e 7d054378 4bff5731 <60000000> 4bffffd8 e86a0018 7d054378 
[   10.124180] Kernel panic - not syncing: Hard LOCKUP
[   10.124232] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.0.5-openpower1 #2
[   10.124251] Call Trace:

I wonder if I can submit that bug without someone throwing something at my desk.

,

sthbrx - a POWER technical blogdm-crypt: Password Prompts Proliferate

Just recently Petitboot added a method to ask the user for a password before allowing certain actions to proceed. Underneath the covers this is checking against the root password, but the UI "pop-up" asking for the password is relatively generic. Something else which has been on the to-do list for a while is support for mounting encrypted partitions, but there wasn't a good way to retrieve the password for them - until now!

With the password problem solved, there isn't too much else to do. If Petitboot sees an encrypted partition it makes a note of it and informs the UI via the device_add interface. Seeing this, the UI shows the device even though there aren't any boot options associated with it yet:

encrypted_hdr

Unlike normal devices in the menu these are selectable; once that happens the user is prompted for the password:

encrypted_password

With password in hand pb-discover will then try to open the device with cryptsetup. If that succeeds the encrypted device is removed from the UI and replaced with the new un-encrypted device:

unencrypted_hdr

That's it! These devices can't be auto-booted from at the moment since the password needs to be manually entered. The UI also doesn't have a way yet to select specific options for cryptsetup, but if you find yourself needing to do so you can run cryptsetup manually from the shell and pb-discover will recognise the new unencrypted device automatically.
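
For example, from the Petitboot shell something along these lines should do the trick - the partition and mapping name here are just placeholders for whatever your layout actually uses:

cryptsetup luksOpen /dev/sda2 cryptroot

Any extra options cryptsetup needs (a different cipher, a keyfile, and so on) can be passed on that command line too, which covers the cases the UI can't express yet.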

This is in Petitboot as of v1.10.3 so go check it out! Just make sure your image includes the kernel and cryptsetup dependencies.

,

Robert CollinsWant me to work with you?

Reach out to me – I’m currently looking for something interesting to do. https://www.linkedin.com/in/rbtcollins/ and https://twitter.com/rbtcollins are good ways to grab me if you don’t already have my details.

Should you reach out to me? Maybe :). First, a little retrospective.

Three years ago, I wrote the following when reflecting on what I wanted to be doing:

Priorities (roughly ordered most to least important):

  • Keep living in Rangiora (family)
  • Up to moderate travel requirements – 4 trips a year + LCA/PyCon
  • Significant autonomy (not at the expense of doing the right thing for the company, just I work best with the illusion of free will 🙂 )
  • Be doing something that matters
    • -> Being open source is one way to this, but not the only one
  • Something cutting edge would be awesome
    • -> Rust / Haskell / High performance requirements / scale / ….
  • Salary

How well did that work for me? Pretty good. I had a good satisfying job at VMware for 3 years, met some wonderful people, achieved some very cool things. And those priorities above were broadly achieved.
The one niggle that stands out was this – did the things we were doing matter? Certainly there was no social impact – VMware isn’t a non-profit, being right at the core of capitalism as it is. There was direct connection and impact with the team, the staff we worked with and the users of the products… but it is hard to feel really connected through that: VMware is a very large company and there are many layers between users and developers.

We were quite early adopters of Kubernetes, which allowed me to deepen my Go knowledge and experience some more fun with AWS scale operations. I had many interesting discussions about the relative strengths of Python, Go, Rust and Java with colleagues there. (Hi Geoffrey).

Company culture is very important to me, and VMware has a fantastically supportive culture. One of the most supportive companies I’ve been in, bar none. It isn’t a truly remote-organised company though: rather it’s a bunch of offices that talk to each other, which I think is sad. True remote-first offers so much more engagement.

I enjoy building things to solve problems. I’ve either directly built, or shaped what is built, in all my most impactful and successful roles. Solving a problem once by hand is fine; solving it for years to come by creating a tool is far more powerful.

I seem to veer into toolmaking very often: giving other people the ability to solve their problems takes the power of a tool and multiplies it even further.

It should be no surprise then that I very much enjoy reading white papers like the original Dapper and Map-reduce ones, LinkedIn’s Kafka or for more recent fodder the Facebook Akkio paper. Excellent synthesis and toolmaking applied at industrial scale. I read those things and I want to be a part of the creation of those sorts of systems.

I was fortunate enough to take some time to go back to university part-time, which though logistically challenging is something I want to see through.

Thus I think my new roughly ordered (descending) list of priorities needs to be something like this:

  • Keep living in Rangiora (family)
  • Up to moderate travel requirements – 4 team-meeting trips a year + 2 conferences
  • Significant autonomy (not at the expense of doing the right thing for the company, just I work best with the illusion of free will 🙂 )
  • Be doing something that matters
    • Be working directly on a problem / system that has problems
  • Something cutting edge would be awesome
    • Rust / Haskell / High performance requirements / scale / ….
  • A generative (Westrum definition) + supportive company culture
  • Remote-first or at least very remote familiar environment
  • Support my part time study / self improvement initiative
  • Salary

,

sthbrx - a POWER technical blogVisual Studio Code for Linux kernel development

Here we are again - back in 2016 I wrote an article on using Atom for kernel development, but I didn't stay using it for too long, instead moving back to Emacs. Atom had too many shortcomings - it had that distinctive Electron feel, which is a problem for a text editor - you need it to be snappy. On top of that, vim support was mediocre at best, and even as a vim scrub I would find myself trying to do things that weren't implemented.

So in the meantime I switched to spacemacs, which is a very well integrated "vim in Emacs" experience, with a lot of opinionated (but good) defaults. spacemacs was pretty good to me but had some issues - disturbingly long startup times, mediocre completions and go-to-definitions, and integrating any module into spacemacs that wasn't already integrated was a big pain.

After that I switched to Doom Emacs, which is like spacemacs but faster and closer to Emacs itself. It's very user configurable but much less user friendly, and I didn't really change much as my elisp-fu is practically non-existent. I was decently happy with this, but there were still some issues, some of which are just inherent to Emacs itself - like no actually usable inbuilt terminal, despite having (at least) four of them.

Anyway, since 2016 when I used Atom, Visual Studio Code (henceforth referred to as Code) came along and ate its lunch, using the framework (Electron) that was created for Atom. I did try it years ago, but I was very turned off by its Microsoft-ness, its seeming lack of distinguishing features from Atom, and the fact that it didn't feel like a native editor at all. Since it's massively grown in popularity since then, I decided I'd give it a try.

Visual Studio Code

Vim emulation

First things first for me is getting a vim mode going, and Code has a pretty good one of those. The key feature for me is that there's Neovim integration for Ex-commands, filling a lot of shortcomings that come with most attempts at vim emulation. In any case, everything I've tried to do that I'd do in vim (or Emacs) has worked, and there are a ton of options and things to tinker with. Obviously it's not going to do as much as you could do with Vimscript, but it's definitely not bad.

Theming and UI customisation

As far as the editor goes - it's good. A ton of different themes, and you can change the colour of pretty much everything in the config file or in the UI, including icons for the sidebar. There's a huge sore point though: you can't customise the interface outside the editor at all. There's an extension for loading custom CSS, but it's out of the way, finicky, and if I wanted to write CSS I wouldn't have become a kernel developer.

Extensibility

Extensibility is definitely a strong point - the ecosystem of extensions is good. All the language extensions I've tried have been very fully featured, with a ton of different options and integration with language-specific linters and build tools. This is probably Code's strongest feature - the breadth of the extension ecosystem and the level of quality found within.

Kernel development

Okay, let's get into the main thing that matters - how well does the thing actually edit code. The kernel is tricky. It's huge, it has its own build system, and in my case I build it with cross compilers for another architecture. Also, y'know, it's all in C and built with make, not exactly great for any kind of IDE integration.

The first thing I did was check out the vscode-linux-kernel project by GitHub user "amezin", which is a great starting point. All you have to do is clone the repo, build your kernel (with a cross compiler works fine too), and run the Python script to generate the compile_commands.json file. Once you've done this, go-to-definition (gd in vim mode) works pretty well. It's not flawless, but it does go cross-file, and will pop up a nice UI if it can't figure out which file you're after.
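
For reference, the setup boils down to something like the following - the cross compiler prefix is just an example, and the clone location and helper script name are from memory, so double-check the repo's README:

# from the top of your kernel source tree
git clone https://github.com/amezin/vscode-linux-kernel.git .vscode

# build as you normally would (cross compiling is fine)
make ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- -j$(nproc)

# then generate compile_commands.json for the editor to pick up
python .vscode/generate_compdb.py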

Code has good built-in git support, so actions like staging files for a commit can be done from within the editor. Ctrl-P lets you quickly navigate to any file with fuzzy-matching (which is impressively fast for a project of this size), and Ctrl-Shift-P will let you search commands, which I've been using for some git stuff.

git command completion in Code

There are some rough edges, though. Code is set on what so many modern editors are set on, which is the "one window per project" concept - so to get things working the way you want, you would open your kernel source as the current project. This makes it a pain to just open something else to edit, like some script, or to check the value of something in firmware, or to chuck something in your bashrc.

Auto-triggering builds on change isn't something that makes a ton of sense for the kernel, and it's not present here. The kernel support in the repo above is decent, but it's not going to get you close to what more modern languages can get you in an editor like this.

Oh, and it has a powerpc assembly extension, but I didn't find it anywhere near as good as the one I "wrote" for Atom (I just took the x86 one and switched the instructions), so I'd rather use the C mode.

Terminal

Code has an actually good inbuilt terminal that uses your login shell. You can bring it up with Ctrl-`. The biggest gripe I have always had with Emacs is that you can never have a shell that you can actually do anything in, whether it's eshell or shell or term or ansi-term, you try to do something in it and it doesn't work or clashes with some Emacs command, and then when you try to do something Emacs-y in there it doesn't work. No such issue is present here, and it's a pleasure to use for things like triggering a remote build or doing some git operation you don't want to do with commands in the editor itself.

Not the most important feature, but I do like not having to alt-tab out and lose focus.

Well...is it good?

Yeah, it is. It has shortcomings, but installing Code and using the repo above to get started is probably the simplest way to get a competent kernel development environment going, with more features than most kernel developers (probably) have in their editors. Code is open source and so are its extensions, and it'd be the first thing I recommend to new developers who aren't already super invested into vim or Emacs, and it's worth a try if you have gripes with your current environment.

,

Clinton RoyRestricted Sleep Regime

Since moving down to Melbourne my poor sleep has started up again. It’s really hard to say what the main factor driving this is. My doctor down here has put me onto a drug free way of trying to improve my sleep, and I think I kind of like it, while it’s no silver bullet, it is something I can go back to if I’m having trouble with my sleep, without having to get a prescription.

The basic idea is to maximise sleep efficiency. If you’re only getting n hours sleep a night, only spend n hours  a night in bed. This forces you to stay up and go to bed rather late for a few nights. Hopefully, being tired will help you sleep through the night in one large segment. Once you’ve successfully slept through the night a few times, relax your bed time by say fifteen minutes, and get used to that. Slowly over time, you increase the amount of sleep you’re getting, while keeping your efficiency high.

,

OpenSTEMElection Activity Bundle

With the upcoming federal election, many teachers want to do some related activities in class – and we have the materials ready for you! To make selecting suitable resources a bit easier, we have an Election Activity Bundle containing everything you need, available for just $9.90. Did you know that the secret ballot is an Australian […]

The post Election Activity Bundle first appeared on OpenSTEM Pty Ltd.

,

Jonathan AdamczewskiNavigation Mesh and Sunset Overdrive

Navigation mesh encodes where in the game world an agent can stand, and where it can go. (here “agent” means bot, actor, enemy, NPC, etc)

At runtime, the main thing navigation mesh is used for is to find paths between points using an algorithm like A*: https://en.wikipedia.org/wiki/A*_search_algorithm

In Insomniac’s engine, navigation mesh is made of triangles. Triangle edge midpoints define a connected graph for pathfinding purposes.

In addition to triangles, we have off-mesh links (“Custom Nav Clues” in Insomniac parlance) that describe movement that isn’t across the ground. These are used to represent any kind of off-mesh connection – could be jumping over a car or railing, climbing up to a rooftop, climbing down a ladder, etc. Exactly what it means for a particular type of bot is handled by clue markup and game code.

These links are placed by artists and designers in the game environment, and included in prefabs for commonly used bot-traversable objects in the world, like railings and cars.

Navigation mesh makes certain operations much, much simpler than they would be if done by trying to reason about render or physics geometry.

Our game world is made up of a lot of small objects, which are each typically made from many triangles.

Using render or physics geometry to answer the question “can this bot stand here” hundreds of times every frame is not scalable. (Sunset Overdrive had 33ms frames. That’s not a lot of time.)

It’s much faster to ask: is there navigation mesh where this bot is?

Navigation mesh is relatively sparse and simple, so the question can be answered quickly. We pre-compute bounding volumes for navmesh, to make answering that question even faster, and if a bot was standing on navmesh last frame, it’s even less work to reason about where they are this frame.

In addition to path-finding, navmesh can be useful to quickly and safely limit movement in a single direction. We sweep lines across navmesh to find boundaries to clamp bot movement. For example, a bot animating through a somersault will have its movement through the world clamped to the edge of navmesh, rather than rolling off into who-knows-what.

(If you’re making a game where you want bots to be able to freely somersault in any direction, you can ignore the navmesh.)

Building navmesh requires a complete view of the static world. The generated mesh is only correct when it accounts for all objects: interactions between objects affect the generated mesh in ways that are not easy (or fast) to reason about independently.

Intersecting objects can become obstructions to movement. Or they can form new surfaces that an agent can stand upon. You can’t really tell what it means to an agent until you mash it all together.

To do as little work as possible at runtime, we required *all* of the static objects to be loaded at one time to pre-build mesh for Sunset City.

We keep that pre-built navmesh loaded during the game at all times. For the final version of the game (with both of the areas added via DLC) this required ~55MB of memory.

We use Recast https://github.com/recastnavigation/recastnavigation to generate the triangle mesh, and (mostly for historical reasons) repack this into our own custom format.

Sunset Overdrive had two meshes: one for “normal” humanoid-sized bots (2m tall, 0.5m radius)

and one for “large” bots (4.5m tall, 1.35m radius)

Both meshes are generated as 16x16m tiles, and use a cell size of 0.125m when rasterizing collision geometry.

There were a few tools used in Sunset Overdrive to add some sense of dynamism to the static environment:

For pathfinding and bot-steering, we have runtime systems to control bot movement around dynamic obstacles.

For custom nav clues, we keep track of whether they are in use, to make it less likely that multiple bots are jumping over the same thing at the same time. This can help fan-out groups of bots, forcing them to take distinctly different paths.

Since Sunset Overdrive, we’ve added a dynamic obstruction system based on Detour https://github.com/recastnavigation/recastnavigation to temporarily cut holes in navmesh for larger impermanent obstacles like stopped cars or temporary structures.

We also have a way to mark up areas of navmesh so that they can be toggled in a controlled fashion from script. It’s less flexible than the dynamic obstruction system – but it is very fast: toggling flags for tris rather than retriangulating.

I spoke about Sunset Overdrive at the AI Summit a few years back – my slide deck is here:
Sunset City Express: Improving the NavMesh Pipeline in Sunset Overdrive

I can also highly recommend @AdamNoonchester‘s talk from GDC 2015:
AI in the Awesomepocalypse – Creating the Enemies of Sunset Overdrive

Here’s some navigation mesh, using the default in-engine debug draw (click for larger version)

What are we looking at? This is a top-down orthographic view of a location in the middle of Sunset City.

The different colors indicate different islands of navigation mesh – groups of triangles that are reachable from other islands via custom nav clues.
Bright sections are where sections of navmesh overlap in the X-Z plane.

There are multiple visualization modes for navmesh.

Usually, this is displayed over some in-game geometry – it exists to debug/understand the data in game and editor. Depending on what the world looks like, some colors are easier to read than others. (click for larger versions)




The second image shows the individual triangles – adjacent triangles do not reliably have different colors. And there is stable color selection as the camera moves, almost.

Also, if you squint, you can make out the 16x16m tile boundaries, so you can get a sense of scale.

Here’s a map of the entirety of Sunset City:

“The Mystery of the Mooil Rig” DLC area:

“Dawn of the Rise of the Fallen Machine” DLC area:

Referencing the comments from up-thread, these maps represent the places where agents can be. Additionally, there is connectivity information – we have visualization for that as well.

This image has a few extra in-engine annotations, and some that I added:

The purple lines represent custom nav clues – one line in each direction that is connected.

Also marked are some railings with clues placed at regular intervals, a car with clues crisscrossing it, and moored boats with clues that allow enemies to chase the player.

Also in this image are very faint lines on the mesh that show connectivity between triangles. When a bot is failing to navigate, it can be useful to visualize the connectivity that the mesh thinks it has :)

The radio tower where the fight with Fizzie takes place:

The roller coaster:

The roller coaster tracks are one single, continuous and complete island of navmesh.

Navigation mesh doesn’t line up neatly with collision geometry, or render geometry. To make it easier to see, we draw it offset +0.5m up in world-space, so that it’s likely to be above the geometry it has been generated for. (A while ago, I wrote a full-screen post effect that drew onto rendered geometry based on proximity to navmesh. I thought it was pretty cool, and it was nicely unambiguous & imho easier to read – but I never finished it, it bit-rotted, and I never got back to it, alas.)

Since shipping Sunset Overdrive, we added support for keeping smaller pieces of navmesh in memory – they’re now loaded in 128x128m parts, along with the rest of the open world.

@despair‘s recent technical postmortem has a little more on how this works:
‘Marvel’s Spider-Man’: A Technical Postmortem

Even so, we still load all of an open world region to build the navmesh: the asset pipeline doesn’t provide the information needed to generate navmesh for sub-regions efficiently & correctly, so it’s all-or-nothing. (I have ideas on how to improve this. One day…)

Let me know if you have any questions – preferably via twitter @twoscomplement

 

This post was originally a twitter thread:

,

Chris SamuelVale Polly Samuel (1963-2017): On Dying & Death

Those of you who follow me on Twitter will know some of this already, but I’ve been meaning to write here for quite some time about all this. It’s taken me almost two years to write, because it’s so difficult to find the words to describe this. I’ve finally decided to take the plunge and finish it despite feeling it could be better, but if I don’t I’ll never get this out.

September 2016 was not a good month for my wonderful wife Polly, she’d been having pains around her belly and after prodding the GP she managed to get a blood test ordered. They had suspected gallstones or gastritis but when the call came one evening to come in urgently the next morning for another blood test, we knew something was up. After the blood test we were sent off for an ultrasound of the liver and with that out of the way went out for a picnic on Mount Dandenong for a break. Whilst we were eating we got another phone call from the GP, this time to come and pick up a referral for an urgent MRI. We went to pick it up but when they found out Polly had already eaten they realised they would need to convert to a CT scan. A couple of phone calls later we were booked in for one that afternoon. That evening was another call to come back to see the GP. We were pretty sure we knew what was coming.

The news was not good, Polly had “innumerable” tumours in her liver. Over 5 years after surgery and chemo for her primary breast cancer and almost at the end of her 5 years of tamoxifen the cancer had returned. We knew the deal with metastatic cancer, but it was still a shock when the GP said “you know this is not a curable situation”. So the next day (Friday) it was right back to her oncologist who took her off the tamoxifen immediately (as it was no longer working) and scheduled chemotherapy for the following Monday, after an operation to install a PICC line. He also explained about what this meant, that this was just a management technique to (hopefully) try and shrink the tumours and make life easier for Polly for a while. It was an open question about how long that while would be, but we knew from the papers online that she had found looking at the statistics that it was likely months, not years, that we had. Polly wrote about it all at the time, far more eloquently than I could, with more detail, on her blog.

Chris, my husband, best pal, and the love of my life for 17 years, and I sat opposite the oncologist. He explained my situation was not good, that it was not a curable situation. I had already read that extensive metastatic spread to the liver could mean a prognosis of 4-6 months but if really favorable as long as 20 months.

The next few months were a whirlwind of chemo, oncology, blood tests, crying, laughing and loving. We were determined to talk about everything, and Polly was determined to prepare as quickly as she could for what was to come. They say you should “put your affairs in order” and that’s just what she did, financially, business-wise (we’d been running an AirBNB and all those bookings had to be canceled ASAP, plus of course all her usual autism consulting work) and personally. I was so fortunate that my work was so supportive and able to be flexible about my hours and days and so I could be around for these appointments.

Over the next few weeks it was apparent that the chemo was working, breathing & eating became far easier for her and a follow up MRI later on showed that the tumours had shrunk by about 75%. This was good news.

October 2016 was Polly’s 53rd birthday and so she set about planning a living wake for herself, with a heap of guests, music courtesy of our good friend Scott, a lot of sausages (and other food) and good weather. Polly led the singing and there was an awful lot of merriment. Such a wonderful time and such good memories were made that day.

Polly singing at her birthday party in 2016

That December we celebrated our 16th wedding anniversary together at a lovely farm-stay place in the Yarra Valley before having what we were pretty sure was our last Christmas together.

Polly and Chris at the farm-stay for our wedding anniversary

But then in January came the news we’d been afraid of, the blood results were showing that the first chemo had run out of steam and stopped working, so it was on to chemo regime #2. A week after starting the new regime we took a delayed holiday up to the Blue Mountains in New South Wales (we’d had to cancel previously due to her diagnosis) and spent a long weekend exploring the area and generally having fun.

Polly and Chris at Katoomba, NSW

But in early February it was clear that the second line chemo wasn’t doing anything, and so it was on to the third line chemo. Polly had also been having fluid build up in her abdomen (called ascites) and we knew they would have to start draining that at some point, February was that point; we spent the morning of Valentines Day in the radiology ward where they drained around 4 litres from her! The upside from that was it made life so much easier again for her. We celebrated that by going to a really wonderful restaurant that we used for special events for dinner for Valentines, something we hadn’t thought possible that morning!

Valentine's Day dinner at Copperfields

Two weeks after that we learned from the oncologist that the third line chemo wasn’t doing anything either and he had to give us the news that there wasn’t any treatment he could offer us that had any prospect of helping. Polly took that in her usual pragmatic and down-to-earth way, telling the oncologist that she didn’t see him as the reaper but as her fairy godfather who had given her months of extra quality time and bringing a smile to his and my face. She also asked whether the PICC line (which meant she couldn’t have a bath, just shower with a protective cover over it) could come out and the answer was “yes”.

The day before that news we had visited the palliative ward there for the first time, Polly had a hard time with hospitals and so we spent time talking to the staff, visiting rooms and Polly all the time reframing it to reduce and remove the anxiety. The magic words were “hotel-hospital”, which it really did feel like. We talked with the oncologist about how it all worked and what might happen.

We also had a home palliative team who would come and visit, help with pain management and be available on the phone at all hours to give advice and assist where they could. Polly felt uncertain about them at first as she wasn’t sure what they would make of her language issues and autism, but whilst they did seem a bit fazed at first by someone who was dealing with the fact that they were dying in such a blunt and straightforward manner things soon smoothed out.

None of this stopped us living, we continued to go out walking at our favourite places in our wonderful part of Melbourne, continued to see friends, continued to joke and dance and cry and laugh and cook and eat out.

Polly on minature steam train

Oh, and not forgetting putting a new paved area in so we could have a little outdoor fire area to enjoy with friends!

Chris laying paving slabs for fire area Polly and Morghana enjoying the fire!

But over time the ascites was increasing, with each drain being longer, with more fluid, and more taxing for Polly. She had decided that when it would get to the point that she would need two a week then that was enough and time to call it a day. Then, on a Thursday evening after we’d had an afternoon laying paving slabs for another little patio area, Polly was having a bath whilst I was researching some new symptoms that had appeared, and when Polly emerged I showed her what I had found. The symptoms matched what happens when that pressure that causes the ascites gets enough to push blood back down other pathways and as we read what else could lie in store Polly decided that was enough.

That night Polly emailed the oncologist to ask them to cancel her drain which was scheduled for the next day and instead to book her into the palliative ward. We then spent our final night together at home, before waking the next day to get the call to confirm that all was arranged from their end and that they would have a room by 10am, but to arrive when was good for us. Friends were informed and Polly and I headed off to the palliative ward, saying goodbye to the cats and leaving our house together for the very last time.

Arriving at the hospital we dropped in to see the oncology, radiology and front-desk staff we knew to chat with them before heading up to the palliative ward to meet the staff there and set up in the room. The oncologist visited and we had a good chat about what would happen with pain relief and sedation once Polly needed it. Shortly after our close friends Scott and Morghana arrived from different directions and I had brought Polly’s laptop and a 4G dongle and so on Skype arrived Polly’s good Skype pal Marisol joined us, virtually. We shared a (dairy free) Easter egg, some raspberry lemonade and even some saké! We had brought in a portable stereo and CD’s and danced and sang and generally made merry – which was so great.

After a while Polly decided that she was too uncomfortable and needed the pain relief and sedation, so everything was put in its place and we all said our goodbyes to Polly as she was determined to do the final stages on her own, and she didn’t want anyone around in case it caused her to try and hang on longer than she really should. I told her I was so proud of her and so honoured to be her husband for all this time. Then we left, as she wished, with Scott and Morghana coming back with me to the house. We had dinner together at the house and then Morghana left for home and Scott kindly stayed in the spare room.

The next day Scott and I returned to the hospital, Polly was still sleeping peacefully so after a while he and I had a late lunch together, making sure to fulfil Polly’s previous instructions to go enjoy something that she couldn’t, and then we went our separate ways. I had not been home long before I got the call from the hospital – Polly was starting to fade – so I contacted Scott and we both made our way back there again. The staff were lovely, they managed to rustle up some food for us as well as tea and coffee and would come and check on us in the waiting lounge, next door to where Polly was sleeping. At one point the nurse came in and said “you need a hug, she’s still sleeping”. Then, a while after, she came back in and said “I need a hug, she’s gone…”.

I was bereft. Whilst intellectually I knew this was inevitable, the reality of knowing that my life partner of 17 years was gone was so hard. The nurse told me us that we could see Polly now, and so Scott and I went to see her to say our final goodbye. She was so peaceful, and I was grateful that things had gone as she wanted and that she had been able to leave on her own terms and without the greater discomforts and pain that she was worried would still be coming. Polly had asked us to leave a CD on, and as we were leaving the nurses said to us “oh, we changed the CD earlier on today because it seemed strange to just have the one on all the time. We put this one on by someone called ‘Donna Williams’, it was really nice.”. So they had, unknowingly, put her own music on to play her out.

As you would expect if you had ever met Polly she had put all her affairs in order, including making preparations for her memorial as she wanted to make things as easy for me as possible. I arranged to have it live streamed for friends overseas and as part of that I got a recording of it, which I’m now making public below. Very sadly her niece Jacqueline, who talks at one point about going ice skating with her, has also since died.

Polly and I were so blessed to have 16 wonderful years together, and even at the end the fact that we did not treat death as a taboo and talked openly and frankly about everything (both as a couple and with friends) was such a boon for us. She made me such a better person and will always be part of my life, in so many ways.

Finally, I leave you with part of Polly’s poem & song “Still Awake”..

Time is a thief, which steals the chances that we never get to take.
It steals them while we are asleep.
Let’s make the most of it, while we are still awake.

Polly at Cardinia Reservoir, late evening

This item originally posted here:

Vale Polly Samuel (1963-2017): On Dying & Death

,

sthbrx - a POWER technical blogArticle Review: Curing the Vulnerable Parser

Every once in a while I read papers or articles. Previously, I've just read them myself, but I was wondering if there were more useful things I could do beyond that. So I've written up a summary and my thoughts on an article I read - let me know if it's useful!

I recently read Curing the Vulnerable Parser: Design Patterns for Secure Input Handling (Bratus, et al; USENIX ;login: Spring 2017). It's not a formal academic paper but an article in the Usenix magazine, so it doesn't have a formal abstract I can quote, but in short it takes the long history of parser and parsing vulnerabilities and uses that as a springboard to talk about how you could design better ones. It introduces a toolkit based on that design for more safely parsing some binary formats.

Background

It's worth noting early on that this comes out of the LangSec crowd. They have a pretty strong underpinning philosophy:

The Language-theoretic approach (LANGSEC) regards the Internet insecurity epidemic as a consequence of ad hoc programming of input handling at all layers of network stacks, and in other kinds of software stacks. LANGSEC posits that the only path to trustworthy software that takes untrusted inputs is treating all valid or expected inputs as a formal language, and the respective input-handling routines as a recognizer for that language. The recognition must be feasible, and the recognizer must match the language in required computation power.

A big theme in this article is predictability:

Trustworthy input is input with predictable effects. The goal of input-checking is being able to predict the input’s effects on the rest of your program.

This seems sensible enough at first, but leads to some questionable assertions, such as:

Safety is predictability. When it's impossible to predict what the effects of the input will be (however valid), there is no safety.

They follow this with an example of Ethereum contracts stealing money from the DAO. The example is compelling enough, but again comes with a very strong assertion about the impossibility of securing a language virtual machine:

From the viewpoint of language-theoretic security, a catastrophic exploit in Ethereum was only a matter of time: one can only find out what such programs do by running them. By then it is too late.

I'm not sure that (a) I buy the assertions, or that (b) they provide a useful way to deal with the world as we find it.

Is this even correct?

You can tease out 2 contentions in the first part of the article:

  • there should be a formal language that describes the data, and
  • this language should be as simple as possible, ideally being regular and context-free.

Neither of these are bad ideas - in fact they're both good ideas - but I don't know that I draw the same links between them and security.

Consider PostScript as a possible counter-example. It's a Turing-complete language, so it absolutely cannot have predictable results. It has a well documented specification and executes in a restricted virtual machine. So let's say that it satisfies only the first plank of their argument.

I'd say that PostScript has a good security record, despite being Turing complete. PostScript has been around since 1985 and apart from the recent bugs in GhostScript, it doesn't have a long history of bugs and exploits. Maybe this is just because no-one has really looked, or maybe it is possible to have reasonably safe complex languages by restricting the execution environment, as PostScript consciously and deliberately does.

Indeed, if you consider the recent spate of GhostScript bugs, perhaps some may be avoided by stricter compliance with a formal language specification. However, most seem to me to arise from the desirability of implementing some of the PostScript functionality in PostScript itself, and some of the GhostScript-specific, stupendously powerful operators exposed to the language to enable this. The bugs involve tricks to allow a user to get access to these operators. A non-Turing-complete language may be sufficient to prevent these attacks, but it is not necessary: just not doing this sort of meta-programming with such dangerous operators would also have worked. Storing the true values of the security state outside of a language-accessible object would also be good.

Is this a useful way to deal with the world as we find it?

My main problem with the general LangSec approach that this article takes is this: to get to their desired world, we need to rewrite a bunch of things with entirely different language foundations. The article talks about HTML and PDFs as examples of unsafe formats, but I cannot imagine the sudden wholesale replacement of either of these - although I would love to be proven wrong.

Can we get even part of the way with existing standards? Kinda-sorta, but mostly no, and to the authors' credit, they are open about this. They argue that the formal definition of the language should be the "most restrictive input definition" - they specifically require you to "give up attempting to accept arbitrarily complex data", and call for "subsetting of many protocols, formats, encodings and command languages, including eliminating unneeded variability and introducing determinism and static values".

No doubt we would be in a better place if people took up these ideas for future programs. However, for our current set of programs and use cases, this is probably not tractable in any meaningful way.

The rest of the paper

The rest of the paper is reasonably interesting. Their general theory is that you should build your parsers based on a formal definition of a language, and that the parser should convert the input data to a set of objects, and then your business logic should deal with those objects. This is the 'recognizer pattern', and is illustrated below:

The recognizer pattern: separate code parses input according to a formal grammar, creating valid objects that are passed to the business logic

In short, the article is full of great ideas if you happen to be parsing a simple language, or are designing not just a parser but a full language ecosystem. They do also provide a binary parser toolkit that might be helpful if you are parsing a binary format that can be expressed with a parser combinator.

Overall, however, I think the burden of maintaining old systems is such that a security paradigm that relies on new code is pretty unlikely, and one that relies on new languages is fatally doomed from the outset. New systems should take up these ideas, yes. But I'd really like to see people grappling with how to deal with the complex and irregular languages that we're stuck with (HTML, PDF, etc) in secure ways.

,

Tim SerongHerringback

It occurs to me that I never wrote up the end result of the support ticket I opened with iiNet after discovering significant evening packet loss on our fixed wireless NBN connection in August 2017.

The whole saga took about a month. I was asked to run a battery of tests (ping, traceroute, file download and speedtest, from a laptop plugged directly into the NTD) three times a day for three days, then send all the results in so that a fault could be lodged. I did this, but somehow there was a delay in the results being communicated, so that by the time someone actually looked at them, they were considered stale, and I had to run the whole set of tests all over again. It’s a good thing I work from home, because otherwise there’s no way it would be possible to spend half an hour three times a day running tests like this. Having finally demonstrated significant evening slowdowns, a fault was lodged, and eventually NBN Co admitted that there was congestion in the evenings.
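
For the curious, the battery boiled down to something along these lines, run morning, afternoon and evening - the targets and download URL here are placeholders rather than the ones iiNet specified:

ping -c 50 8.8.8.8
traceroute 8.8.8.8
curl -o /dev/null http://mirror.example.com/100MB.bin
speedtest-cli --simple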

We have investigated and the cell which this user is connected to experiences high utilisation during busy periods. This means that the speed of this service is likely to be reduced, particularly in the evening when more people are using the internet.

nbn constantly monitors the fixed wireless network for sites which require capacity expansion and we aim to upgrade site capacity before congestion occurs, however sometimes demand exceeds expectations, resulting in a site becoming congested.

This site is scheduled for capacity expansion in Quarter 4, 2017 which should result in improved performance for users on the site. While we endeavour to upgrade sites on their scheduled date, it is possible for the date to change.

I wasn’t especially happy with that reply after a support experience that lasted for a month, but some time in October that year, the evening packet loss became less, and the window of time where we experienced congestion shrank. So I guess they did do some sort of capacity expansion.

It’s been mostly the same since then, i.e. slower in the evenings than during the day, but, well, it could be worse than it is. There was one glitch in November or December 2018 (poor speed / connection issues again, but this time during the day) which resulted in iiNet sending out a new router, but I don’t have a record of this, because it was a couple of hours of phone support that for some reason never appeared in the list of tickets in the iiNet toolbox, and even if it had, once a ticket is closed, it’s impossible to click it to view the details of what actually happened. It’s just a subject line, status and last modified date.

Fast forward to Monday March 25 2019 – a day with a severe weather warning for damaging winds – and I woke up to 34% packet loss, ping times all over the place (32-494ms), continual disconnections from IRC and a complete inability to use a VPN connection I need for work. I did the power-cycle-everything dance to no avail. I contemplated a phone call to support, then tethered my laptop to my phone instead in order to get a decent connection, and, after chatting to my neighbour, decided to wait it out, confident that the issue had already been reported by someone else.

hideous-packet-loss-march-2019

Tuesday morning it was still horribly broken, so I unplugged the router from the NTD, plugged a laptop straight in, and started running ping, traceroute and speed tests. Having done that I called support and went through the whole story (massive packet loss, unusable connection). They asked me to run speed tests again, almost all of which failed immediately with a latency error. The one that did complete showed about 8Mbps down, compared to the usual ~20Mbps during the day. So iiNet lodged a fault, and said there was an appointment available on Thursday for someone to come out. I said fine, thank you, and plugged the router back in to the NTD.

Curiously, very shortly after this, everything suddenly went back to normal. If I was a deeply suspicious person, I’d imagine that because I’d just given the MAC address of my router to support, this enabled someone to reset something that was broken at the other end, and fix my connection. But nobody ever told me that anything like this happened; instead I received a phone call the next day to say that the “speed issue” I had reported was just regular congestion and that the tower was scheduled for an upgrade later in the year. I thanked them for the call, then pointed out that the symptoms of this particular issue were completely different to regular congestion and that I was sure that something had actually been broken, but I was left with the impression that this particular feedback would be summarily ignored.

I’m still convinced something was broken, and got fixed. I’d be utterly unsurprised if there had been some problem with the tower on the Sunday night, given the strong winds, and it took ’til mid-Tuesday to get it sorted. But we’ll never know, because NBN Co don’t publish information about congestion, scheduled upgrades, faults and outages anywhere the general public can see it. I’m not even sure they make this information consistently available to retail ISPs. My neighbour, who’s with a different ISP, sent me a notice that says there’ll be maintenance/upgrades occurring on April 18, then again from April 23-25. There’s nothing about this on iiNet’s status page when I enter my address.

There was one time in the past few years though, when there was an outage that impacted me, and it was listed on iiNet’s status page. It said “customers in the area of Herringback may be affected”. I initially didn’t realise that meant me, as I’d never heard of a suburb, region, or area called Herringback. Turns out it’s the name of the mountain our NBN tower is on.

,

Robert CollinsContinuous Delivery and software distributors

Back in 2010 the continuous delivery meme was just grabbing traction. Today it’s extremely well established… except in F/LOSS projects.

I want that to change, so I’m going to try and really bring together a technical view on how that could work – which may require multiple blog posts – and if it gets traction I’ll put my fingers where my thoughts are and get into specifics with any project that wants to do this.

This is however merely a worked model today: it may be possible to do things quite differently, and I welcome all discussion about the topic!

tl;dr

Pick a service discovery mechanism (e.g. environment variables), write two small APIs – one for flag delivery, with streaming updates, and one for telemetry, with an optional aggressive data hiding proxy, then use those to feed enough data to drive a true CI/CD cycle back to upstream open source projects.
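
As a rough sketch of what the service discovery half might look like (the environment variable names, URLs and endpoint paths below are hypothetical, purely to show the shape):

import json
import os
import urllib.request

# Hypothetical discovery: whoever deploys us injects both endpoints.
FLAG_API = os.environ.get("FLAG_API_URL", "")            # e.g. https://flags.example.net
TELEMETRY_API = os.environ.get("TELEMETRY_API_URL", "")  # e.g. https://telemetry.example.net

def get_flags() -> dict:
    """Fetch current flag values; fall back to empty (defaults) if undiscoverable."""
    if not FLAG_API:
        return {}
    with urllib.request.urlopen(f"{FLAG_API}/v1/flags") as resp:
        return json.load(resp)

def report(version: str, health: str, flags: dict) -> None:
    """Best-effort telemetry; silently skip if no endpoint was discovered."""
    if not TELEMETRY_API:
        return
    body = json.dumps({"project": "myproject", "version": version,
                       "health": health, "flags": flags}).encode()
    req = urllib.request.Request(f"{TELEMETRY_API}/v1/report", data=body,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)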

Who is in?

Background

(This assumes you know what C/D is – if you don’t, go read the link above, maybe wikipedia etc, then come back.)

Consider a typical SaaS C/D pipeline:

git -> build -> test -> deploy

Here all stages are owned by the one organisation. Once deployed, the build is usable by users – it’s basically the simplest pipeline around.

Now consider a typical on-premise C/D pipeline:

git -> build -> test -> expose -> install

Here the last stage, the install stage, takes place in the user’s context, but it may be under the control of the creator, or it may be under the control of the user. For instance, Google Play updates on an Android phone: when one selects ‘Update Now’, the install phase is triggered. Leaving the phone running with power and Wi-Fi will trigger it automatically, and security updates can be pushed anytime. Continuing the use of Google Play as an example, the expose step here is an API call to upload precompiled packages, so while there are three parties, the distributor – Google – isn’t performing any software development activities (they do gatekeep, but not develop).

Where it gets awkward is when there are multiple parties doing development in the pipeline.

Distributing and C/D

Let’s consider an OpenStack cloud underlay circa 2015: an operating system, OpenStack itself, some configuration management tool (or tools), a log egress tool, a metrics egress handler, hardware management vendor binaries. And let’s say we’re working on something reasonably standalone. Say horizon.

OpenStack for most users is something obtained from a vendor. E.g. Cisco or Canonical or RedHat. And the model here is that the vendor is responsible for what the user receives; so security fixes – in particular embargoed security fixes – cannot be published publicly and then slowly propagate. They must reach users very quickly. Often, ideally, before the public publication.

Now we have something like this:

upstream ends with distribution, then vendor does an on-prem pipeline


Can we not just say ‘the end of the C/D pipeline is a .tar.gz of horizon at the distribute step’? Then every organisation can make their own decisions?

Maybe…

Why C/D?

  • Lower risk upgrades (smaller changes that can be reasoned about better; incremental enablement of new implementations to limit blast radius, decoupling shipping and enablement of new features)
  • Faster delivery of new features (less time dealing with failed upgrades == more time available to work on new features; finished features spend less time in inventory before benefiting users).
  • Better code hygiene (the same disciplines needed to make C/D safe also make more aggressive refactoring and tidiness changes safer to do, so it gets done more often).

1. If the upstream C/D pipeline stops at a tar.gz file, the lower-risk upgrade benefit is reduced or lost: the pipeline isn’t able to actually push all the way to installation, and thus we cannot tell when a particular upgrade workaround is no longer needed.

But Robert, that is the vendors problem!

I wish it was: in OpenStack so many vendors had the same problem they created shared branches to work on it, then asked for shared time from the project to perform C/I on those branches. The benefit is only realised when the developer who is responsible for creating the issue can fix it, and can be sure that the fix has been delivered; this means either knowing that every install will transiently install every intermediary version, or that they will keep every workaround for every issue for some minimum time period; or that there will be a pipeline that can actually deliver the software.

2. .tar.gz files are not installed and running systems. A key characteristic of a C/D pipeline is that it exercises the installation and execution of software; the ability to run a component up is quite tightly coupled to the component itself, for all that the ‘this is a process’ interface is very general, the specific ‘this is server X’ or ‘this is CLI utility Y’ interfaces are very concrete. Perhaps a container based approach, where a much narrower interface in many ways can be defined, could be used to mitigate this aspect. Then even if different vendors use different config tools to do last mile config, the dev cycle knows that configuration and execution works. We need to make sure that we don’t separate the teams and their products though: the pipeline upstream must only test code that is relevant to upstream – and downstream likewise. We may be able to find a balance here, but I think more work articulating what that looks like is needed.

3. It will break the feedback cycle if the running metrics are not received upstream; yes, we need to be careful of privacy aspects, but basic telemetry – the upgrade worked, the upgrade failed, here is a crash dump – these are the tools for sifting through failure at scale, and a number of open source projects like Firefox, Ubuntu and Chromium have adopted them, with great success. Notably all three have direct delivery models: their preference is to own the relationship with the user and gather such telemetry directly.

C/D and technical debt

Sidebar: ignoring public APIs and external dependencies, because they form the contract that installations and end users interact with, which we can reasonably expect to be quite sticky, the rest of a system should be entirely up to the maintainers right? Refactor the DB; Switch frameworks, switch languages. Cleanup classes and so on. With microservices there is a grey area: APIs that other microservices use which are not publically supported.

The grey area is crucial, because it is where development drag comes in: anything internal to the system can be refactored in a single commit, or in a series of small commits that is rolled up into one, or variations on this theme.

But some aspect that another discrete component depends upon, with its own delivery cycle: that cannot be fixed, and unless it was built with the same care public APIs were, it may well have poor scaling or performance characteristics that make fixing it very important.

Given two C/D’d components A and B, where A wants to remove some private API B uses, A cannot delete that API from its git repo until all B’s everywhere that receive A via C/D have been deployed with a version that does not use the private API.

That is, old versions of B place technical debt on A across the interfaces of A that they use. And this actually applies to public interfaces too – even if they are more sticky, we can expect the components of an ecosystem to update to newer APIs that are cheaper to serve, and laggards hold performance back, keep stale code alive in the codebase for longer and so on.

This places a secondary requirement on the telemetry: we need to be able to tell whether the fleet is upgraded or not.

So what does a working model look like?

I think we need a different diagram than the pipeline; the pipeline talks about the things most folk doing an API or some such project will have directly in hand, but it’s not actually the full story. The full story is rounded out with two additional features: feature flags and telemetry. And since we want to protect our users, and distributors probably will simply refuse to provide insights onto actual users, let’s assume a near-zero-trust model around both.

Feature flags

As I discussed in my previous blog post, feature flags can be used for fairly arbitrary purposes, but in this situation, where trust is limited, I think we need to identify the crucial C/D enabling use cases, and design for them.

I think that those can be reduced to soft launches – decoupling activating new code paths from getting them shipped out onto machines, and kill switches – killing off flawed / faulty code paths when they start failing in advance of a massive cascade failure; which we can implement with essentially the same thing: some identifier for a code path and then a percentage of the deployed base to enable it on. If we define this API with efficient streaming updates and a consistent service discovery mechanism for the flag API, then this could be replicated by vendors and other distributors or even each user, and pull the feature API data downstream in near real time.
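
A minimal sketch of that "identifier plus percentage" idea (my own illustration, not an API from this post): hash a stable identifier into a bucket so the same user always gets the same decision; a kill switch is then just setting the percentage to zero.

import hashlib

def flag_enabled(code_path: str, stable_id: str, rollout_percent: int) -> bool:
    """Deterministically bucket (code_path, stable_id) into 0-99 and compare.

    rollout_percent=0 acts as a kill switch; 100 is a full launch.
    """
    digest = hashlib.sha256(f"{code_path}:{stable_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent

# The same user always lands in the same bucket, so a 10% soft launch is stable.
print(flag_enabled("new-scheduler", "user-1234", 10))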

Telemetry

The difficulty with telemetry APIs is that they can egress anything. OTOH this is open source code, so malicious telemetry would be visible. But we can structure it to make it harder to violate privacy.

What does the C/D cycle need from telemetry, and what privacy do we need to preserve?

This very much needs discussion with stakeholders, but at a first approximation: the C/D cycle depends on knowing what versions are out there and whether they are working. It depends on knowing what feature flags have actually been activated in the running versions. It doesn’t depend on absolute numbers of either feature flags or versions.

Using Google Play again as an example, there is prior art – https://support.google.com/firebase/answer/6317485 – but I want to think truly minimally, because the goal I have is to enable C/D in situations with vastly different trust levels than Google Play has. However, perhaps this isn’t enough; perhaps we do need generic events and the ability to get deeper telemetry to enable confidence.

That said, let us sketch what an API document for that might look like:

project:
version:
health:
flags:
- name:
  value:

If that was reported by every deployed instance of a project, once per hour, maybe with a dependencies version list added to deal with variation in builds, it would trivially reveal the cardinality of reporters. Many reporters won’t care (for instance QA testbeds). Many will.

If we aggregate through a cardinality hiding proxy, then that vector is addressed – something like this:

- project:
  version:
  weight:
  health:
  flags:
  - name:
    value:
- project: ...

Because this data is really only best effort, such a proxy could be backed by memcache or even just an in-memory store, depending on what degree of ‘cloud-nativeness’ we want to offer. It would receive accurate data, then deduplicate to get relative weights, round those to (say) 5% as a minimum to avoid disclosing too much about long tail situations (and yes, the sum of 100 1% reports would exceed 100 :)), and then push that up.
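
A sketch of that aggregation step (my own reading of the idea, with an assumed report shape): accumulate reports, convert counts to relative weights, and round before pushing upstream.

from collections import Counter

def aggregate(reports: list[dict], rounding: int = 5) -> list[dict]:
    """Collapse raw per-instance reports into weighted, cardinality-hidden rows.

    Each report is assumed to look like {"project": ..., "version": ..., "health": ...}.
    Weights are rounded to a multiple of `rounding`, with `rounding` as a floor,
    so the published weights can sum to more than 100.
    """
    counts = Counter((r["project"], r["version"], r["health"]) for r in reports)
    total = sum(counts.values())
    rows = []
    for (project, version, health), n in counts.items():
        weight = max(rounding, round(100 * n / total / rounding) * rounding)
        rows.append({"project": project, "version": version,
                     "health": health, "weight": weight})
    return rows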

Open Questions

  • Should library projects report, or are they only used in the context of an application/service?
    • How can we help library projects answer questions like ‘has every user stopped using feature Y so that we can finally remove it’ ?
  • Would this be enough to get rid of the fixation on using stable branches everyone seems to have?
    • If not why not?
  • What have I forgotten?

,

Robert CollinsFeature flags

Feature toggles, feature flags – they’ve been written about a lot already (use a search engine :)), yet I feel like writing a post about them. Why? I’ve been personally involved in two from-scratch implementations, and it may be interesting for folk to read about that.

I say that lots has been written; http://featureflags.io/ (which appears to be a bit of an astroturf site for LaunchDarkly 😉 ) nevertheless has gathered a bunch of links to literature as well as a number of SDKs and the like; there are *other* FFaaS offerings than LaunchDarkly; I have no idea which I would use for my next project at this point – but hopefully you’ll have some tools to reason about that at the end of this piece.

I’m going to entirely skip over the motivation (go read those other pieces), other than to say that the evidence is in, trunk based development is better.



Humble, J. and Kim, G., 2018. Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations. IT Revolution.

A feature flag is a very simple thing: it is a value controlled outside of your development cycle that in turn controls the behaviour of your code. There are dozens of ways to implement that. Hash-defines and compile time flags have been used for a very long time, so long that we don’t even think of them as feature flags, but they are. So are configuration options in configuration files in the broadest possible sense. The difference is largely in focus, and where the same system meets all parties’ needs, I think it’s entirely fine to use just the one system – that is what we did for Launchpad, and it worked quite well I think – as far as I know it hasn’t been changed. Specifically in Launchpad the Zope runtime config is regular ZCML files on disk, and feature flags are complementary to that (but see the profiling example below).

Configuration tends to be thought of as “choosing behaviour after the system is compiled and before the process is started” – e.g. creating files on disk. But this is not always the case – some enterprise systems are notoriously flexible with database managed configuration rulesets which no-one can figure out – and we don’t want to create that situation.

Let’s generalise things a little – a flag could be configured over the lifetime of the binary (compile flag), execution (runtime flag/config file or one-time evaluation of some dynamic system), time (dynamically reconfigured from changed config files or some dynamic system), or configured based on user/team/organisation, URL path (of a web request naturally :P), and generally any other thing that could be utilised in making a decision about whether to conditionally perform some code. It can also be useful to be able to randomly bucket some fraction of checks (e.g. 1/3 of all requests will go down this code path), but do it consistently for the same browser.

Depending on what sort of system you are building, some of those sorts of scopes will be more or less important to you – for instance, if you are shipping on-premise software, you may well want to be turning unreleased features entirely off in the binary. If you are shipping a web API, doing soft launches with population rollouts and feature kill switches may be your priority.

Similarly, if you have an existing microservices architecture, having a feature flags aaS API is probably much more important (so that your different microservices can collaborate on in-progress features!) than if you have a monolithic DB where you put all your data today.

Ultimately you will end up with something that looks roughly like a key-value store: get_flag_value(flagname, context) -> value. Somewhere separate from your code base you will have a configuration store where you put rules that define how that key-value interface comes to a given value.
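
A toy version of that interface (mine, not Launchpad's or any particular FFaaS SDK's) might look like this, with "first matching rule wins" standing in for whatever rule engine the configuration store provides:

class FlagStore:
    """Key-value flag lookup with very simple, ordered scope rules."""

    def __init__(self, rules):
        # rules: {flagname: [(scope_predicate, value), ...]}; first match wins.
        self._rules = rules

    def get_flag_value(self, flagname, context, default=None):
        for predicate, value in self._rules.get(flagname, []):
            if predicate(context):
                return value
        return default

# Example rule set: staff see the new UI, the 'beta' team gets verbose logging.
store = FlagStore({
    "new_ui": [(lambda ctx: ctx.get("is_staff"), "on")],
    "log_level": [(lambda ctx: ctx.get("team") == "beta", "debug")],
})
print(store.get_flag_value("new_ui", {"is_staff": True}, default="off"))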

There are a few key properties that I consider beneficial in a feature flag system:

  • Graceful degradation
  • Permissionless / (or alternatively namespaced)
  • Loosely typed
  • Observable
  • Centralised
  • Dynamic

Graceful Degradation

Feature flags will be consulted from all over the place – browser code, templates, DB mapper, data exporters, test harnesses etc. If the flag system itself is degraded, you need the system’s behaviour to remain graceful, rather than stopping catastrophically. This often requires multiple different considerations; for instance, having sensible defaults for your flags (choose a default that is ok, change the meaning of defaults as what is ‘ok’ changes over time), having caching layers to deal with internet flakiness or API blips back to your flag store, making sure you have memory limits on local caches to prevent leaks and so forth. Different sorts of flag implementations have different failure modes: an API based flag system will be quite different to one stored in the same DB the rest of your code is using, which will be different to a process-startup command line option flag system.
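
One way to picture those considerations together (a sketch only, not a real SDK): every lookup has a safe default, stale cached values are served when the flag store is unreachable, and the local cache is bounded so it cannot leak.

import time

class DegradingFlagClient:
    """Serve cached or default flag values when the backing store is unavailable."""

    def __init__(self, fetch, defaults, ttl=60.0, max_entries=1000):
        self._fetch = fetch          # callable(flagname) -> value; may raise
        self._defaults = defaults    # safe values when everything else fails
        self._ttl = ttl
        self._max = max_entries      # bound memory so the cache can't leak
        self._cache = {}             # flagname -> (value, fetched_at)

    def get(self, flagname):
        cached = self._cache.get(flagname)
        if cached and time.monotonic() - cached[1] < self._ttl:
            return cached[0]
        try:
            value = self._fetch(flagname)
        except Exception:
            if cached:               # stale is better than catastrophic
                return cached[0]
            return self._defaults.get(flagname)
        if len(self._cache) >= self._max:
            self._cache.pop(next(iter(self._cache)))  # crude oldest-entry eviction
        self._cache[flagname] = (value, time.monotonic())
        return value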

A second dimension where things can go wrong is dealing with missing or unexpected flags. Remember that your system changes over time: a new flag in the code base won’t exist in the database until after the rollout, and when a flag is deleted from the codebase, it may still be in your database. Worse, if you have multiple instances running of a service, you may have different code all examining the same flag at the same time, so operations like ‘we are changing the meaning of a flag’ won’t take place atomically.

Permissions

Flags have dual audiences; one part is pure dev: make it possible to keep integration costs and risks low by merging fully integrated code on a continual basis without activating not-yet-ready (or released!) codepaths. The second part is pure operations: use flags to control access to dark launches, demo new features, killswitch parts of the site during attack mitigation, target debug features to staff and so forth.

Your developers need some way to add and remove the flags needed in their inner loop of development. Lifetimes of a few days for some flags.

Whoever is doing operations on prod though, may need some stronger guarantees – particularly they may need some controls over who can enable what flags. e.g. if you have a high control environment then team A shouldn’t be able to influence team B’s flags. One way is to namespace the flags and only permit configuration for the namespace a developer’s team(s) owns. Another way is to only have trusted individuals be able to set flags – but this obviously adds friction to processes.

Loose Typing

Some systems model the type of each flag: is it boolean, numeric, string etc. I think this is a poor idea mainly because it tends to interact poorly with the ephemeral nature of each deployment of a code base. If build X defines flag Y as boolean, and build X+1 defines it as string, the configuration store has to interact with both at the same time during rollouts, and do so gracefully. One way is to treat everything as a string and cast it to the desired type just in time, with failures being treated as default.
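
For example, a just-in-time cast helper along these lines (again a sketch, not any particular system's API):

def cast_flag(raw, want, default):
    """Cast a string-typed flag value just in time; fall back to the default.

    raw may be None (flag missing) or a string an older or newer build doesn't
    expect; either way the caller gets something usable back.
    """
    if raw is None:
        return default
    try:
        if want is bool:
            return raw.strip().lower() in ("1", "true", "on", "yes")
        return want(raw)
    except (TypeError, ValueError):
        return default

print(cast_flag("35", int, default=10))        # -> 35
print(cast_flag("oops", int, default=10))      # -> 10
print(cast_flag("true", bool, default=False))  # -> True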

Observable

Make sure that when a user reports crazy weird behaviour, that you can figure out what value they had for what flags. For instance, in Launchpad we put them in the HTML.

Centralised

Having all your flags in one system lets you write generic tooling – such as ‘what flags are enabled in QA but not production’, or ‘what flags are set but have not been queried in the last month’. It is well worth the effort to build a single centralised system (or consume one such thing) and then use it everywhere. Writing adapters to different runtimes is relatively low overhead compared to rummaging through N different config systems because you can’t remember which one is running which platform.

Scope things in the system with a top level tenant / project style construct (any FFaaS will have this I’m sure :)).

Dynamic

There may be some parts of the system that cannot apply some flags rapidly, but generally speaking the less poking around that needs to be done to make something take effect the better. So build or buy a dynamic system, and if you want a ‘only on process restart’ model for some bits of it, just consult the dynamic system at the relevant time (e.g. during k8s Deployment object creation, or process startup, or …). But then everywhere else, you can react just-in-time; and even make the system itself be script driven.

The Launchpad feature flag system

I was the architect for Launchpad when the flag system was added. Martin Pool wanted to help accelerate feature development on Launchpad, and we’d all become aware of the feature flag style things hip groups like YouTube were doing; so he wrote a LEP: https://dev.launchpad.net/LEP/FeatureFlags , pushed that through our process and then turned it into code and docs (and once the first bits landed folk started using and contributing to it). Here’s a patch I wrote using the system to allow me to turn on Python profiling remotely. Here’s one added by William Grant to allow working around a crash in a packaging tool.

Launchpad has a monolithic data store, with bulk data federated out to various disk stores, but all relational data in one schema; we didn’t see much benefit in pushing for a dedicated API per se at that time – it can always be added later, as the design was deliberately minimal. The flags implementation is all in-process as a result, though there may be a JS thunk at this point – I haven’t gone looking. Permissions are done through trusted staff members, it is loosely typed and has an audit log for tracking changes.

The other one

The other one I was involved in was at VMware a couple of years ago now; it’s in-house, but there are some interesting anecdotes I can share. The thinking on feature flags when I started the discussion was that they were strictly configuration file settings – I was still finding my feet with the in-house Xenon framework at the time (I think this was week 3? 🙂) so I whipped up an API specification and a colleague (Tyler Curtis) turned that into a draft engine; it wasn’t the most beautiful thing but it was still going strong and being enhanced by the team when I left earlier this year. The initial implementation had a REST API and a very basic set of scopes. That lasted about 18 months before tenant based scopes were needed and added. I had designed it with the intent of adding multi-arm bandit selection down the track, but we didn’t make the time to develop that capability, which is a bit sad.

Comparing that API with LaunchDarkly I see that they do support A/B trials but don’t have multivariate tests live yet, which suggests that they are still very limited in that space. I think there is room for some very simple home grown work in this area to pay off nicely for Symphony (the project codename the flags system was written for).

Should you run your own?

It would be very unusual to have PII or customer data in the flag configuration store; and you shouldn’t have access control lists in there either (in LP we did allow turning on code by group, which is somewhat similar). Point is, that the very worst thing that can happen if someone else controls your feature flags and is malicious is actually not very bad. So as far as aaS vendor trust goes, not a lot of trust is needed to be pretty comfortable using one.

But, if you’re in a particularly high-trust environment, or you have no internet access, running your own may be super important, and then yeah, do it :). They aren’t big complex systems, even with multi-arm bandit logic added in (the difficulty there is the logic, not the processing).

Or if you think the prices being charged by the incumbents are ridiculous. Actually, perhaps hit me up and we’ll make a startup and do this right…

Should you build your own?

A trivial flag system + persistence could be as little as a few days work. Less if you grab an existing bolt-on for your framework. If you have multiple services, or teams, or languages.. expect that to become the gift that keeps on giving as you have to consolidate and converge across your organisation indefinitely. If you have the resources – great, not a problem.


I think most people will be better off taking one of the existing open source flag systems – perhaps https://unleash.github.io/ – and using it; even if it is more complex than a system tightly fitted to your needs, the benefit of having one that is a true API from the start will pay for itself the very first time you split a project, or want to report what features are on in dev and off in prod, not to mention multiple existing language bindings etc.

,

Robert CollinsChristchurch

Trigger warning: this is likely going to make you angry, or sad, or sadmad, or more. OTOH I’m going to make the points I’ve got to make in a short and pointed fashion and you, dear reader, can go look up supporting or refuting documentation as you feel appropriate.

Christchurch recently suffered NZ’s largest modern mass murder. I say modern because and also.

This is the third time I’ve been “close” to murder. Once was the Aramoana massacre, where I lived roughly halfway between the city and Aramoana itself – the helicopter flight path was right overhead; once was moving to Sydney and on the very first day after I arrived taking the train into the city from a train station bathed in blood. (With a helpful sign informing us that the police were seeking information that could lead to an arrest). And last Friday was Christchurch. And this is why I have put “close” in quotes: close enough to be viscerally affected, but not directly affected: none of the folk that died were personally known to me in any of these cases. I have had people I know die and that’s a whole different level. There are plenty of people writing from that perspective, and you should go and listen to them. I have, and am, and have friends in that set, who need all the support they can get at the moment.

And that is why I’m a little conflicted about writing this post. Because this tragedy is all the more tragic because of our failures leading up to it.

What failures you might ask? Ours as citizens, or our bureaucracies? Or governments?

Lets look at this from a “Law & Order” perspective – means, motive, opportunity. The opportunity is something I think we should be proud of: folk being able to do what they want to do, inviting people they don’t know into their mosque, without fear. But means and motive…

Means

Means first. I wish I could say “I’m not going to engage in the nonsense debates about whether the sort of weapon the shooter had should be available for private citizen use or not.” But actually, I have to. Australia has ongoing problems with violence, and regulated guns heavily back in 1996. Since then, massacres have stopped being a thing in Australia. Prior to that, massacres were happening on a regular basis. This doesn’t deny the truth that the same machines can be used to shoot rabbits, but the evidence seems to be that they are too easily abused, and there are enough less over-the-top alternatives for the farm use case that it isn’t a compelling tradeoff.

Would the attacker have been able to carry out the attack without semi-automatic rifles? Sure. But with less ammunition per clip, less accuracy per shot, less damage per shot.

Here’s the sad bit though. We tightened our gun laws after Aramoana in 1990. We didn’t tighten them again after Port Arthur. We didn’t tighten them again after Sandy Hook. Or any of the other mass murders overseas using military weapons. We have had decades of inaction while the evidence of the potential harms increased. Late last year Stuff published this article – which is actually good reporting! I quote:

It is a very sad fact that changes to gun regulation only come about in the wake of a tragedy: Aramoana, Port Arthur, the Dunblane massacre.

Since 1992, politicians have backed nervously away.

50 people died on Friday in large part because the murderer had access to effective means for mass murder, which we knew was effective, which our Police knew was effective, while we spent political capital on other complete BS (look up political scandals in NZ – I don’t have the heart to enumerate them right now).

Jacinda’s leadership right now consists of executing on a thing we’ve had queued up as necessary for – at the least – decades. It is an indictment on all of us here in NZ that 50 people had to die to get this done. I’m very sadmad right now. I don’t feel like I could have driven the debate in the right direction, but our leaders surely could have. And none of them are owning this.

Motive

It’s tempting to write the murderer off as being a bad human, or badly raised, or insane. But the reality as I understand it is that there were many signs of his intolerant violent views, that he doesn’t have an obvious mental disorder, and his violence was targeted: he’s not just plain evil and out to kill everyone…

In NZ at least we consider both intolerance and violence antisocial: we expect tolerance and peaceful discussion with each other as baseline characteristics of human beings.

And again, this is a thing we’ve seen before overseas. We’ve seen many murders done on the basis of violent intolerant ideology, but we haven’t actually adapted here to deal with it.

How are we, at a systemic level, engaging with the problem and addressing it? How do we help folk become tolerant? How do we give them other tools than violence? And if they are truly unable to use other tools, and unable to become tolerant, how do we safeguard ourselves?

But it’s worse: listening to my Māori compatriots, NZ is pretty racist, *at a minimum* structurally, with many more Māori in prison, and presumably wage inequality and other “invisible” discriminations. Not much point asking everyone to be super tolerant and friendly with each other if we’re not giving each other a fair go.

We do pretty great in how we bring up kids in early childhood and primary school, from what I can tell with my daughter’s school, but I have no idea about higher schooling these days, and what do we do for adults and immigrants? Science says that most immigrants are motivated flexible people (self selecting group), but for the tiny tiny number that aren’t: how do we help them?

Personally I think wealth inequality goes a long way towards sustaining discrimination – anyone thinking of the world as a zero-sum game is much more likely to be hellbent on keeping other folk down to keep themselves up – and I very much want to see that change in NZ: I’d like to see us introduce a UBI, get rid of the means testing on various social supports (like the dole), and generally toss away the neoliberal narrative that has poisoned us for 30 odd years.

Glen TurnerJupyter notebook and R

This has become substantially simpler in Fedora 29:

sudo dnf install notebook R-IRKernel R-IRdisplay



,

Glen TurnerRipping language-learning CDs

It might be tempting to use MP3's variable bit rate for encoding ripped foreign-language CDs. With the large periods of silence, that would seem to make a lot of sense. But you lose the ability to rewind to an exact millisecond, which turns out to be essential as you want to hear a particular phrase a handful of times. So use CBR -- constant bit rate -- encoding, and at a high bit rate like 160kbps.




,

Glen Turnerwpa_supplicant update trades off interoperation for security

In case you want to choose a different security compromise, the update has a nice summary:

wpasupplicant (2:2.6-19) unstable; urgency=medium

  With this release, wpasupplicant no longer respects the system
  default minimum TLS version, defaulting to TLSv1.0, not TLSv1.2. If
  you're sure you will never connect to EAP networks requiring anything less
  than 1.2, add this to your wpasupplicant configuration:

    tls_disable_tlsv1_0=1
    tls_disable_tlsv1_1=1

  wpasupplicant also defaults to a security level 1, instead of the system
  default 2. Should you need to change that, change this setting in your
  wpasupplicant configuration:

    openssl_ciphers=DEFAULT@SECLEVEL=2

  Unlike wpasupplicant, hostapd still respects system defaults.

 -- Andrej Shadura <…@debian.org>  Sat, 15 Dec 2018 14:22:18 +0100



,

Glen TurnerFinding git credentials in libsecret

To find passwords in libsecret you need to know what attributes to search for. These are often set by some shim but not documented. The attributes tend to vary by shim.

For git's libsecret shim the attributes are: protocol, server, user.

A worked example, the account gdt on git.example.org:

$ secret-tool search --all 'protocol' 'https' 'server' 'git.example.org' 'user' 'gdt'
[/org/freedesktop/secrets/collection/login/123]
label = Git: https://git.example.org/
secret = CvKxlezMsSDuR7piMBTzREJ7l8WL1T
created = 2019-02-01 10:20:34
modified = 2019-02-01 10:20:34
schema = org.gnome.keyring.NetworkPassword
attribute.protocol = https
attribute.server = git.example.org
attribute.user = gdt

Note that the "label" is mere documentation, it's the "attribute" entries which matter.




,

sthbrx - a POWER technical blogWhat Do You Mean "No"?

Quite often when building small Linux images having separate user accounts isn't always at the top of the list of things to include. Petitboot is no different; the most common operations like mounting disks, configuring interfaces, and calling kexec all require root and Petitboot generally only exists long enough to boot into the next thing, so why not run it all as root?

The picture is less clear when we start to think about what is possible to do in Petitboot by default. If someone comes across an open Petitboot console they're only a few keystrokes away from wiping disks, changing files, or even flashing firmware. Depending on how your system is used that may or may not be something you care about, but over time there have been a few requests to "add a password screen to Petitboot" to at least make it so that the system isn't open season for whoever sees it.

Enter Password:

The most direct way to avoid this would be to slap a password prompt onto Petitboot before any changes can be made. There are two immediate drawbacks to this:

  • The Petitboot UI still runs as root, and
  • Exiting to the shell gives the user root permissions as well.

There is already a mechanism to prevent the user exiting to the shell, but this puts all of our eggs in the basket of petitboot-nc being a secure program. If a user can accidentally or otherwise find a way to exit or crash the UI then they're immediately in a root shell, and while petitboot-nc is a good UI it was never designed to be a hardened program protecting the system.

You Have No Power Here

The idea instead as of Petitboot v1.10.0 is not to care if the user drops to the shell; because now it's completely unprivileged.

Normal shell

The only process that now runs as root is pb-discover itself; the console, UI, and helper scripts run as a new 'petituser'. For the server and clients to still communicate, the "petitboot.ui" socket permissions are modified to allow processes that are part of the 'petitgroup' to connect. However, if pb-discover now notices that a client in the petitgroup is connecting (or, more accurately, that the client isn't running as root), by default it ignores any commands from it that would configure or boot the system.

A new command, PB_PROTOCOL_ACTION_AUTHENTICATE, lets a client send a password to the server to then be allowed to send all the usual commands like updating the config or booting a specific option. This keeps all the authentication on the server side, avoiding writing any "secure" ncurses code. In the UI the biggest difference is that when trying to change something the user will hit a password field:

Denied

Then the password is sent to the server, checked, and if correct the action goes ahead.

Whose Passwords?

But where does this password come from? Technically it's just the root password. The server computes a hash of the supplied password and compares it against the system's root password. Similarly in the shell the user can run sudo with the root password to enter a full shell if needed:

Oops

Petitboot of course runs in memory, and writing a root password into the image itself would mean recompiling to change the password, so instead Petitboot pulls the root password from NVRAM. On startup Petitboot reads the petitboot,password parameter which is the hash of the root password and updates /etc/shadow with it. This happens before any clients are up or can connect to the server.
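
Roughly, and this is my paraphrase of the flow in Python rather than Petitboot's actual C code, the stored value is presumably a standard crypt(3)-style hash (the same format as the second field of an /etc/shadow entry), so checking a password is just re-hashing the supplied password with the salt embedded in the stored hash and comparing:

import crypt  # Unix-only, and removed from the stdlib in Python 3.13; fine for a sketch
import hmac

def check_password(supplied: str, stored_hash: str) -> bool:
    """Re-hash `supplied` using the method and salt embedded in stored_hash."""
    candidate = crypt.crypt(supplied, stored_hash)
    return hmac.compare_digest(candidate, stored_hash)

# Conceptually: on boot the hash is read from the petitboot,password NVRAM
# parameter and written into /etc/shadow; authenticating a client then reduces
# to something like check_password(user_supplied, hash_from_shadow).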

Don't Panic

By default no password is set. After all we don't want people upgrading and then being somehow locked out of their system. For ease of use, and for testing purposes, if no password is configured and the user drops to the shell it is automatically upgraded to a root shell:

Elevated

To set a password there is a new subscreen in System Configuration:

New Password

This sends an authentication command to the server, and assuming the client is authenticated with the current password as well pb-discover updates the shadow file, and writes the hash back to NVRAM.


User support exists in Petitboot v1.10.0 onwards. You will also need some support in your build system to set up the users, see how op-build did it for an example.

There are a few items on the TODO list still which would be good to have. For example storing the password hash in an attached TPM if available, as well as splitting out more of what runs as root; for example the bootloader parsers in pb-discover preferably wouldn't run with root privileges, but they are all part of the one binary.

As always, comments, suggestions, and patches welcome on the list!

,

,

Tim SerongDistributed Storage is Easier Now: Usability from Ceph Luminous to Nautilus

On January 21, 2019 I presented Distributed Storage is Easier Now: Usability from Ceph Luminous to Nautilus at the linux.conf.au 2019 Systems Administration Miniconf. Thanks to the incredible Next Day Video crew, the video was online the next day, and you can watch it here:

If you’d rather read than watch, the meat of the talk follows, but before we get to that I have two important announcements:

  1. Cephalocon 2019 is coming up on May 19-20, in Barcelona, Spain. The CFP is open until Friday February 1, so time is rapidly running out for submissions. Get onto it.
  2. If you’re able to make it to FOSDEM on February 2-3, there’s a whole Software Defined Storage Developer Room thing going on, with loads of excellent content including What’s new in Ceph Nautilus – project status update and preview of the coming release and Managing and Monitoring Ceph with the Ceph Manager Dashboard, which will cover rather more than I was able to here.

Back to the talk. At linux.conf.au 2018, Sage Weil presented “Making distributed storage easy: usability in Ceph Luminous and beyond”. What follows is somewhat of a sequel to that talk, covering the changes we’ve made in the meantime, and what’s still coming down the track. If you’re not familiar with Ceph, you should probably check out A Gentle Introduction to Ceph before proceeding. In brief though, Ceph provides object, block and file storage in a single, horizontally scalable cluster, with no single points of failure. It’s Free and Open Source software, it runs on commodity hardware, and it tries to be self-managing wherever possible, so it notices when disks fail, and replicates data elsewhere. It does background scrubbing, and it tries to balance data evenly across the cluster. But you do still need to actually administer it.

This leads to one of the first points Sage made this time last year: Ceph is Hard. Status display and logs were traditionally difficult to parse visually, there were (and still are) lots of configuration options, tricky authentication setup, and it was difficult to figure out the number of placement groups to use (which is really an internal detail of how Ceph shards data across the cluster, and ideally nobody should need to worry about it). Also, you had to do everything with a CLI, unless you had a third-party GUI.

I’d like to be able to flip this point to the past tense, because a bunch of those things were already fixed in the Luminous release in August 2017; status display and logs were cleaned up, a balancer module was added to help ensure data is spread more evenly, crush device classes were added to differentiate between HDDs and SSDs, a new in-tree web dashboard was added (although it was read-only, so just cluster status display, no admin tasks), plus a bunch of other stuff.

But we can’t go all the way to saying “Ceph was hard”, because that might imply that everything is now easy. So until we reach that frabjous day, I’m just going to say that Ceph is easier now, and it will continue to get easier in future.

At linux.conf.au in January 2018, we were half way through the Mimic development cycle, and at the time the major usability enhancements planned included:

  • Centralised configuration management
  • Slick deployment in Kubernetes with Rook
  • A vastly improved dashboard based on ceph-mgr and openATTIC
  • Placement Group merging

We got some of that stuff done for Mimic, which was released in June 2018, and more of it is coming in the Nautilus release, which is due out very soon.

In terms of usability improvements, Mimic gave us a new dashboard, inspired by and derived from openATTIC. This dashboard includes all the features of the Luminous dashboard, plus username/password authentication, SSL/TLS support, RBD and RGW management, and a configuration settings browser. Mimic also brought the ability to store and manage configuration options centrally on the MONs, which means we no longer need to set options in /etc/ceph/ceph.conf, replicate that across the cluster, and restart whatever daemons were affected. Instead, you can run `ceph config set ...` to make configuration changes. For initial cluster bootstrap, you can even use DNS SRV records rather than specifying MON hosts in the ceph.conf file.

As I mentioned, the Nautilus release is due out really soon, and will include a bunch more good stuff:

  • PG autoscaling
  • More dashboard enhancements, including:
    • Multiple users/roles, also single sign on via SAML
    • Internationalisation and localisation
    • iSCSI and NFS Ganesha management
    • Embedded Grafana dashboards
    • The ability to mark OSDs up/down/in/out, and trigger scrubs/deep scrubs
    • Storage pool management
    • A configuration settings editor which actually tells you what the configuration settings mean, and do
    • To see what this all looks like, check out Ceph Manager Dashboard Screenshots as of 2019-01-17
  • Blinky lights, that being the ability to turn on or off the ident and fault LEDs for the disk(s) backing a given OSD, so you can find the damn things in your DC.
  • Orchestrator module(s)

Blinky lights, and some of the dashboard functionality (notably configuring iSCSI gateways and NFS Ganesha) means that Ceph needs to be able to talk to whatever tool it was that deployed the cluster, which leads to the final big thing I want to talk about for the Nautilus release, which is the Orchestrator modules.

There’s a bunch of ways to deploy Ceph, and your deployment tool will always know more about your environment, and have more power to do things than Ceph itself will, but if you’re managing Ceph, through the inbuilt dashboard and CLI tools, there’s things you want to be able to do as a Ceph admin, that Ceph itself can’t do. Ceph can’t deploy a new MDS, or RGW, or NFS Ganesha host. Ceph can’t deploy new OSDs by itself. Ceph can’t blink the lights on a disk on some host if Ceph itself has somehow failed, but the host is still up. For these things, you rely on your deployment tool, whatever it is. So Nautilus will include Orchestrator modules for Ansible, DeepSea/Salt, and Rook/Kubernetes, which allow the Ceph management tools to call out to your deployment tool as necessary to have it perform those tasks. This is the bit I’m working on at the moment.

Beyond Nautilus, Octopus is the next release, due in a bit more than nine months, and on the usability front I know we can expect more dashboard and more orchestrator functionality, but before that, we have the Software Defined Storage Developer Room at FOSDEM on February 2-3 and Cephalocon 2019 on May 19-20. Hopefully some of you reading this will be able to attend 🙂

Update 2019-02-04: Check out Sage’s What’s new in Ceph Nautilus FOSDEM talk for much more detail on what’s coming up in Nautilus and beyond.

,

Glen Turnerudev rules

It's pretty common to add a udev rule in /etc/udev/rules.d/ for new hardware.

There are two ways of granting access, using groups and permissions, and using systemd's uaccess tag.

Here's an example showing groups and permissions. Anyone in the "users" group can access the switch, but only those in the "eng" (engineering) group can flash the switch. This is a pretty common arrangement for hardware development teams:

# /etc/udev/rules.d/77-northbound-networks.rules
# Northbound Networks
#  Zodiac FX OpenFlow switch
ATTRS{idVendor}=="03eb", ATTRS{idProduct}=="2404", ENV{ID_MM_DEVICE_IGNORE}="1", GROUP="users", MODE="0660", SYMLINK+="ttyzodiacfx"
#  Zodiac FX OpenFlow switch after flash "erase"
#  The Atmel SAM4E Cortex-M4F CPU is running a bootloader waiting for software
#  download via USB and the SAM-BA tool (the CPU is Atmel part ATSAM4E8C-AU,
#  use board description "at91sam4e8-ek").
ATTRS{idVendor}=="03eb", ATTRS{idProduct}=="6124", ENV{ID_MM_DEVICE_IGNORE}="1", GROUP="eng", MODE="0660", SYMLINK+="ttyat91sam4e8-ek"
# Atmel-ICE Basic JTAG
ATTRS{idVendor}=="03eb", ATTRS{idProduct}=="2141", MODE="664", GROUP="eng"

Here's an example for a YubiKey. Any seated user can access the YubiKey:

# /etc/udev/rules.d/69-u2f.rules
# Yubico YubiKey
KERNEL=="hidraw*", SUBSYSTEM=="hidraw", ATTRS{idVendor}=="1050", ATTRS{idProduct}=="0113|0114|0115|0116|0120|0200|0402|0403|0406|0407|0410", TAG+="uaccess"

Note that this file must run before /usr/lib/udev/rules.d/73-seat-late.rules.

Also, the systemd developers have tried to abstract the rules a little, making them more declarative and less procedural (always a good design rule). Of course, they haven't documented this (never a good design practice). See the file /lib/udev/rules.d/70-uaccess.rules and look for the ID_ variables. So the YubiKey example could have been:

# /etc/udev/rules.d/69-u2f.rules
# Yubico YubiKey
KERNEL=="hidraw*", SUBSYSTEM=="hidraw", ATTRS{idVendor}=="1050", ATTRS{idProduct}=="0113|0114|0115|0116|0120|0200|0402|0403|0406|0407|0410", ENV{ID_SECURITY_TOKEN}="1"

If you want only seated users accessing the device then use the uaccess tag. If you want users remotely accessing the machine to use the device, then you use a group and permissions.




,

Matt Palmerpwnedkeys: who has the keys to *your* kingdom?

pwnedkeys.com logo

I am extremely pleased to announce the public release of pwnedkeys.com – a database of compromised asymmetric encryption keys. I hope this will become the go-to resource for anyone interested in avoiding the re-use of known-insecure keys. If you have a need, or a desire, to check whether a key you’re using, or being asked to accept, is potentially in the hands of an adversary, I would encourage you to take a look.

Pwnage... EVERYWHERE

By now, most people in the IT industry are aware of the potential weaknesses of passwords, especially short or re-used passwords. Using a password which is too short (or, more technically, with “insufficient entropy”) leaves us open to brute force attacks, while re-using the same password on multiple sites invites a credential stuffing attack.

It is rare, however, that anyone thinks about the “quality” of RSA or ECC keys that we use with the same degree of caution. There are so many possible keys, all of which are “high quality” (and thus not subject to “brute force”), that we don’t imagine that anyone could ever compromise a private key except by actually taking a copy of it off our hard drives.

There is a unique risk with the use of asymmetric cryptography, though. Every time you want someone to encrypt something to you, or verify a signature you’ve created, you need to tell them your public key. While someone can’t calculate your private key from your public key, the public key does have enough information in it to be able to identify your private key, if someone ever comes across it.

So what?

smashed window

The risk here is that, in many cases, a public key truly is public. Every time your browser connects to a HTTPS-protected website, the web server sends a copy of the site’s public key (embedded in the SSL certificate). Similarly, when you connect to an SSH server, you get the server’s public key as part of the connection process. Some services provide a way for anyone to query a user’s public keys.

Once someone has your public key, it can act like an “index” into a database of private keys that they might already have. This is only a problem, of course, if someone happens to have your private key in their stash. The bad news is that there are a lot of private keys already out there, that have either been compromised by various means (accident or malice), or perhaps generated by a weak RNG.

When you’re generating keys, you usually don’t have to worry. The chances of accidentally generating a key that someone else already has is as close to zero as makes no difference. Where you need to be worried is when you’re accepting public keys from other people. Unlike a “weak” password, you can’t tell a known-compromised key just by looking at it. Even if you saw the private key, it would look just as secure as any other key. You cannot know whether a public key you’re being asked to accept is associated with a known-compromised private key. Or you couldn’t, until pwnedkeys.com came along.

The solution!

The purpose of pwnedkeys.com is to try and collect every private key that’s ever gotten “out there” into the public, and warn people off using them ever again. Don’t think that people don’t re-use these compromised keys, either. One of the “Debian weak keys” was used in an SSL certificate that was issued in 2016, some eight years after the vulnerability was made public!
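
For the curious, here is the sort of check a client could do. This is a sketch only: it uses the third-party `cryptography` package, and it assumes the lookup is keyed by the SHA-256 fingerprint of the key's DER-encoded SubjectPublicKeyInfo and that the query URL has the form shown below. Confirm both assumptions against the pwnedkeys.com API documentation before relying on them.

import hashlib
import urllib.error
import urllib.request

from cryptography.hazmat.primitives import serialization

def spki_fingerprint(pem_public_key: bytes) -> str:
    """SHA-256 of the DER-encoded SubjectPublicKeyInfo, as lowercase hex."""
    key = serialization.load_pem_public_key(pem_public_key)
    der = key.public_bytes(
        serialization.Encoding.DER,
        serialization.PublicFormat.SubjectPublicKeyInfo,
    )
    return hashlib.sha256(der).hexdigest()

def looks_pwned(pem_public_key: bytes) -> bool:
    """Query the (assumed) lookup endpoint; a 404 is taken to mean 'not known'."""
    url = f"https://v1.pwnedkeys.com/{spki_fingerprint(pem_public_key)}"
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise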

My hope is that pwnedkeys.com will come to be seen as a worthwhile resource for anyone who accepts public keys, and wants to know that they’re not signing themselves up for a security breach in the future.

,

Matt PalmerFalsehoods Programmers Believe About Pagination

The world needs it, so I may as well write it.

  • The number of items on a page is fixed for all time.
  • The number of items on a page is fixed for one user.
  • The number of items on a page is fixed for one result set.
  • The pages are only browsed in one direction.
  • No item will be added to the result set during retrieval.
  • No item will be removed from the result set during retrieval.
  • Item sort order is stable.
  • Only one page of results will be retrieved at one time.
  • Pages will be retrieved in order.
  • Pages will be retrieved in a timely manner.
  • No problem will result from two different users seeing different pagination of the same items at about the same time. (From @ronburk)

,

OpenSTEMSchool-wide Understanding Our World® implementations

Are you considering implementing our integrated HASS+Science program, but getting a tad confused by the pricing?  Our subscription model didn’t provide a straightforward calculation for a whole school or year-level.  However, it generally works out to $4.40 (inc. GST) per student.  So now we’re providing this as an option directly: implement our integrated HASS+Science program […]

The post School-wide Understanding Our World® implementations first appeared on OpenSTEM Pty Ltd.

,

BlueHackersEntrepreneurs’ Mental Health and Well-being Survey

Jamie Pride has partnered with Swinburne University and Dr Bronwyn Eager to conduct the largest mental health and well-being survey of Australian entrepreneurs and founders. This survey will take approx 5 minutes to complete. Can you also please spread the word and share this via your networks!

Getting current and relevant Australian data is extremely important! The findings of this study will contribute to the literature on mental health and well-being in entrepreneurs, and will potentially lead to future improvements in the prevention and treatment of psychological distress.

Jamie is extremely passionate about this cause! Your help is greatly appreciated.

The post Entrepreneurs’ Mental Health and Well-being Survey first appeared on BlueHackers.org.

,

Peter LieverdinkDark Doodad

It's been a while since I did a blog, so after twiddling the way the front page of the site displays, it's time to post a new one. The attached photo is of my favourite dark nebula, "The Dark Doodad". What looks like a long thin nebula is apparently a sheet over 40 light years wide that we happen to be seeing edge-on. On the left you can see a few dark tendrils that are part of the Coalsack Nebula. This is one of the first images created from a stack of subs I took using AstroDSLR. Each exposure is 2 minutes and I stacked 20 of them. My polar alignment was pretty decent, I think!

,

Glen TurnerScreenshot web pages

  • In a recent version of Firefox, open the developer toolbar (shift + f2).
  • Enter screenshot filename.png -fullpage
  • Filename.png will be saved to your downloads folder.



,

OpenSTEMChildren in Singapore will no longer be ranked by exam results. Here’s why | World Economic Forum

https://www.weforum.org/agenda/2018/10/singapore-has-abolished-school-exam-rankings-here-s-why The island nation is changing its educational focus to encourage school children to develop the life skills they will need when they enter the world of work.

The post Children in Singapore will no longer be ranked by exam results. Here’s why | World Economic Forum first appeared on OpenSTEM Pty Ltd.

,

OpenSTEMHelping Migrants to Australia

The end of the school year is fast approaching with the third term either over or about to end and the start of the fourth term looming ahead. There never seems to be enough time in the last term with making sure students have met all their learning outcomes for the year and with final […]

The post Helping Migrants to Australia first appeared on OpenSTEM Pty Ltd.

,

OpenSTEMOur interwoven ancestry

In 2008 a new group of human ancestors – the Denisovans, were defined on the basis of a single finger knuckle (phalanx) bone discovered in Denisova cave in the Altai mountains of Siberia. A molar tooth, found at Denisova cave earlier (in 2000) was determined to be of the same group. Since then extensive work […]

The post Our interwoven ancestry first appeared on OpenSTEM Pty Ltd.

,

Clinton RoyMoving to Melbourne

Now that the paperwork has finally all been dealt with, I can announce that I’ll be moving down to Melbourne to take up a position with the Australian Synchrotron, basically a super duper x-ray machine used for research of all types. My official position is Senior Scientific Software Engineer. I’ll be moving down to Melbourne shortly, staying with friends (you remember that offer you made, months ago?) until I find a rental near Monash Uni, Clayton.

I will be leaving behind Humbug, the computer group that basically opened up my entire career, and The Edge, SLQ, my home-away-from-home study. I do hope to be able to find replacements for these down south.

I’m looking at having a small farewell nearby soon.

A shout out to Netbox Blue for supplying all my packing boxes. Allll of them.

OpenSTEMThis Week in Australian History

The end of August and beginning of September is traditionally linked to the beginning of Spring in Australia, although the change in seasons is experienced in different ways in different parts of the country and was marked in locally appropriate ways by Aboriginal people. As a uniquely Australian celebration of Spring, National Wattle Day, celebrated […]

The post This Week in Australian History first appeared on OpenSTEM Pty Ltd.

,

BlueHackersVale Janet Hawtin Reid

Janet Hawtin Reid (@lucychili) sadly passed away last week.

A mutual friend called me earlier in the week to tell me, for which I’m very grateful.  We both appreciate that BlueHackers doesn’t ever want to be a news channel, so I waited to write about it here until other friends, just like me, would have also had a chance to hear via more direct and personal channels. I think that’s the way these things should flow.

I knew Janet as a thoughtful person, with strong opinions particularly on openness and inclusion.  She was also an artist and generally creative individual, and a lover of nature.  In recent years I’ve also seen her produce the most awesome knitted Moomins.

Short diversion as I have an extra connection with the Moomin stories by Tove Jansson: they have a character called My, after whom Monty Widenius’ eldest daughter is named, which in turn is how MySQL got named.  I used to work for MySQL AB, and I’ve known that My since she was a little smurf (she’s an adult now).

I’m not sure exactly when I met Janet, but it must have been around 2004 when I first visited Adelaide for Linux.conf.au.  It was then also that Open Source Industry Australia (OSIA) was founded, for which Janet designed the logo.  She may well have been present at the founding meeting in Adelaide’s CBD, too.  Anyhow, Janet offered to do the logo in a conversation with David Lloyd, and things progressed from there. On the OSIA logo design, Janet wrote:

I’ve used a star as the current one does [an earlier doodle incorporated the Southern Cross]. The 7 points for 7 states [counting NT as a state]. The feet are half facing in for collaboration and half facing out for being expansive and progressive.

You may not have realised this as the feet are quite stylised, but you’ll definitely have noticed the pattern-of-7, and the logo as a whole works really well. It’s a good looking and distinctive logo that has lasted almost a decade and a half now.

As Linux Australia’s president Kathy Reid wrote, Janet also helped design the ‘penguin feet’ logo that you see on Linux.org.au.  Just reading the above (which I just retrieved from a 2004 email thread) there does seem to be a bit of a feet-pattern there… of course the explicit penguin feet belong with the Linux penguin.

So, Linux Australia and OSIA actually share aspects of their identity (feet with a purpose), through their respective logo designs by Janet!  Mind you, I only realised all this when looking through old stuff while writing this post, as the logos were done at different times and only a handful of people have ever read the rationale behind the OSIA logo until now.  I think it’s cool, and a fabulous visual legacy.

Fir tree in clay, by Janet Hawtin Reid
Fir tree in clay, by Janet Hawtin Reid. Done in “EcoClay”, brought back to Adelaide from OSDC 2010 (Melbourne) by Kim Hawtin, Janet’s partner.

Which brings me to a related issue that’s close to my heart, and I’ve written and spoken about this before.  We’re losing too many people in our community – where, in case you were wondering, too many is defined as >0.  Just like in a conversation on the road toll, any number greater than zero has to be regarded as unacceptable. Zero must be the target, as every individual life is important.

There are many possible analogies with trees as depicted in the above artwork, including the fact that we’re all best enabled to grow further.

Please connect with the people around you.  Remember that connecting does not necessarily mean talking per-se, as sometimes people just need to not talk, too.  Connecting, just like the phrase “I see you” from Avatar, is about being thoughtful and aware of other people.  It can just be a simple hello passing by (I say hi to “strangers” on my walks), a short email or phone call, a hug, or even just quietly being present in the same room.

We all know that you can just be in the same room as someone, without explicitly interacting, and yet feel either connected or disconnected.  That’s what I’m talking about.  Aim to be connected, in that real, non-electronic, meaning of the word.

If you or someone you know needs help or needs to talk right now, please call 1300 659 467 (in Australia – they can call you back, and you can also use the service online).  There are many more resources and links on the BlueHackers.org website.  Take care.

The post Vale Janet Hawtin Reid first appeared on BlueHackers.org.

,

Ian WienandLocal qemu/kvm virtual machines, 2018

For work I run a personal and a work VM on my laptop. When I was at VMware I dogfooded internal builds of Workstation, which worked well, but it was always a challenge to keep its additions building against the latest kernels. About 5 and a half years ago, the only practical alternative option was VirtualBox. IIRC SPICE maybe didn't even exist or was very early, and while VNC is OK for fiddling with something, it's completely impractical for primary daily use.

VirtualBox is fine, but there is the promised land of all the great features of qemu/kvm and many recent improvements in 3D integration always calling. I'm trying all this on my Fedora 28 host, with a Fedora 28 guest (which has been in-place upgraded since Fedora 19), so everything is pretty recent. Periodically I try this conversion again, but, spoiler alert, have not yet managed to get things quite right.

As I happened to close an IRC window, somehow my client seemed to crash X11. How odd ... so I thought, everything has just disappeared anyway; I might as well try switching again.

Image conversion has become much easier. My primary VM has a number of snapshots, so I used the VirtualBox GUI to clone the VM and followed the prompts to create the clone with squashed snapshots. Then simply convert the VDI to a RAW image with

$ qemu-img convert -p -f vdi -O raw image.vdi image.raw

Note that if you forget the progress meter, you can send the pid a SIGUSR1 to get it to spit out a progress update.

virt-manager has come a long way too. Creating a new VM was trivial. I wanted to make sure I was using all the latest SPICE, GL, etc. stuff. Here I hit some problems with what seemed to be permission denials on drm devices before even getting the machine started. Something suggested using libvirt in session mode, with the qemu:///session URL -- which seemed more like what I want anyway (a VM for only my user). I tried that, put the converted raw image in my home directory and the VM would boot. Yay!

It was a bit much to expect it to work straight away; while GRUB did start, it couldn't find the root disks. In hindsight, you should probably generate a non-host specific initramfs before converting the disk, so that it has a larger selection of drivers to find the boot devices (especially the modern virtio drivers). On Fedora that would be something like

sudo dracut --no-hostonly --regenerate-all -f

As it turned out, I "simply" attached a live-cd and booted into that, then chrooted into my old VM and regenerated the initramfs for the latest kernel manually. After this the system could find the LVM volumes in the image and would boot.

After a fiddly start, I was hopeful. The guest kernel dmesg DRM sections showed everything was looking good for 3D support, along with the glxinfo showing all the virtio-gpu stuff looking correct. However, I could not get what I hoped was trivial automatic window resizing happening no matter what. After a bunch of searching, ensuring my agents were running correctly, etc. it turns out that has to be implemented by the window-manager now, and it is not supported by my preferred XFCE (see https://bugzilla.redhat.com/show_bug.cgi?id=1290586). Note you can do this manually with xrandr --output Virtual-1 --auto to get it to resize, but that's rather annoying.

I thought that it's 2018 and I could live with Gnome, so I installed that. Then I tried to ping something, and got another selinux denial (on the host) from qemu-system-x86 creating icmp_socket. I am guessing this has to do with the interaction between libvirt session mode and the usermode networking device (filed https://bugzilla.redhat.com/show_bug.cgi?id=1609142). I figured I'd limp along with ICMP broken and look into the details later...

Finally when I moved the window to my portrait-mode external monitor, the SPICE window expanded but the internal VM resolution would not expand to the full height. It looked like it was taking the height from the portrait-orientation width.

Unfortunately, being forced to swap desktop environments, and still having two or three non-trivial bugs to investigate, exceeded the practical time I had to fiddle around with all this. I'll stick with VirtualBox for a little longer; 2020 might be the year!

,

Andrew Ruthvenlinux.conf.au 2019 - Call for Proposals

At the start of July, the LCA2019 team announced that the Call for Proposals for linux.conf.au 2019 was open! This Call for Proposals will close on July 30. If you want to submit a proposal, you don't have much time!

linux.conf.au is one of the best-known community driven Free and Open Source Software conferences in the world. In 2019 we welcome you to join us in Christchurch, New Zealand on Monday 21 January through to Friday 25 January.

For full details including those not covered by this announcement visit https://linux.conf.au/call-for-papers/, and the full announcement is here.

IMPORTANT DATES

  • Call for Proposals Opens: 2 July 2018
  • Call for Proposals Closes: 30 July 2018 (no extensions)
  • Notifications from the programme committee: early-September 2018
  • Conference Opens: 21st January 2019

,

Ian Wienanduwsgi; oh my!

The world of Python based web applications, WSGI, its interaction with uwsgi and various deployment methods can quickly turn into an incredible array of confusingly named acronym soup. If you jump straight into the uwsgi documentation it is almost certain you will get lost before you start!

Below, I try to lay out a primer for the foundations of application deployment within devstack, a tool for creating a self-contained OpenStack environment for testing and interactive development. However, it is hopefully of more general interest for those new to some of these concepts too.

WSGI

Let's start with WSGI. Fully described in PEP 333 -- Python Web Server Gateway Interface -- the core concept is a standardised way for a Python program to be called in response to a web request. In essence, it bundles the parameters from the incoming request into known objects, and gives you an object to put data into that will get back to the requesting client. The "simplest application", taken from the PEP and shown directly below, highlights this perfectly:

def simple_app(environ, start_response):
     """Simplest possible application object"""
     status = '200 OK'
     response_headers = [('Content-type', 'text/plain')]
     start_response(status, response_headers)
     return ['Hello world!\n']

You can start building frameworks on top of this, yet still maintain broad interoperability as you build your application. There is plenty more to it, but that's all you need to follow for now.

Using WSGI

Your WSGI based application needs to get a request from somewhere. We'll refer to the diagram below for discussions of how WSGI based applications can be deployed.

Overview of some WSGI deployment methods

In general, this is illustrating how an API end-point http://service.com/api/ might be connected together to an underlying WSGI implementation written in Python (web_app.py). Of course, there are going to be layers and frameworks and libraries and heavens knows what else in any real deployment. We're just concentrating on Apache integration -- the client request hits Apache first and then gets handled as described below.

CGI

Starting with 1 in the diagram above, we see CGI or "Common Gateway Interface". This is the oldest and most generic method of a web server calling an external application in response to an incoming request. The details of the request are put into environment variables and whatever process is configured to respond to that URL is fork()-ed. In essence, whatever comes back from stdout is sent back to the client and then the process is killed. The next request comes in and it starts all over again.

This can certainly be done with WSGI; above we illustrate that you'd have a framework layer that would translate the environment variables into the Python environ object and connect up the process's output to gather the response.

The advantage of CGI is that it is the lowest common denominator of "call this when a request comes in". It works with anything you can exec, from shell scripts to compiled binaries. However, forking processes is expensive, and parsing the environment variables involves a lot of fiddly string processing. These become issues as you scale.

Modules

Illustrated by 2 above, it is possible to embed a Python interpreter directly into the web server and call the application from there. This is broadly how mod_python, mod_wsgi and mod_uwsgi all work.

The overheads of marshaling arguments into strings via environment variables, then unmarshaling them back to Python objects can be removed in this model. The web server handles the tricky parts of communicating with the remote client, and the module "just" needs to translate the internal structures of the request and response into the Python WSGI representation. The web server can manage the response handlers directly leading to further opportunities for performance optimisations (more persistent state, etc.).

The problem with this model is that your web server becomes part of your application. This may sound a bit silly -- of course if the web server doesn't take client requests nothing works. However, there are several situations where (as usual in computer science) a layer of abstraction can be of benefit. Being part of the web server means you have to write to its APIs and, in general, its view of the world. For example, mod_uwsgi documentation says

"This is the original module. It is solid, but incredibly ugly and does not follow a lot of apache coding convention style".

mod_python is deprecated with mod_wsgi as the replacement. These are obviously tied very closely to internal Apache concepts.

In production environments, you need things like load-balancing, high-availability and caching that all need to integrate into this model. Thus you will have to additionally ensure these various layers all integrate directly with your web server.

Since your application is the web server, any time you make small changes you essentially need to manage the whole web server; often with a complete restart. Devstack is a great example of this; where you have 5-6 different WSGI-based services running to simulate your OpenStack environment (compute service, network service, image service, block storage, etc) but you are only working on one component which you wish to iterate quickly on. Stopping everything to update one component can be tricky in both production and development.

uwsgi

Which brings us to uwsgi (I call this "micro-wsgi" but I don't know if it is actually intended to be a μ). uwsgi is a real Swiss Army knife, and can be used in contexts that don't have anything to do with Python or WSGI -- which I believe is why you can get quite confused if you just start looking at it in isolation.

uwsgi lets us combine some of the advantages of being part of the web server with the advantages of abstraction. uwsgi is a complete pluggable network daemon framework, but we'll just discuss it in one context illustrated by 3.

In this model, the WSGI application runs separately to the webserver within the embedded python interpreter provided by the uwsgi daemon. uwsgi is, in parts, a web-server -- as illustrated it can talk HTTP directly if you want it to, which can be exposed directly or via a traditional proxy.

By using the proxy extension mod_proxy_uwsgi we can have the advantage of being "inside" Apache and forwarding the requests via a lightweight binary channel to the application back end. In this model, uwsgi provides a uwsgi:// service using its internal protocol on a private port. The proxy module marshals the request into small packets and forwards it to the given port. uwsgi takes the incoming request, quickly unmarshals it and feeds it into the WSGI application running inside. Data is sent back via similarly fast channels as the response (note you can equally use file based Unix sockets for local only communication).

Now your application has a level of abstraction to your front end. At one extreme, you could swap out Apache for some other web server completely and feed in requests just the same. Or you can have Apache start to load-balance out requests to different backend handlers transparently.

The model works very well for multiple applications living in the same name-space. For example, in the Devstack context, it's easy with mod_proxy to have Apache doing URL matching and separate out each incoming request to its appropriate back end service; e.g.

  • http://service/identity gets routed to Keystone running at localhost:40000
  • http://service/compute gets sent to Nova at localhost:40001
  • http://service/image gets sent to glance at localhost:40002

and so on (you can see how this is exactly configured in lib/apache:write_uwsgi_config).

When a developer makes a change they simply need to restart one particular uwsgi instance with their change and the unified front-end remains untouched. In Devstack (as illustrated) the uwsgi processes are further wrapped into systemd services which facilitates easy life-cycle and log management. Of course you can imagine you start getting containers involved, then container orchestrators, then clouds-on-clouds ...

Conclusion

There's no right or wrong way to deploy complex web applications. But using an Apache front end, proxying requests via fast channels to isolated uwsgi processes running individual WSGI-based applications can provide both good performance and implementation flexibility.

,

Chris SamuelSubmission to Joint Select Committee on Constitutional Recognition Relating to Aboriginal and Torres Strait Islander Peoples

Tonight I took some time to send a submission in to the Joint Select Committee on Constitutional Recognition Relating to Aboriginal and Torres Strait Islander Peoples in support of the Uluru Statement from the Heart from the 2017 First Nations National Constitutional Convention held at Uluru. Submissions close June 11th so I wanted to get this in as I feel very strongly about this issue.

Here’s what I wrote:

To the Joint Select Committee on Constitutional Recognition Relating to Aboriginal and Torres Strait Islander Peoples,

The first peoples of Australia have lived as part of this continent for many times longer than the ancestors of James Cook lived in the UK(*), let alone this brief period of European colonisation called Australia.

They have farmed, shaped and cared for this land over the millennia, they have seen the climate change, the shorelines move and species evolve.

Yet after all this deep time as custodians of this land they were dispossessed via the convenient lie of Terra Nullius and through killing, forced relocation and introduced sickness had their links to this land severely beaten, though not fatally broken.

Yet we still have the chance to try and make a bridge and a new relationship with these first peoples; they have offered us the opportunity for a Makarrata and I ask you to grasp this opportunity with both hands, for the sake of all Australians.

Several of the component states and territories of this recent nation of Australia are starting to investigate treaties with their first peoples, but this must also happen at the federal level as well.

Please take the Uluru Statement from the Heart to your own hearts, accept the offering of Makarrata & a commission and let us all move forward together.

Thank you for your attention.

Yours sincerely,
Christopher Samuel

(*) Australia has been continuously occupied for at least 50,000 years, almost certainly for at least 60,000 years and likely longer. The UK has only been continuously occupied for around the last 10,000 years after the last Ice Age drove its previous population out into warmer parts of what is now Europe.

Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

This item originally posted here:

Submission to Joint Select Committee on Constitutional Recognition Relating to Aboriginal and Torres Strait Islander Peoples

,

Jonathan AdamczewskiModern C++ Randomness

This thread happened…

So I did a little digging to satisfy my own curiosity about the “modern C++” version, and have learned a few things that I didn’t know previously…

(this is a manually unrolled twitter thread that starts here, with slight modifications)

Nearly all of this I gleaned from a couple of invaluable online references. Comments about implementation refer specifically to the gcc-8.1 C++ standard library, examined using Compiler Explorer and the -E command line option.

std::random_device is a platform-specific source of entropy.

std::mt19937 is a parameterized typedef of std::mersenne_twister_engine

specifically:
std::mersenne_twister_engine<uint_fast32_t, 32, 624, 397, 31, 0x9908b0df, 11, 0xffffffff, 7, 0x9d2c5680, 15, 0xefc60000, 18, 1812433253>
(What do those numbers mean? I don’t know.)

And std::uniform_int_distribution produces uniformly distributed random numbers over a specified range, from a provided generator.

The default constructor for std::random_device takes an implementation-defined argument, with a default value.

The meaning of the argument is implementation-defined – but the type is not: std::string. (I’m not sure why a dynamically modifiable string object was the right choice to be the configuration parameter for an entropy generator.)

There are out-of-line private functions for much of this implementation of std::random_device. The constructor that calls the out-of-line init function is itself inline – so the construction and destruction of the default std::string param is also generated inline.

Also, peeking inside std::random_device, there is a union with two members:

void* _M_file, which I guess would be used to store a file handle for /dev/urandom or similar.

std::mt19937 _M_mt, which is a … parameterized std::mersenne_twister_engine object.

So it seems reasonable to me that if you can’t get entropy* from outside your program, generate your own approximation. It looks like it is possible that the entropy for the std::mersenne_twister_engine will be provided by a std::mersenne_twister_engine.

Unlike std::random_device, which has its implementation out of line, std::mersenne_twister_engine‘s implementation seems to be all inline. It is unclear what benefits this brings, but it results in a few hundred additional instructions generated.

And then there’s std::uniform_int_distribution, which seems mostly unsurprising. It is again fully inline, which (from a cursory eyeballing) may allow a sufficiently insightful compiler to avoid a couple of branches and function calls.

The code that got me started on this was presented in jest – but (std::random_device + std::mt19937 + std::uniform_int_distribution) is a commonly recommended pattern for generating random numbers using these modern C++ library features.
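For reference, a minimal sketch of that pattern (my own example, not the code from the original thread) looks something like this:

// Minimal sketch of the std::random_device + std::mt19937 +
// std::uniform_int_distribution pattern (example only, not the original code).
#include <cstdio>
#include <random>

int main()
{
    std::random_device rd;                          // platform-specific entropy source
    std::mt19937 gen(rd());                         // Mersenne Twister engine, seeded once
    std::uniform_int_distribution<int> dist(1, 6);  // uniformly distributed over [1, 6]

    for (int i = 0; i < 5; ++i)
        std::printf("%d\n", dist(gen));             // e.g. five simulated dice rolls
    return 0;
}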

My takeaways:
std::random_device is potentially very expensive to use – and doesn’t provide strong cross-platform guarantees about the randomness it provides. It is configured with an std::string – the meaning of which is platform dependent. I am not compelled to use this type.

std::mt19937 adds a sizeable chunk of codegen via its inline implementation – and there are better options than Mersenne Twister.

Bottom line: I’m probably going to stick with rand(), and if I need something a little fancier, one of the other suggestions provided as replies to the twitter thread.

Addition: the code I was able to gather, representing some relevant parts

,

Clinton RoyActively looking for work

I am now actively looking for work, ideally something with Unix/C/Python in the research/open source/not-for-profit space. My long out of date resume has been updated.

,

Arjen LentzThe trouble with group labels. Because.

So Australia’s long-term accepted refugee detainees on Manus and Nauru will not be able to migrate to the US, if they come from a country that’s on Trump’s list.  So anyone from a particular set of countries is classified as bad.  Because Muslim.

Fundamentally of course, these fellow humans should not be detained at all. They have been accepted as genuine refugees by the Australian immigration department.  Their only “crime”, which is not actually a crime by either Australian or International law, was to arrive by boat.  They are now, as a group, abused as deterrent marketing material.  Anyone coming by boat to Australia is classified as bad.  Because boat (although stats actually indicate that it might correlate better with skin colour).

My grandfather, a surgeon at a Berlin hospital, lost his job on the same day that Hitler was elected.  Since not even decrees move that quickly and the exclusion of Jews from particular professions happened gradually over the 1930s, we can only assume that the hospital management contained some overzealous individuals, who took initiatives that they thought would be looked on favourably by their new superiors. Because Jew.

The Berlin events are a neat example of how a hostile atmosphere enables certain behaviour – of course, if you’d asked the superiors, they didn’t order any such thing and it had nothing to do with them. Sound familiar?

Via jiggly paths, my family came to the UK.  Initially interned.  Because German.

My grandfather, who, as I mentioned, was an accomplished surgeon, had to re-do all his medical education in Britain – during that period he was separated from his wife and two young daughters (Scotland vs the south of England – a long way in the 1930s).  Because [the qualifications and experience] not British.  Possibly also Because German.

Hassles with running a practice, and being allowed a car.  Because German.

Then allowed those things.  Because Useful.  He was still a German Jew though….

I mentioned this, because other refugees at the time would have had the same rules applied, but my grandfather had the advantage of his profession and thus being regarded as useful.  Others would not have had that benefit.  This means that people were being judged “worthy” merely based on their immediate usefulness to the local politics of the day, not for being a fellow human being in need, or any other such consideration.

Group Labels and Value Judgements

Any time we classify some group as more or less worthy than another group or set of groups, trouble will follow – both directly, and indirectly.  Every single time.  And the trouble will affect everybody, it’s not selective to some group(s).  Also every time.  Historically verifiable.

Populists simplify, for political gain.  Weak leaders pander to extreme elements, in the hope that it will keep them in power.

The method of defining groups doesn’t matter, nor does one need to add specific “instructions”. The “Blue Eye experiment” proved this. The nasties are initiated merely through a “simple” value judgement of one group vs another.  That’s all that’s required, and the bad consequences are all implied and in a way predetermined.  Trouble will follow, and it’s not going to end well.

Identity

It’s fine to identify as a member of a particular group, or more likely multiple groups, as many factors overlap.  That is part of our identity.  But that is not the same as passing a judgement on the relative value of one of those groups vs another group.

  • Brown vs blue eyes
  • Muslim vs Christian vs Jew
  • White vs black “race”
  • One football club vs another

It really doesn’t matter.  Many of these groupings are entirely arbitrary.  The concept of “race” has no scientific basis, we humans are verifiably a single species.

You can make up any arbitrary classification.  It won’t make a difference.  It doesn’t matter.  The outcomes will be the same.

Given what we know about these dynamics, anyone making such value judgements is culpable.  If they’re in a leadership position, I’d suggest that any utterances in that realm indicate either incompetence or criminal intent. Don’t do that.

Don’t accept it.  Don’t ignore it.  Don’t pander to it.  Don’t vote for it.

Speak up and out against it.  For everybody’s sake.  Because I assure you, every single example in history shows that it comes back to bite everyone.  So even if you don’t really feel a connection with people you don’t know, it’ll come and bite you and yours, too.

It’s rearing its ugly head, again, and if we ignore it it will bite us. Badly. Just like any previous time in history.  Guaranteed.

The post The trouble with group labels. Because. first appeared on Lentz family blog.

,

BlueHackersPost-work: the radical idea of a world without jobs | The Guardian

,

Tim SerongYou won’t find us on Facebook

I made these back in August 2016 (complete with lovingly hand-drawn thumb and middle finger icons), but it seems appropriate to share them again now. The images are CC-BY-SA, so go nuts, or you can grab them in sticker form from Redbubble.

facebook-thumb-blue

facebook-thumb-black

facebook-finger-blue

facebook-finger-black

,

Arjen LentzMaximising available dynamic memory on Arduino

I like programming in small spaces, it makes one think efficiently and not be sloppy.

Example: I was looking at an embedded device the other day, and found a complete Tomcat server and X desktop running on a Raspberry Pi type environment.  So that kind of stuff makes me shudder, it just looks wrong and to me it shows that whoever put that together (it was a company product, not just a hobby project) is not really “thinking” embedded.  Yes we have more powerful embedded CPUs available now, and more memory, but that all costs power.  So really: code efficiency = power efficiency!

Back to Arduino, a standard Arduino Uno type board (Atmel ATMEGA 328P) has 32K of program storage space, and 2K of dynamic memory.  The latter is used for runtime variables and the stack, so you want to be sure you always have enough spare there, particularly when using some libraries that need a bit of working space.

A bit of background… while in “big” CPUs the program and variable memory space is generally shared, it’s often separated in microcontrollers.  This makes sense considering the architecture: the program code is flashed and doesn’t change, while the variable space needs to be written at runtime.

The OneWire library, used to interface with for instance Maxim one-wire sensors, applies a little lookup table to do its CRC (checksum) calculation.  In this case it’s optimising for time, as using a table lookup CRC is much faster than just calculating for each byte.  But, it does optimise the table, by using the PROGMEM modifier when declaring the static array.  It can be optimised further still, see my post on reducing a CRC lookup table from 256 entries to 32 entries (I submitted a patch and pull request).  What PROGMEM does is tell the compiler to put the array in program storage rather than the dynamic variable space.

Arduino uses the standard GNU C++ compiler to cross-compile for the Atmel chips (that form the core of most Arduino boards), and this compiler “normally” just puts any variable (even if “const” – constant) with the rest of the variables.  This way of organising is perfectly sensible except with these microcontrollers.  So this is why we need to give the compiler a hint that certain variables can be placed in program memory instead!
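As a small illustration (my own example with a made-up table, not the OneWire library’s actual code), this is what the hint looks like for a constant lookup table:

#include <avr/pgmspace.h>

// Without PROGMEM these bytes would be copied into the 2K of SRAM at startup.
static const uint8_t lookupTable[8] PROGMEM = {
  0x00, 0x5e, 0xbc, 0xe2, 0x61, 0x3f, 0xdd, 0x83
};

uint8_t lookupValue(uint8_t index) {
  // Flash can't be read with a normal pointer dereference on AVR,
  // so we use the pgm_read_* helpers from avr/pgmspace.h.
  return pgm_read_byte_near(lookupTable + (index & 7));
}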

Now consider this piece of code:

Serial.println("Hello, world!");

Where does the “Hello, world!” string get stored by default?  In the variable space!

I hadn’t really thought about that, until I ran short of dynamic memory in the controller of my hot water system.  The issue actually manifested itself by causing incorrect temperature readings, and sometimes crashing.  When looking at it closer, it became clear that the poor thing was running out of memory and overwriting other stuff or otherwise just getting stuck.  The controller used is a Freetronics EtherTen, which is basically an Arduino Uno with an Ethernet breakout integrated on the same board.  Using Ethernet with a Uno gets really tight, but it can be very beneficial: the controller is powered through 802.3af PoE, and communicates using UDP packets.

I knew that I wasn’t actually using that many actual variables, but was aware that the Ethernet library does require a fair bit of space (unfortunately the “how much” is not documented, so I was just running on an “as much as possible” basis).  Yet after compiling I was using in the range of 1K of the dynamic variable space, which just looked like too much.  So that’s when I started hunting, thought it over in terms of how I know compilers think, and then found the little note near the bottom of the PROGMEM page, explaining that you can use the F() macro on static strings to also place them in program storage. Eureka!

Serial.println(F("Hello, world!"));

When you compile an Arduino sketch, right at the end you get to see how much memory is used. Below is the output on a tiny sketch that only prints the hello world as normal:

Sketch uses 1374 bytes (4%) of program storage space. Maximum is 32256 bytes.
Global variables use 202 bytes (9%) of dynamic memory, leaving 1846 bytes for local variables. Maximum is 2048 bytes.

If you use the F() modifier for this example, you get:

Sketch uses 1398 bytes (4%) of program storage space. Maximum is 32256 bytes.
Global variables use 188 bytes (9%) of dynamic memory, leaving 1860 bytes for local variables. Maximum is 2048 bytes.

The string is 13 bytes long plus its ‘\0’ terminator, and indeed we see that the 14 bytes have shifted from dynamic memory to program storage. Great score, because we know that that string can’t be modified anyway so we have no reason to have it in dynamic variable space.  It’s really a constant.

Armed with that new wisdom I went through my code and changed the serial monitoring strings to use the F() modifier.  That way I “recovered” well over 200 bytes of dynamic variable space, which should provide the Ethernet library with plenty of room!

Now, you may wonder, “why doesn’t he just use an #ifdef to remove the serial debugging code?”.  I could do that, but then I really have no way to easily debug the system on this hardware platform, as it wouldn’t work with the debugging code enabled (unless the Ethernet library is not active).  So that wouldn’t be a winner.  This approach works, and now with the extra knowledge of F() for string constants I have enough space to have the controller do everything it needs to.

Finally, while I did an OSDC conference talk on this solar hot water controller and the PV system in my home (video available) some years ago already, I realise I hadn’t actually published the code.  Done now, arjenlentz/HomeResourceMonitor on GitHub.  It contains some hardcoded stuff (mostly #define macros) for our family situation, but the main story is that the automatic control of the boost saves us a lot of money.  And of course, the ability to see what goes on (feedback) is an important factor for gaining understanding, which in turn leads to adjusting behaviour.

The post Maximising available dynamic memory on Arduino first appeared on Lentz family blog.

,

Craig Sandersbrawndo-installer

Tired of being oppressed by the slack-arse distro package maintainers who waste time testing that new versions don’t break anything and then waste even more time integrating software into the system?

Well, so am I. So I’ve fixed it, and it was easy to do. Here’s the ultimate installation tool for any program:

brawndo() {
   curl $1 | sudo /usr/bin/env bash -
}

I’ve never written a shell script before in my entire life, I spend all my time writing javascript or ruby or python – but shell’s not a real language so it can’t be that hard to get right, can it? Of course not, and I just proved it with the amazing brawndo installer (It’s got what users crave – it’s got electrolytes!)

So next time some lame sysadmin recommends that you install the packaged version of something, just ask them if apt-get or yum or whatever loser packaging tool they’re suggesting has electrolytes. That’ll shut ’em up.

brawndo-installer is a post from: Errata

,

Chris SamuelVale Dad

[I’ve been very quiet here for over a year for reasons that will become apparent in the next few days when I finish and publish a long post I’ve been working on for a while – difficult to write, hence the delay]

It’s 10 years ago today that my Dad died, and Alan and I lost the father who had meant so much to both of us. It’s odd realising that it’s over 1/5th of my life since he died, it doesn’t seem that long.

Vale dad, love you…

This item originally posted here:

Vale Dad

,

Tim SerongStrange Bedfellows

The Tasmanian state election is coming up in a week’s time, and I’ve managed to do a reasonable job of ignoring the whole horrible thing, modulo the promoted tweets, the signs on the highway, the junk the major (and semi-major) political parties pay to dump in my letterbox, and occasional discussions with friends and neighbours.

Promoted tweets can be blocked. The signs on the highway can (possibly) be re-purposed for a subsequent election, or can be pulled down and used for minor windbreak/shelter works for animal enclosures. Discussions with friends and neighbours are always interesting, even if one doesn’t necessarily agree. I think the most irritating thing is the letterbox junk; at best it’ll eventually be recycled, at worst it becomes landfill or firestarters (and some of those things do make very satisfying firestarters).

Anyway, as I live somewhere in the wilds division of Franklin, I thought I’d better check to see who’s up for election here. There’s no independents running this time, so I’ve essentially got the choice of four parties; Shooters, Fishers and Farmers Tasmania, Tasmanian Greens, Tasmanian Labor and Tasmanian Liberals (the order here is the same as on the TEC web site; please don’t infer any preference based on the order in which I list parties in this blog post).

I feel like I should be setting party affiliations aside and voting for individuals, but of the sixteen candidates listed, to the best of my knowledge I’ve only actually met and spoken with two of them. Another I noticed at random in a cafe, and I was ignored by a fourth who was milling around with some cronies at a promotional stand out the front of Woolworths in Huonville a few weeks ago. So, party affiliations it is, which leads to an interesting thought experiment.

When you read those four party names above, what things came most immediately to mind? For me, it was something like this:

  • Shooters, Fishers & Farmers: Don’t take our guns. Fuck those bastard Greenies.
  • Tasmanian Greens: Protect the natural environment. Renewable energy. Try not to kill anything. Might collaborate with Labor. Liberals are big money and bad news.
  • Tasmanian Labor: Mellifluous babble concerning health, education, housing, jobs, pokies and something about workers rights. Might collaborate with the Greens. Vehemently opposed to the Liberals.
  • Tasmanian Liberals: Mellifluous babble concerning jobs, health, infrastructure, safety and the Tasmanian way of life, peppered with something about small business and family values. Vehemently opposed to Labor and the Greens.

And because everyone usually automatically thinks in terms of binaries (e.g. good vs. evil, wrong vs. right, one vs. zero), we tend to end up imagining something like this:

  • Shooters, Fishers & Farmers vs. Greens
  • Labor vs. Liberal
  • …um. Maybe Labor and the Greens might work together…
  • …but really, it’s going to be Labor or Liberal in power (possibly with some sort of crossbench or coalition support from minor parties, despite claims from both that it’ll be majority government all the way).

It turns out that thinking in binaries is remarkably unhelpful, unless you’re programming a computer (it’s zeroes and ones all the way down), or are lost in the wilderness (is this plant food or poison? is this animal predator or prey?) The rest of the time, things tend to be rather more colourful (or grey, depending on your perspective), which leads back to my thought experiment: what do these “naturally opposed” parties have in common?

According to their respective web sites, the Shooters, Fishers & Farmers and the Greens have many interests in common, including agriculture, biosecurity, environmental protection, tourism, sustainable land management, health, education, telecommunications and addressing homelessness. There are differences in the policy details of course (some really are diametrically opposed), but in broad strokes these two groups seem to care strongly about – and even agree on – many of the same things.

Similarly, Labor and Liberal are both keen to tell a story about putting the people of Tasmania first, about health, education, housing, jobs and infrastructure. Honestly, for me, they just kind of blend into one another; sure there’s differences in various policy details, but really if someone renamed them Labal and Liberor I wouldn’t notice. These two are the status quo, and despite fighting it out with each other repeatedly, are, essentially, resting on their laurels.

Here’s what I’d like to see: a minority Tasmanian state government formed from a coalition of the Tasmanian Greens plus the Shooters, Fishers & Farmers party, with the Labor and Liberal parties together in opposition. It’ll still be stuck in that irritating Westminster binary mode, but at least the damn thing will have been mixed up sufficiently that people might actually talk to each other rather than just fighting.

,

Jonathan AdamczewskiWatch as the OS rewrites my buggy program.

I didn’t know that SetErrorMode(SEM_NOALIGNMENTFAULTEXCEPT) was a thing, until I wrote a bad test that wouldn’t crash.

Digging into it, I found that a movaps instruction was being rewritten as movups, which was a thoroughly confusing thing to see.

The one clue I had was that a fault due to an unaligned load had been observed in non-test code, but did not reproduce when written as a test using the google-test framework. A short hunt later (including a failed attempt at writing a small repro case), I found an explanation: google test suppresses this class of failure.

The code below will successfully demonstrate the behavior, printing out the SIMD load instruction before and after calling the function with an unaligned pointer.


View the code on Gist.

,


Jonathan AdamczewskiPriorities for my team

(unthreaded from here)

During the day, I’m a Lead of a group of programmers. We’re responsible for a range of tools and tech used by others at the company for making games.

I have a list of my priorities (and some related questions) of things that I think are important for us to be able to do well as individuals, and as a team:

  1. Treat people with respect. Value their time, place high value on their well-being, and start with the assumption that they have good intentions
    (“People” includes yourself: respect yourself, value your own time and well-being, and have confidence in your good intentions.)
  2. When solving a problem, know the user and understand their needs.
    • Do you understand the problem(s) that need to be solved? (it’s easy to make assumptions)
    • Have you spoken to the user and listened to their perspective? (it’s easy to solve the wrong problem)
    • Have you explored the specific constraints of the problem by asking questions like:
      • Is this part needed? (it’s easy to over-reach)
      • Is there a satisfactory simpler alternative? (actively pursue simplicity)
      • What else will be needed? (it’s easy to overlook details)
    • Have your discussed your proposed solution with users, and do they understand what you intend to do? (verify, and pursue buy-in)
    • Do you continue to meet regularly with users? Do they know you? Do they believe that you’re working for their benefit? (don’t under-estimate the value of trust)
  3. Have a clear understanding of what you are doing.
    • Do you understand the system you’re working in? (it’s easy to make assumptions)
    • Have you read the documentation and/or code? (set yourself up to succeed with whatever is available)
    • For code:
      • Have you tried to modify the code? (pull a thread; see what breaks)
      • Can you explain how the code works to another programmer in a convincing way? (test your confidence)
      • Can you explain how the code works to a non-programmer?
  4. When trying to solve a problem, debug aggressively and efficiently.
    • Does the bug need to be fixed? (see 1)
    • Do you understand how the system works? (see 2)
    • Is there a faster way to debug the problem? Can you change code or data to cause the problem to occur more quickly and reliably? (iterate as quickly as you can, fix the bug, and move on)
    • Do you trust your own judgement? (debug boldly, have confidence in what you have observed, make hypotheses and test them)
  5. Pursue excellence in your work.
    • How are you working to be better understood? (good communication takes time and effort)
    • How are you working to better understand others? (don’t assume that others will pursue you with insights)
    • Are you responding to feedback with enthusiasm to improve your work? (pursue professionalism)
    • Are you writing high quality, easy to understand, easy to maintain code? How do you know? (continue to develop your technical skills)
    • How are you working to become an expert and industry leader with the technologies and techniques you use every day? (pursue excellence in your field)
    • Are you eager to improve (and fix) systems you have worked on previously? (take responsibility for your work)

The list was created for discussion with the group, and as an effort to articulate my own expectations in a way that will help my team understand me.

Composing this has been useful exercise for me as a lead, and definitely worthwhile for the group. If you’ve never tried writing down your own priorities, values, and/or assumptions, I encourage you to try it :)

,

Jonathan AdamczewskiA little bit of floating point in a memory allocator — Part 2: The floating point

[Previously]

This post contains the same material as this thread of tweets, with a few minor edits.

In IEEE754, floating point numbers are represented like this:

±2ⁿⁿⁿ×1.sss…

nnn is the exponent, which is floor(log2(size)) — which happens to be the fl value computed by TLSF.

sss… is the significand fraction: the part that follows the decimal point, which happens to be sl.

And so to calculate fl and sl, all we need to do is convert size to a floating point value (on recent x86 hardware, that’s a single instruction). Then we can extract the exponent, and the upper bits of the fractional part, and we’re all done :D

That can be implemented like this:

double sf = (int64_t)size;
uint64_t sfi;
memcpy(&sfi, &sf, 8);
fl = (sfi >> 52) - (1023 + 7);
sl = (sfi >> 47) & 31;

There’s some subtleties (there always is). I’ll break it down…

double sf = (int64_t)size;

Convert size to a double, with an explicit cast. size has type size_t, but using TLSF from github.com/mattconte/tlsf, the largest supported allocation on 64bit architecture is 2^32 bytes – comfortably less than the precision provided by the double type. If you need your TLSF allocator to allocate chunks bigger than 2^53, this isn’t the technique for you :)

I first tried using float (not double), which can provide correct results — but only if the rounding mode happens to be set correctly. double is easier.

The cast to (int64_t) results in better codegen on x86: without it, the compiler will generate a full 64bit unsigned conversion, and there is no single instruction for that.

The cast tells the compiler to (in effect) consider the bits of size as if they were a two’s complement signed value — and there is an SSE instruction to handle that case (cvtsi2sdq or similar). Again, with the implementation we’re using size can’t be that big, so this will do the Right Thing.

uint64_t sfi;
memcpy(&sfi, &sf, 8);

Copy the 8 bytes of the double into an unsigned integer variable. There are a lot of ways that C/C++ programmers copy bits from floating point to integer – some of them are well defined :) memcpy() does what we want, and any moderately respectable compiler knows how to select decent instructions to implement it.

Now we have floating point bits in an integer register, consisting of one sign bit (always zero for this, because size is always positive), eleven exponent bits (offset by 1023), and 52 bits of significand fraction. All we need to do is extract those, and we’re done :)

fl = (sfi >> 52) - (1023 + 7);

Extract the exponent: shift it down (ignoring the always-zero sign bit), subtract the offset (1023), and that 7 we saw earlier, at the same time.

sl = (sfi >> 47) & 31;

Extract the five most significant bits of the fraction – we do need to mask out the exponent.

And, just like that*, we have mapping_insert(), implemented in terms of integer -> floating point conversion.

* Actual code (rather than fragments) may be included in a later post…
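In the meantime, the fragments above can be assembled into a complete function along these lines (a sketch of my own arrangement, not the final code referred to above; the small-size branch is kept from the original mapping_insert()):

#include <cstddef>
#include <cstdint>
#include <cstring>

void mapping_insert_fp(size_t size, int* fli, int* sli)
{
  if (size < 256)
  {
    // Small sizes keep the original linear mapping.
    *fli = 0;
    *sli = (int)size / 8;
    return;
  }
  // Convert size to a double; the (int64_t) cast gives better x86 codegen,
  // and sizes here are far below 2^53 so no precision is lost.
  double sf = (int64_t)size;
  uint64_t sfi;
  memcpy(&sfi, &sf, 8);                    // reinterpret the double's bits
  *fli = (int)(sfi >> 52) - (1023 + 7);    // exponent is floor(log2(size)), minus the offsets
  *sli = (int)(sfi >> 47) & 31;            // top five bits of the significand fraction
}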

Jonathan AdamczewskiA little bit of floating point in a memory allocator — Part 1: Background

This post contains the same material as this thread of tweets, with a few minor edits.

Over my holiday break at the end of 2017, I took a look into the TLSF (Two Level Segregated Fit) memory allocator to better understand how it works. I’ve made use of this allocator and have been impressed by its real world performance, but never really done a deep dive to properly understand it.

The mapping_insert() function is a key part of the allocator implementation, and caught my eye. Here’s how that function is described in the paper A constant-time dynamic storage allocator for real-time systems:

I’ll be honest: from that description, I never developed a clear picture in my mind of what that function does.

(Reading it now, it seems reasonably clear – but I can say that only after I spent quite a bit of time using other methods to develop my understanding)

Something that helped me a lot was looking at the implementation of that function from github.com/mattconte/tlsf/.  There’s a bunch of long-named macro constants in there, and a few extra implementation details. If you collapse those it looks something like this:

void mapping_insert(size_t size, int* fli, int* sli)
{ 
  int fl, sl;
  if (size < 256)
  {
    fl = 0;
    sl = (int)size / 8;
  }
  else
  {
    fl = fls(size);
    sl = (int)(size >> (fl - 5)) ^ 0x20;
    fl -= 7;
  }
  *fli = fl;
  *sli = sl;
}

It’s a pretty simple function (it really is). But I still failed to *see* the pattern of results that would be produced in my mind’s eye.

I went so far as to make a giant spreadsheet of all the intermediate values for a range of inputs, to paint myself a picture of the effect of each step :) That helped immensely.

Breaking it down…

There are two cases handled in the function: one for when size is below a certain threshold, and one for when it is larger. The first is straightforward, and accounts for a small number of possible input values. The large size case is more interesting.

The function computes two values: fl and sl, the first and second level indices for a lookup table. For the large case, fl (where fl is “first level”) is computed via fls(size) (where fls is short for “find last set” – similar names, just to keep you on your toes).

fls() returns the index of the largest bit set, counting from the least significant bit, which is the index of the largest power of two. In the words of the paper:

“the instruction fls can be used to compute the ⌊log2(x)⌋ function”

Which is, in C-like syntax: floor(log2(x))
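To make fls() concrete, here’s one way it can be implemented for 32-bit values (a sketch using a GCC/Clang builtin; the actual tlsf.c carries its own portable and architecture-specific versions):

#include <cstdint>

// "Find last set": index of the highest set bit, or -1 for zero.
static int fls32(uint32_t value)
{
  if (value == 0)
    return -1;                        // __builtin_clz is undefined for 0
  return 31 - __builtin_clz(value);   // e.g. fls32(256) == 8, fls32(1000) == 9
}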

And there’s that “fl -= 7” at the end. That will show up again later.

For the large case, the computation of sl has a few steps:

  sl = (size >> (fl - 5)) ^ 0x20;

Shift size down by some amount (depending on fl), and mask out the sixth bit?

(Aside: The CellBE programmer in me is flinching at that variable shift)

It took me a while (longer than I would have liked…) to realize that this
size >> (fl – 5) is shifting size to generate a number that has exactly six significant bits, at the least significant end of the register (bits 5 thru 0).

Because fl is the index of the most significant bit, after this shift, bit 5 will always be 1 – and that “^ 0x20” will unset it, leaving the result as a value between 0 and 31 (inclusive).
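A quick worked example (my own numbers, not from the original thread): for size = 1000, fls(1000) = 9, so sl = (1000 >> 4) ^ 0x20 = 62 ^ 32 = 30, and fl ends up as 9 - 7 = 2. The floating point version in the other part gives the same pair: 1000 as a double is 2⁹ × 1.953125, so the exponent minus the offsets yields fl = 2, and the top five bits of the fraction yield sl = 30.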

So here’s where floating point comes into it, and the cute thing I saw: another way to compute fl and sl is to convert size into an IEEE754 floating point number, and extract the exponent, and most significant bits of the mantissa. I’ll cover that in the next part, here.

,

Tim SerongHackweek0x10: Fun in the Sun

We recently had a 5.94kW solar PV system installed – twenty-two 270W panels (14 on the northish side of the house, 8 on the eastish side), with an ABB PVI-6000TL-OUTD inverter. Naturally I want to be able to monitor the system, but this model inverter doesn’t have an inbuilt web server (which, given the state of IoT devices, I’m actually kind of happy about); rather, it has an RS-485 serial interface. ABB sell addon data logger cards for several hundred dollars, but Rick from Affordable Solar Tasmania mentioned he had another client who was doing monitoring with a little Linux box and an RS-485 to USB adapter. As I had a Raspberry Pi 3 handy, I decided to do the same.

Step one: Obtain an RS-485 to USB adapter. I got one of these from Jaycar. Yeah, I know I could have got one off eBay for a tenth the price, but Jaycar was only a fifteen minute drive away, so I could start immediately (I later discovered various RS-485 shields and adapters exist specifically for the Raspberry Pi – in retrospect one of these may have been more elegant, but by then I already had the USB adapter working).

Step two: Make sure the adapter works. It can do RS-485 and RS-422, so it’s got five screw terminals: T/R-, T/R+, RXD-, RXD+ and GND. The RXD lines can be ignored (they’re for RS-422). The other three connect to matching terminals on the inverter, although what the adapter labels GND, the inverter labels RTN. I plugged the adapter into my laptop, compiled Curt Blank’s aurora program, then asked the inverter to tell me something about itself:

aurora -a 2 -Y 4 -e /dev/ttyUSB0

Interestingly, the comms seem slightly glitchy. Just running aurora -a 2 -e /dev/ttyUSB0 always results in either “No response after 1 attempts” or “CRC receive error (1 attempts made)”. Adding “-Y 4” makes it retry four times, which is generally rather more successful. Ten retries is even more reliable, although still not perfect. Clearly there’s some tweaking/debugging to do here somewhere, but at least I’d confirmed that this was going to work.

So, on to the Raspberry Pi. I grabbed the openSUSE Leap 42.3 JeOS image and dd’d that onto a 16GB SD card. Booted the Pi, waited a couple of minutes with a blank screen while it did its firstboot filesystem expansion thing, logged in, fiddled with network and hostname configuration, rebooted, and then got stuck at GRUB saying “error: attempt to read or write outside of partition”:

error: attempt to read or write outside of partition.

Apparently that’s happened to at least one other person previously with a Tumbleweed JeOS image. I fixed it by manually editing the partition table.

Next I needed an RPM of the aurora CLI, so I built one on OBS, installed it on the Pi, plugged the Pi into the USB adapter, and politely asked the inverter to tell me a bit more about itself:

aurora -a 2 -Y 4 -d 0 /dev/ttyUSB0

Everything looked good, except that the booster temperature was reported as being 4294967296°C, which seemed a little high. Given that translates to 0x100000000, and that the south wall of my house wasn’t on fire, I rather suspected another comms glitch. Running aurora -a 2 -Y 4 -d 0 /dev/ttyUSB0 a few more times showed that this was an intermittent problem, so it was time to make a case for the Pi that I could mount under the house on the other side of the wall from the inverter.

I picked up a wall mount snap fit black plastic box, some 15mm x 3mm screws, matching nuts, and 9mm spacers. The Pi I would mount inside the box part, rather than on the back, meaning I can just snap the box-and-Pi off the mount if I need to bring it back inside to fiddle with it.

Then I had to measure up and cut holes in the box for the ethernet and USB ports. The walls of the box are 2.5mm thick, which plus 9mm for the spacers meant the bottom of the Pi had to sit 11.5mm from the bottom of the box. I measured up, used a Dremel tool to make the holes, then cleaned them up with a file. The hole for the power connector I did by eye later, after the board was in about the right place.


I didn’t measure for the screw holes at all, I simply drilled through the holes in the board while it was balanced in there, hanging from the edge with the ports. I initially put the screws in from the bottom of the box, dropped the spacers on top, slid the Pi in place, then discovered a problem: if the nuts were on top of the board, they’d rub up against a couple of components:

[photo]

So I had to put the screws through the board, stick them there with Blu Tack, turn the Pi upside down, drop the spacers on top, and slide it upwards into the box, getting the screws as close as possible to the screw holes, flip the box the right way up, remove the Blu Tack and jiggle the screws into place before securing the nuts. More fiddly than I’d have liked, but it worked fine.

One other kink with this design is that it’s probably impossible to remove the SD card from the Pi without removing the Pi from the box, unless your fingers are incredibly thin and dexterous. I could have made another hole to provide access, but decided against it as I’m quite happy with the sleek look, this thing is going to be living under my house indefinitely, and I have no plans to replace the SD card any time soon.


All that remained was to mount it under the house. Here’s the finished install:

[photo]

After that, I set up a cron job to scrape data from the inverter every five minutes and dump it to a log file. So far I’ve discovered that there’s enough sunlight by about 05:30 to wake the inverter up. This morning we’d generated 1kWh by 08:35, 2kWh by 09:10, 8kWh by midday, and as I’m writing this at 18:25, a total of 27.134kWh so far today.
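
Something along these lines in the crontab does the trick (the binary and log paths here are just placeholders; the aurora flags are the ones that worked above):

# poll the inverter every five minutes and append the readings to a log
*/5 * * * * /usr/bin/aurora -a 2 -Y 4 -e /dev/ttyUSB0 >> /var/log/aurora.log 2>&1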

Next steps:

  1. Figure out WTF is up with the comms glitches
  2. Graph everything and/or feed the raw data to pvoutput.org

,

Clinton RoyAccess and Memory: Open GLAM and Open Source

Over the years of my involvement with library projects, like Coder Dojo, programming workshops and such, I’ve struggled to nail down the intersection between libraries and open source. At this year’s linux.conf.au in Sydney (my seventeenth!) I’m helping to put together a miniconf to answer this question: Open GLAM. If you do work at the intersection of galleries, libraries, archives, museums and open source, we’d love to hear from you.

,

James BrombergerWeb Security 2017

Stronger encryption requirements for PCI compliance are having a good effect on purging the scourge of the web: legacy browsers. As they disappear, even more client-side capability for security becomes available.

,

Arjen LentzUsing expired credit cards

Using expired credit/debit cards… surely you can’t do that?  Actually, yes you can. This is how it goes.

First, what can be observed (verified):

On a vendor site that allows you to save your card (hopefully via a token with the payment gateway provider, so it doesn’t actually store your card), you enter the card number and expiry date for your then still valid card.  This is necessary because otherwise the site is likely to reject your input.  Makes sense.

Some time later your card expires, but the vendor is still quite happy to keep using the card-on-file for recurring payments.  The payment gateway apparently doesn’t mind, and our banks apparently don’t mind.  I have observed this effect with Suncorp and Bank of Queensland, please let me know if you’ve observed this with other banks.

From this point, let’s play devil’s advocate.

  • What if someone got hold of your card number+expiry date?
    • Well, sites tend to reject dates-in-the-past on input. Excellent.
  • What if that someone just does +4 on the year and then enters it – renewed cards tend to have the same number, just with an updated expiry 4 years into the future (the exact number of years may differ between banks)?
    • Payment gateway should reject the card, because even though the card+expiry is “ok”, the CVV (Card Verification Value, the magic number on the back of the card) would be different!  Nice theory, but…
      • I’ve noted that some sites don’t ask for the CVV, and thus we must conclude that at least some payment gateways don’t require it.  Eek!
        I noticed that the payment gateway for one of these was Westpac.

So what are the underlying issues:

  • Banks let through payments on expired cards.
    • Probably done for client convenience (otherwise you’d be required to update lots of places).
  • Banks issue new cards with the same card number but just an updated year (even the month tends to be the same).
    • Possibly convenience again, but if you need to update your details anyway with some vendor, you might as well update a few more numbers.  I don’t see a valid reason to do this (please comment if you think of something).
  • Some payment gateways don’t require CVV to let through a payment.
    • This is inexcusable and means that the above two habits result in a serious fraud vector.  Payment gateways, credit card companies and banks should not allow this at all, yet somehow it goes through the gateway -> credit card company path without getting rejected.

Security tends to involve multiple layers.  This makes sense, as any one layer can be compromised.  When a security aspect is procedurally compromised, such as not regarding an expired card as expired, or not requiring the oh-so-important CVV number for online payments, it’s the vendor itself undoing their security.  If that happens with a few layers, as in the above scenario, security is fatally impacted.  A serious failing.

I have little doubt that people have been using this fraud vector for some time, as it’s unlikely that I’m the first one spotting this.  In many scenarios, credit card companies tend to essentially weigh security risks against convenience, and refund those affected.  This is what happens with abuse of the PayWave system, and while I don’t really like it, I understand why they do this.  But I also think we have to draw the line somewhere.  Not requiring CVV numbers for online transactions is definitely beyond that line; possibly renewing cards with the same number is too.  And as it’s the combination of these factors that causes the problem, addressing any one of them could plug the hole – addressing more than one would be great.

The post Using expired credit cards first appeared on Lentz family blog.

,

Arjen LentzCalculating CRC with a tiny (32 entry) lookup-table

I happened to notice that the Arduino OneWire library uses a 256 entry lookup table for its CRC calculations.  I did some research on this topic in 1992-1993, while working on Bulletin Board Systems, FidoNet code and file transfer protocols.  These days memory is not at a premium on most computers, however on Arduino and microcontroller environments it definitely is, and I happen to know that table-lookup CRC can be done using two 16-entry tables!  So I’ve dug up my documentation and code from the time, and applied it to the CRC-8 calculation for the Maxim (Dallas Semiconductor) OneWire bus protocol.  I think this provides a neat trade-off between code size and speed.

License

For any of the below code, apply the following license (2-clause “simplified” BSD license), which should suffice for any use.  If you do require another license, just ask.

CRC lookup tables, generator and use implementation by Arjen Lentz <arjen (at) lentz (dot) com (dot) au>

Copyright (C) 1992-2017 Arjen Lentz

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The Code

The below is a drop-in replacement for the chunk of code in OneWire.cpp; the two 16-entry tables I mentioned above are combined into a single 32-entry array.

Copyright (c) 2007, Jim Studt  (original old version - many contributors since)

The latest version of this library may be found at:
  http://www.pjrc.com/teensy/td_libs_OneWire.html

OneWire has been maintained by Paul Stoffregen (paul@pjrc.com) since
January 2010.
[...]

// The 1-Wire CRC scheme is described in Maxim Application Note 27:
// "Understanding and Using Cyclic Redundancy Checks with Maxim iButton Products"
//

#if ONEWIRE_CRC8_TABLE
// Dow-CRC using polynomial X^8 + X^5 + X^4 + X^0
static const uint8_t PROGMEM dscrc_table[] = {
    0x00, 0x5E, 0xBC, 0xE2, 0x61, 0x3F, 0xDD, 0x83,
    0xC2, 0x9C, 0x7E, 0x20, 0xA3, 0xFD, 0x1F, 0x41,
    0x00, 0x9D, 0x23, 0xBE, 0x46, 0xDB, 0x65, 0xF8,
    0x8C, 0x11, 0xAF, 0x32, 0xCA, 0x57, 0xE9, 0x74
};

//
// Compute a Dallas Semiconductor 8 bit CRC. These show up in the ROM
// and the registers.
uint8_t OneWire::crc8(const uint8_t *addr, uint8_t len)
{
    uint8_t crc = 0;

    while (len--) {
        crc = *addr++ ^ crc;  // just re-using crc as intermediate
        crc = pgm_read_byte(dscrc_table + (crc & 0x0f)) ^
              pgm_read_byte(dscrc_table + 16 + ((crc >> 4) & 0x0f));
    }
    return crc;
}
#else
[...]
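
If you want to convince yourself that the nibble-table lookup matches the classic bit-at-a-time loop, a quick desktop-side check like the following does it (plain C, no PROGMEM – this is my own sanity test, not part of the library):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static const uint8_t dscrc_table[32] = {
    0x00, 0x5E, 0xBC, 0xE2, 0x61, 0x3F, 0xDD, 0x83,
    0xC2, 0x9C, 0x7E, 0x20, 0xA3, 0xFD, 0x1F, 0x41,
    0x00, 0x9D, 0x23, 0xBE, 0x46, 0xDB, 0x65, 0xF8,
    0x8C, 0x11, 0xAF, 0x32, 0xCA, 0x57, 0xE9, 0x74
};

/* Dow-CRC computed one bit at a time (reflected polynomial 0x8C). */
static uint8_t crc8_bitwise(const uint8_t *addr, int len)
{
    uint8_t crc = 0;
    while (len--) {
        crc ^= *addr++;
        for (int i = 0; i < 8; i++)
            crc = (crc & 1) ? (crc >> 1) ^ 0x8C : crc >> 1;
    }
    return crc;
}

/* Dow-CRC via the combined 32-entry (two 16-entry) table. */
static uint8_t crc8_table(const uint8_t *addr, int len)
{
    uint8_t crc = 0;
    while (len--) {
        crc ^= *addr++;
        crc = dscrc_table[crc & 0x0f] ^ dscrc_table[16 + ((crc >> 4) & 0x0f)];
    }
    return crc;
}

int main(void)
{
    uint8_t buf[8];
    for (int n = 0; n < 10000; n++) {
        for (int i = 0; i < 8; i++)
            buf[i] = (uint8_t)rand();
        if (crc8_bitwise(buf, 8) != crc8_table(buf, 8)) {
            printf("mismatch!\n");
            return 1;
        }
    }
    printf("bit-by-bit and 32-entry table CRCs agree\n");
    return 0;
}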

Generating CRC tables

CRC tables are usually just presented as magic output, but the feedback terms simply represent the results of eight shift/xor operations for all combinations of data and CRC register values. The table generator routine actually uses the old-style method of shifting each bit.

Below is the rough code. For the Dow-CRC, the specified polynomial is X^8 + X^5 + X^4 + X^0. The first term tells us that it’s an 8 bit CRC. We work the rest into the designated bits, but from right to left, so we end up with binary 1000 1100, which is 0x8C in hex. This is the POLY value in the code.

for (i = 0; i <= 15; i++) {
    crc = i;

    for (j = 8; j > 0; j--) {
        if (crc & 1)
            crc = (crc >> 1) ^ POLY;
        else
            crc >>= 1;
    }
    crctab[i] = crc;
}

for (i = 0; i <= 15; i++) {
    crc = i << 4;

    for (j = 8; j > 0; j--) {
        if (crc & 1)
            crc = (crc >> 1) ^ POLY;
        else
            crc >>= 1;
    }
    crctab[16 + i] = crc;
}
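
Wrapped up with declarations into a complete little program (and folding the two loops together), a minimal table generator looks like this; it should print the same 32 values as the dscrc_table above:

#include <stdio.h>

#define POLY 0x8C  /* X^8 + X^5 + X^4 + X^0, reflected */

int main(void)
{
    unsigned char crctab[32];
    unsigned int crc;

    for (int i = 0; i <= 15; i++) {
        /* low-nibble table: entries 0..15 */
        crc = i;
        for (int j = 8; j > 0; j--)
            crc = (crc & 1) ? (crc >> 1) ^ POLY : crc >> 1;
        crctab[i] = crc;

        /* high-nibble table: entries 16..31 */
        crc = i << 4;
        for (int j = 8; j > 0; j--)
            crc = (crc & 1) ? (crc >> 1) ^ POLY : crc >> 1;
        crctab[16 + i] = crc;
    }

    for (int i = 0; i < 32; i++)
        printf("0x%02X%s", (unsigned)crctab[i], (i % 8 == 7) ? ",\n" : ", ");

    return 0;
}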

If you want to generate a 256-entry table for the same:

for (i = 0; i <= 255; i++) {
    crc = i;
    for (j = 8; j > 0; j--) {
        if (crc & 1)
            crc = (crc >> 1) ^ POLY;
        else
            crc >>= 1;
    }
    crctab[i] = crc;
}

Yes, this trickery basically works for any CRC (any number of bits), you just need to put in the correct polynomial.  However, some CRCs are “upside down” and thus require a bit of adjustment along the way – the CRC-16 as used by Xmodem and Zmodem worked that way, but I won’t bother you with that old stuff here as it’s not really used any more. Some CRCs also use different initialisation values, post-processing and check approaches. If you happen to need info on that, drop me a line, as the info contained in this blog post is just a part of the docu I have here.

CRC – Polynomial Division

To calculate a CRC (Cyclic Redundancy Check) of a block of data, the data bits are considered to be the coefficients of a polynomial. This message (data) polynomial is first multiplied by the highest term in the polynomial (X^8, X^16 or X^32), then divided by the generator polynomial using modulo two arithmetic. The remainder left after the division is the desired CRC.

CRCs are usually expressed as a polynomial expression such as:

X^16 + X^12 + X^5 + 1

The generator polynomial number is determined by setting bits corresponding to the power terms in the polynomial equation in an (unsigned) integer. We do this backwards and put the highest-order term in the lowest-order bit. The highest term is implied (X^16 here just means it’s a 16-bit CRC), the LSB is the X^15 term (0 here), and the X^0 term (shown as + 1) results in the MSB being 1.
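
Worked through for the polynomial above: X^0 goes to bit 15, X^5 to bit 10 and X^12 to bit 3, so the constant is 0x8000 + 0x0400 + 0x0008 = 0x8408, the familiar reflected CCITT value. A throwaway snippet to check the bit placement (my own illustration, not from the original docs):

#include <stdio.h>

int main(void)
{
    /* X^16 + X^12 + X^5 + 1: the X^16 term is implied, and each
       remaining term X^k lands in bit (15 - k) of the reflected constant. */
    int terms[] = { 12, 5, 0 };
    unsigned int poly = 0;

    for (int i = 0; i < 3; i++)
        poly |= 1u << (15 - terms[i]);

    printf("0x%04X\n", poly);  /* prints 0x8408 */
    return 0;
}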

Note that the usual hardware shift register implementation shifts bits into the lowest-order term. In our implementation, that means shifting towards the right. Why do we do it this way? Because the calculated CRC must be transmitted across a serial link from highest- to lowest-order term. UARTs transmit characters in order from LSB to MSB. By storing the CRC this way, we hand it to the UART in the order low-byte to high-byte; the UART sends each byte low-bit to high-bit; and the result is transmission bit by bit from highest- to lowest-order term, without requiring any bit shuffling.

Credits

Stuff found on BBS, numerous files with information and implementations.
Comments in sources on CRC-32 calculation by Gary S. Brown.
All other people hereby credited anonymously, because unfortunately the documents and sources encountered did not contain the name of a specific person or organisation. But, each source provided a piece of the puzzle!

Sources / Docs

My original source code and full docu is also available: crc_agl.zip, as part of my archive of Lentz Software-Development‘s work in the 1990s.

The post Calculating CRC with a tiny (32 entry) lookup-table first appeared on Lentz family blog.