Planet Linux Australia

,

Adrian ChaddFiguring out how Amiga 500 RAM expansion projects work

 (Wow, what a shift in topics here..)

Yeah I picked up an Amiga 500. Specifically, a rev6 amiga 500. It has 4 extra spots on the mainboard for another 512k of chip ram that isn't populated.

Crap I'm going to have to explain chip RAM.

Ok, so the Amiga is a pretty nifty architecture, especially if you realise it was designed in the early 80s. I won't go into it all because that's not the point of the article and that's covered on the internet.

What I will go into is what I've figured out with the RAM expansions.

So - the RAM types. This is the bits I knew back as a teenager.

The Amiga has three kinds of RAM:

  • Fast RAM is RAM that the CPU can see, but the custom chipsets can't address. It's not slowed down by any contention with the chipset access to its RAM.
  • Chip RAM is RAM that the CPU and the custom chipsets can see. This RAM sits /behind/ the custom chipsets; access to it from the CPU is lower priority than from said chipsets.
  • Slow RAM is RAM that also sits behind the custom chipsets, but for reasons internal to said chipsets they can't actively use. If the custom chipsets are using chip RAM for functions then the CPU will also be blocked.

The reason for chip/slow RAM on the early Amigas only makes sense if you peer into Agnus, the DMA and RAM arbiter chipset. It has a variety of features, but it's basically a big RAM arbiter and DMA engine. The earliest version in the Amiga 1000 supported DMAing to/from 512KB of RAM. However, it supports controlling 1MB of RAM. The next major version update (which I have) supports DMA from and to 1MB of RAM.

Now, because of this early limitation, the Amiga designers put the first 512k that Agnus can control early in the memory map (at $000000), and moved that second 512k of RAM much later in the memory map (address $C00000). That way a future chip that supported more DMA accessible RAM would just extend the chip RAM memory map in a contiguous fashion.

However, this meant a bunch of software was written that assumes the first 512k is at $000000 and the second 512K is at $C00000. Not a lot, but some.

My goal is to have 1MB of chip RAM and at least 512k of slow RAM. That should give the maximum compatibility with all of the Amiga 500 software in the past and set things up pretty well for the future.

Now, this is something people have already done. PeteAU on the Amiga Forums kickstarted a bunch of designs which were debugged and improved upon by forum members. Let's talk about how it works.

First up, Agnus decodes a bunch of address lines to know what the CPU is requesting. However, it only treats things as a memory access if Gary, another custom chip, decoded the address bus and said "yup, this is a RAM access." Gary asserts /RAMEN to Agnus to let Agnus know it's a RAM access and Agnus takes care of doing the memory transfer.

JP2 on the Amiga 500 rev6 board chooses whether Agnus sees the CPU A19 line as A19 or A23. If it's A23, the default, then the second half of Agnus' address bus as viewed by the CPU is at the slow RAM address of $C00000. But if JP2 is cut and the other link is joined, it becomes A19 and both 512KB RAM blocks show up as contiguous RAM at $000000, as chip RAM.

Agnus decodes its A19 line to select which of two 512KB RAM blocks to enable. It does this using the /RAS0 and /RAS1 lines. /RAS0 enables the onboard 512KB RAM. /RAS1 enables the unpopulated 512KB RAM chips on the mainboard. Also, BOTH /RAS0 and /RAS1 appear at the trapdoor RAM expansion underneath the Amiga 500, and the Amiga 501 512KB RAM expansion uses /RAS1 by default.

So, you can easily choose between 1MB of chip RAM and a 512KB chip/512KB slow by changing JP2. That's easy.

JP7 controls whether Gary sees the /EXRAM line from the trapdoor expansion. Gary uses this to control which address ranges are treated as RAM and sent as an enable line to Agnus via /RAMEN.

JP3 controls which 512KB bank is /RAS0 or /RAS1. It lets you swap them, so the trapdoor RAM and onboard RAM sees /RAS0 as /RAS1 and vice versa. You normally shouldn't have to fiddle with this.

Finally, Agnus is responsible for managing the DRAM chips. This requires:

  • Managing the multiplexed column/row address bus setup that DRAM chips use, so it has to latch rows and then columns for reading
  • Asserting the right /RAS and /CAS lines to the right chips (important for normal AND RAM expansion hacks!)
  • Handling DRAM refresh.

Ok, so now that this is done, let's talk about PeteAU's RAM expansion and how they managed to get 1MB of chip RAM and lots of slow RAM.

There are two halves - the expansion RAM board and the "Gary adapter". The easiest to talk about is the expansion RAM board, so that goes first.

The RAM expansion boards still take /RAS0 and /RAS1, and provide jumpers to which ends up at which RAM. Ok, so how does it actually enable different RAM banks?

The two wires that go to the RAM expansion boards are either A19/A20 or some decoded magic that I'll describe later. These feed into a 74LS139 which is a dual 2-to-4 demux. Each demux is responsible - this is important - for demuxing the /CAS lines. So yes, even if all the expansion RAM is using /RAS1, the /CAS lines get enabled only for the RAM that matters. It's important to understand this particular bit of magic. Agnus only has two /CAS lines - one for the lower 8 bits and one for the upper 8 bits of the 16 bit data bus. So all of the RAM gets /CAS enabled, no matter which 512KB bank it is - then the relevant bank to read gets /RAS. Now with PeteAU's design the /CAS lines for the relevant expansion bank gets enabled.

Ok, this is where it gets hairy. So, what about if you have 1MB of chip RAM on board, like I do? PeteAU's Gary board apparently takes care of that too.

One final comment before I move onto the Gary adapter board. Why not go and fiddle with /RAS to enable the right banks? Well, it has to do with DRAM refresh. Agnus also takes care of the refresh cycles which involves asserting only /RAS and enabling each row in the DRAM. If we go and gate which /RAS goes to which chip bank then a bunch of RAM won't get refreshed. You'd then have to add extra logic to see if it's a refresh cycle and manually refresh! Or, you can leave /RAS0 and /RAS1 alone, so DRAM refresh is still OK, and enable RAM using /CAS.

Right! Now onto the Gary board, where the magic happens.

A few things have to happen here for all of this magic to be, well, magic.

  • If you want to support 1MB chip RAM as well as 512K of slow RAM, you need to be able to decode both the chip RAM and slow RAM regions and enable those to Agnus via its A19 line. This is the jumper wire to JP2 on the A500 Rev6 PCB.
  • If you want to support more than 512K of slow RAM, you need to decode both A19 and A20 and fake the /REGEN and /RAMEN signals towards Agnus. (/REGEN is "chipset register space is being accessed.) That way Agnus still thinks a RAM cycle is being initiated, even if it can't "see" the larger RAM address space.
  • Then you need A19/A20 - or a modified version - to the trapdoor RAM expansion. This is the extra 2 bits, or 4 banks of 512KB, to decode all that extra RAM.
  • If Agnus is asserting /BLIT - which says it's doing a DMA cycle, then we need to ensure it's going to U1 on the RAM expansion, or the second 512KB chip RAM bank.

Now, this last point is important for me and other 1MB Amiga chip RAM amiga 500s. How can this RAM expansion even work if there is 1MB of chip RAM on-board? The on-board RAM has all of the /CAS lines from Agnus wired up, without going through the mux.

I think the answer is "not without modifying the board." /RAS0 will continue to enable the first 512K bank. The other 512KB bank and the slow RAM expansion will run off of /RAS1. What I'm likely going to have to do is:

  • Build a 74LS139 adapter board myself for the mainboard;
    • Run the /CAS lines from the expansion connector to it;
    • Run the Gary expansion adapter A19/A20 lines to it;
  • Lift the /CAS lines from the other 512KB RAM I installed, and route it to the 74LS139, like what PeteAU's RAM expansion board does;
  • Modify my A501 (that should be here soon!) to use the /CAS lines I feed it from the 74LS139 board, rather than the expansion connector /CAS lines;
  • Hope it all works!

I'll try to remember to report back with some photos!

 


,

David RoweMeasuring HF Noise Around a City

I recently took my loop antenna [1] to a national park on the edge of Adelaide, and tried to measure the noise power at 7.1MHz. I was expecting a low level. Curiously, the noise seemed directional, there was a 10dB null off the loop in one direction. This made me wonder if I was “DFing Adelaide”, and seeing the ground wave signal [5] from the sum of all noise sources in the city.

So I hatched a plan to take my loop antenna to a bunch of national parks around the perimeter of the city, and see if I could get sensible bearings back towards the city. This time I used a spectrum analyser so I didn’t have to mess around with s-meters. I spent a pleasant day circumnavigating the city, and hacked up a Python script [2][3] to plot the results on a map. The base of each line is the measurement location, and the height of the line the noise relative to the noise floor of the system.

Here’s a plot of noise power against the distance from the center of the city.

These results suggest noise levels drop as we get further away from the city, and is some evidence for ground wave propagation of noise from the city. I had wondered if being sited in a valley just outside the city would be enough to get a low noise floor, but perhaps distance is a factor as well.

I’m surprised a straight line fit of log(power) against linear(distance) works so well, as it should be a curve due to the inverse square law. Oh well, I guess it’s just a small amount of data.

The outlier at 30km was a site between two large sheds that were making machinery noises. I could see harmonics on the spec-an so suspect switch mode power supplies or similar nasties. Country sites can have local noise too.

The loop antenna/pre-amp/spec-an combination has a noise floor of -146 dBm/Hz. Some of my measurements were close to that level (e.g. -143). The measured noise power is actually the sum of the noise power collected by the antenna and the noise of the receiver. So if the received noise is the same level as the receiver (say both -146), the total noise power we measure is twice that (3dB higher) or -143. So I added some code to correct for the receiver noise power, this makes a few dB difference at low noise levels.

Reference [6] Figure 4.9 has some curves of expected noise powers, calculated from an ITU-T document on the subject. Here is a comparison with my measurements at 7 MHz, using my loop antenna and pre-amp system [1]:

Condition Expected Noise Factor (dB) [6] Measured Noise (dBm/Hz) Measured Noise Factor (dB)
Residential 50 -120 54
Quiet Country 30 -146 28

For the residential case I measured between -115dB/Hz and -127dBm/Hz around my suburb so used -120 as the average. Much to my surprise my results are a reasonable match with [6]! Of course rather than being fixed values, the absolute noise levels vary quite a lot over different sites, time of day, and atmospheric conditions, which is noted in [6] and the ITU-T document.

There is a 20dB difference in noise power from the best case in my suburb (-127 dBm/Hz) to the quietest location I visited (-146 dBm/Hz). This is consistent with [6]. That’s a factor of 100 if the noise power in expressed in Watts. Someone could use 1W to talk to me in the country, but require 100 watts to achieve the same SNR if I was listening in my suburb. It’s not possible for the average 100W Ham station to increase their power to 10kW which explains why HF reception from urban locations is getting so hard.

My experimental aim of “DF-ing” the noise was busted. I couldn’t find good nulls from most sites, especially the very low noise level sites.

Layers of Urban Noise

I have some vague hope of using DSP techniques and multiple antennas to cancel some of the noise and actually be able to receive 40M HF radio signals at home. These hopes are fading – it seems like there are several layers of noise:

  1. A city wide urban noise fog made up of the sum of millions of switch mode power supplies, VDSL [4], and high speed digital hardware. This is the best case S5-6 noise I see at certain times of day, and that I have established [1] is fairly uniform (+/- 5dB) around my suburb. An antenna with directivity could possibly help by rejecting noise from some directions, but I don’t really want to put up a 40M beam. I’m not sure if this noise can be reduced using two antenna phase cancellation techniques as the “urban fog” is probably the sum of energy collected from many directions, and phase cancellation effectively puts a null in the pattern of a two antenna array at one (or perhaps two) bearings. More thought required.
  2. High energy, very local stuff that sits on top that urban noise fog, like my neighbours lights or power line noise. This is often impulsive, and pushes the noise over S9. There may be several of these local sources coming from different directions, complicating the use of phase-difference subtraction techniques.
  3. Some times you get all of these noise sources at the same time, which explains why the analog phase-cancellation based boxes aren’t always effective. They can only cancel one noise source.

Further Work

Some ideas for further work:

  1. Try a wire antenna with directivity.
  2. Try 20m, that seems to have a lower quiescent noise floor (S0 at times), but still gets hammered by the S9+ local noise sources at various times of day. However that might reduce the problem to just the local noise sources, as the “urban noise fog” is at an acceptable level. I think. Time to retune the loop!
  3. Think about noise reduction techniques for the strong local impulsive noise.

Links

[1] Measuring Urban HF Noise with a Loop
[2] Easy Steps To Plot Geographic Data on a Map
[3] GitHub repo for this project
[4] VDSL versus HF Radio
[5] Radio Noise W8JI
[6] A High Performance Active Antenna for the High Frequency Band

,

Russell CokerSSD Endurance

I previously wrote about the issue of swap potentially breaking SSD [1]. My conclusion was that swap wouldn’t be a problem as no normally operating systems that I run had swap using any significant fraction of total disk writes. In that post the most writes I could see was 128GB written per day on a 120G Intel SSD (writing the entire device once a day).

My post about swap and SSD was based on the assumption that you could get many thousands of writes to the entire device which was incorrect. Here’s a background on the terminology from WD [2]. So in the case of the 120G Intel SSD I was doing over 1 DWPD (Drive Writes Per Day) which is in the middle of the range of SSD capability, Intel doesn’t specify the DWPD or TBW (Tera Bytes Written) for that device.

The most expensive and high end NVMe device sold by my local computer store is the Samsung 980 Pro which has a warranty of 150TBW for the 250G device and 600TBW for the 1TB device [3]. That means that the system which used to have an Intel SSD would have exceeded the warranty in 3 years if it had a 250G device.

My current workstation has been up for just over 7 days and has averaged 110GB written per day. It has some light VM use and the occasional kernel compile, a fairly typical developer workstation. It’s storage is 2*Crucial 1TB NVMe devices in a BTRFS RAID-1, the NVMe devices are the old series of Crucial ones and are rated for 200TBW which means that they can be expected to last for 5 years under the current load. This isn’t a real problem for me as the performance of those devices is lower than I hoped for so I will buy faster ones before they are 5yo anyway.

My home server (and my wife’s workstation) is averaging 325GB per day on the SSDs used for the RAID-1 BTRFS filesystem for root and for most data that is written much (including VMs). The SSDs are 500G Samsung 850 EVOs [4] which are rated at 150TBW which means just over a year of expected lifetime. The SSDs are much more than a year old, I think Samsung stopped selling them more than a year ago. Between the 2 SSDs SMART reports 18 uncorrectable errors and “btrfs device stats” reports 55 errors on one of them. I’m not about to immediately replace them, but it appears that they are well past their prime.

The server which runs my blog (among many other things) is averaging over 1TB written per day. It currently has a RAID-1 of hard drives for all storage but it’s previous incarnation (which probably had about the same amount of writes) had a RAID-1 of “enterprise” SSDs for the most written data. After a few years of running like that (and some time running with someone else’s load before it) the SSDs became extremely slow (sustained writes of 15MB/s) and started getting errors. So that’s a pair of SSDs that were burned out.

Conclusion

The amounts of data being written are steadily increasing. Recent machines with more RAM can decrease storage usage in some situations but that doesn’t compare to the increased use of checksummed and logged filesystems, VMs, databases for local storage, and other things that multiply writes. The amount of writes allowed under warranty isn’t increasing much and there are new technologies for larger SSD storage that decrease the DWPD rating of the underlying hardware.

For the systems I own it seems that they are all going to exceed the rated TBW for the SSDs before I have other reasons to replace them, and they aren’t particularly high usage systems. A mail server for a large number of users would hit it much earlier.

RAID of SSDs is a really good thing. Replacement of SSDs is something that should be planned for and a way of swapping SSDs to less important uses is also good (my parents have some SSDs that are too small for my current use but which work well for them). Another thing to consider is that if you have a server with spare drive bays you could put some extra SSDs in to spread the wear among a larger RAID-10 array. Instead of having a 2*SSD BTRFS RAID-1 for a server you could have 6*SSD to get a 3* longer lifetime than a regular RAID-1 before the SSDs wear out (BTRFS supports this sort of thing).

Based on these calculations and the small number of errors I’ve seen on my home server I’ll add a 480G SSD I have lying around to the array to spread the load and keep it running for a while longer.

,

Jan SchmidtPulling on a thread

I’m attending the https://linux.conf.au/ conference online this weekend, which is always a good opportunity for some sideline hacking.

I found something boneheaded doing that today.

There have been a few times while inventing the OpenHMD Rift driver where I’ve noticed something strange and followed the thread until it made sense. Sometimes that leads to improvements in the driver, sometimes not.

In this case, I wanted to generate a graph of how long the computer vision processing takes – from the moment each camera frame is captured until poses are generated for each device.

To do that, I have a some logging branches that output JSON events to log files and I write scripts to process those. I used that data and produced:

Pose recognition latency.
dt = interpose spacing, delay = frame to pose latency

Two things caught my eye in this graph. The first is the way the baseline latency (pink lines) increases from ~20ms to ~58ms. The 2nd is the quantisation effect, where pose latencies are clearly moving in discrete steps.

Neither of those should be happening.

Camera frames are being captured from the CV1 sensors every 19.2ms, and it takes that 17-18ms for them to be delivered across the USB. Depending on how many IR sources the cameras can see, figuring out the device poses can take a different amount of time, but the baseline should always hover around 17-18ms because the fast “device tracking locked” case take as little as 1ms.

Did you see me mention 19.2ms as the interframe period? Guess what the spacing on those quantisation levels are in the graph? I recognised it as implying that something in the processing is tied to frame timing when it should not be.

OpenHMD Rift CV1 tracking timing

This 2nd graph helped me pinpoint what exactly was going on. This graph is cut from the part of the session where the latency has jumped up. What it shows is a ~1 frame delay between when the frame is received (frame-arrival-finish-local-ts) before the initial analysis even starts!

That could imply that the analysis thread is just busy processing the previous frame and doesn’t get start working on the new one yet – but the graph says that fast analysis is typically done in 1-10ms at most. It should rarely be busy when the next frame arrives.

This is where I found the bone headed code – a rookie mistake I wrote when putting in place the image analysis threads early on in the driver development and never noticed.

There are 3 threads involved:

  • USB service thread, reading video frame packets and assembling pixels in framebuffers
  • Fast analysis thread, that checks tracking lock is still acquired
  • Long analysis thread, which does brute-force pose searching to reacquire / match unknown IR sources to device LEDs

These 3 threads communicate using frame worker queues passing frames between each other. Each analysis thread does this pseudocode:

while driver_running:
    Pop a frame from the queue
    Process the frame
    Sleep for new frame notification

The problem is in the 3rd line. If the driver is ever still processing the frame in line 2 when a new frame arrives – say because the computer got really busy – the thread sleeps anyway and won’t wake up until the next frame arrives. At that point, there’ll be 2 frames in the queue, but it only still processes one – so the analysis gains a 1 frame latency from that point on. If it happens a second time, it gets later by another frame! Any further and it starts reclaiming frames from the queues to keep the video capture thread fed – but it only reclaims one frame at a time, so the latency remains!

The fix is simple:

while driver_running:
   Pop a frame
   Process the frame
   if queue_is_empty():
     sleep for new frame notification

Doing that for both the fast and long analysis threads changed the profile of the pose latency graph completely.

Pose latency and inter-pose spacing after fix

This is a massive win! To be clear, this has been causing problems in the driver for at least 18 months but was never obvious from the logs alone. A single good graph is worth a thousand logs.

What does this mean in practice?

The way the fusion filter I’ve built works, in between pose updates from the cameras, the position and orientation of each device are predicted / updated using the accelerometer and gyro readings. Particularly for position, using the IMU for prediction drifts fairly quickly. The longer the driver spends ‘coasting’ on the IMU, the less accurate the position tracking is. So, the sooner the driver can get a correction from the camera to the fusion filter the less drift we’ll get – especially under fast motion. Particularly for the hand controllers that get waved around.

Before: Left Controller pose delays by sensor
After: Left Controller pose delays by sensor

Poses are now being updated up to 40ms earlier and the baseline is consistent with the USB transfer delay.

You can also visibly see the effect of the JPEG decoding support I added over Christmas. The ‘red’ camera is directly connected to USB3, while the ‘khaki’ camera is feeding JPEG frames over USB2 that then need to be decoded, adding a few ms delay.

The latency reduction is nicely visible in the pose graphs, where the ‘drop shadow’ effect of pose updates tailing fusion predictions largely disappears and there are fewer large gaps in the pose observations when long analysis happens (visible as straight lines jumping from point to point in the trace):

Before: Left Controller poses
After: Left Controller poses

,

Linux AustraliaCouncil Meeting 12th January 2022 – Minutes

1. Meeting overview and key information

Present

  • Sae Ra Germaine (President)
  • Joel Addison (Vice President)
  • Russell Stuart (Treasurer)
  • Clinton Roy (Secretary)
  • Jonathan Woite (Council)
  • Neil Cox (Council)

Apologies

Meeting opened at 20:25 AEDT by Sae Ra  and quorum was achieved.

Minutes taken by Joel

2. Log of correspondence

3. Items for discussion

  • PyCon 2022
    • Currently there is a large amount of deposit money sitting with the Adelaide Convention Centre
    • We need to work out what to do (and what we can do) with the deposit if the decision is made that an in person event is not feasible
    • The team indicate that they would not run an online event again, but in person would be possible
    • We will keep discussing this with Richard to determine next steps

4. Items for noting

  • Linux.conf.au 2022
    • Attendee numbers are down at the moment, which means we might have some budget concerns
    • Final preparations are being made at the moment and things are coming together

5. Other business

  • Annual Report
    • Russell is working on this at the moment
    • Aim to send it to members on Friday morning (a day before the AGM)

6. In Camera

1 item was discussed

The post Council Meeting 12th January 2022 – Minutes appeared first on Linux Australia.

Linux AustraliaCouncil Meeting 22nd December 2021 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (Vice President)
  • Clinton Roy (Secretary)
  • Russell Coker (Council)
  • Russell Stuart (Treasurer)
  • Sae Ra Germaine (President)
  • Neil Cox (Council)

Apologies

–    Jonathan Woithe (Council)

Meeting opened at 19:35 AEDT by Sae Ra  and quorum was achieved.

Minutes taken by Clinton Roy

2. Log of correspondence 

3. Items for discussion

  • Rusty Award discussion underway.

ACTION ITEM: Clinton to put up notes about the previous rules for the Rusty Award on the website.

  • Sae Ra: someone needs to take the returning office through the accreditation officer.

ACTION ITEM: Clinton and Joel to go through the prepared notes for how to create an election, then run through it with the returning officer.

ACTION ITEM: For all council officers to send their biographies into the secretary.

ACTION ITEM: for all council members to register for the agm via zoom.

ACTION ITEM: for the secretary to remind linux-aus to register for the AGM.

The post Council Meeting 22nd December 2021 – Minutes appeared first on Linux Australia.

Linux AustraliaCouncil Meeting 08th December 2021 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (Vice President)
  • Clinton Roy (Secretary)
  • Jonathan Woite (Council)
  • Russell Coker (Council)
  • Russell Stuart (Treasurer)
  • Sae Ra Germaine (President)
  • Neil Cox (Council)

Apologies

Meeting opened at 19:32 AEDT by Sae Ra  and quorum was achieved.

Minutes taken by Clinton Roy

2. Log of correspondence 

  • Kathy Reid request for de-identified membership postcode/city/country data to create a map showing visual densities of member locations
    • Sae Ra notes that she would like to show the same map at the Linux Australia talk that she is doing with Julien at LCA2022. Sae Ra suggests that our members can legally request membership data under our constitution, although our privacy policy has wording that suggests data may not be shared with third parties unless this is required to carry out normal activities or we are compelled to by law enforcement. The committee commences reading the Associations Incorporations Act, constitution and privacy policy. Neill thinks the privacy policy blocks us from sending the information. Clinton wonders if the council can give Kathy the final result of the data she’s after without giving individual numbers. Joel mentions this would be normal operation if we generated this ourselves. Jonathan thinks the constitution and the privacy policy don’t clash, as there’s a provision for if we are legally obliged. Russell Coker thinks any member of good standing should not constitute a third party.  Clinton asks, again, if council can make the final product, a pretty heat map. It is mentioned that Kathy is on the Linux Australia Media team, so sharing data could come under that. Lots of back and forward discussion about how to make things clear between the constitution and privacy policy. ACTION ITEM: Jonathan to make an adjustment to privacy policy, removing ambiguity.
  • Sae Ra raises a motion to give the requested data to Kathy for the purpose of generating a heat map of members, Russell Coker seconds. A vote on the motion:

For: six.

Against: none.

Abstain: 1.

The motion is carried. ACTION ITEM: Sae Ra to get postcode information to Kathy.

3. Items for discussion

  • Bank kerfuffle. Russell rang the bank again today, they did not get back to him. Has also emailed them with no result. Russell guesses that it’s closed due to inactivity. If that is the case, it’s very hard to undo, and the easiest thing to do is to shift bank accounts. ACTION ITEM: Russell will update the account in Stripe and XERO.

4. Items for noting

  • Once the admin team has moved the instance of civicrm to our local instance, we’ll be ready to open up taking nominations for council. Sae Ra asks for the best time to hold the AGM during LCA, some non-important logistical discussion ensues. Saturday is chosen.
  • Jonathan has a work in progress task to look at possible voting platforms, at this point still thinks we should stick with Zoom, even though it’s not an ideal solution for a FOSS organisation. The 22nd of December meeting would mostly be looking at nominations. Looking at changing the Fifth of Jan meeting to the Twelfth to make things a little easier, an induction for any new council members.

5. Other business

  • Russell is still dealing with some outstanding invoices.

6. In Camera

  • 1 item was discussed in relation to LCA2022

The post Council Meeting 08th December 2021 – Minutes appeared first on Linux Australia.

Linux AustraliaCouncil Meeting 24th November 2021 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (Vice President)
  • Clinton Roy (Secretary)
  • Jonathan Woite (Council)
  • Russell Coker (Council)
  • Russell Stuart (Treasurer)
  • Sae Ra Germaine (President)
  • Neil Cox (Council)

Apologies

Meeting opened at 19:35 AEDT by Sae Ra  and quorum was achieved.

Minutes taken by Clinton Roy

2. Log of correspondence 

  • WA open source group. Sae Ra to continue correspondence to work out more information.
  • Kathy’s NZ request. Discussion about the request. ACTION ITEM Sae Ra to respond to Kathy with a thank you, but no thanks.

3. Items for discussion

  • Sae Ra has organised a dummy election, asks everyone to vote after 9pm
  • Steve talks on behalf of the Admin team. Talks about the election site. Cron is being done manually atm. Email still going to disk atm (to stop mail going to the world). Sae Ra has seen some small issues with icons, will put a list of little niggles together. Has moved a lot of the mail across, needs to wait till more stuff is in our control to continue. Waiting for the mirror admin to move things across to upload pyconau stuff.
  • LCA2022. Tickets should be opening up shortly. Some keynotes have been opened up. AV company has been selected.

4. Items for noting

  • ACTION ITEM For all council members to check and update the council position descriptions up on github. https://github.com/linuxaustralia/position-descriptions

5. Other business

  • Nil

6. In Camera

  • Nil

The post Council Meeting 24th November 2021 – Minutes appeared first on Linux Australia.

Linux AustraliaCouncil Meeting 10th November 2021 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (Vice President)
  • Clinton Roy (Secretary)
  • Jonathan Woithe (Council)
  • Russell Coker (Council)
  • Neil Cox (Council)

Apologies

–    Russell Stuart (Treasurer)

  • Sae Ra Germaine (President)

Meeting opened at 19:34 AEDT by Joel Addison  and quorum was achieved.

Minutes taken by Clinton Roy

2. Log of correspondence 

Snail mail: Bank term deposit slips.

3. Items for discussion

  • Defamation Law (Open Australia in particular?) We would like more information about their case. Jonathan has posted his one page summary to the list.  Clinton lays out a possible event chain that leads to someone writing a possibly defamatory note on Open Australia’s website about a housing developer. Joel comments that at least having an updated communications policy that makes things clear would be better. Cancelling our communications channels isn’t really feasible, neither is moderating them. Limiting them does make things better. Discussion about different channels and what moderation options already exist with them. General discussion around the judgement and what lawyers are saying about it. Neill suggests that we really don’t want to become the test case. Jonathan quotes from the communication policy, and it allows us to switch moderation on if we need it.
  • ACTION ITEM:Joel and Sae Ra:  itemize our communication channels, work out what can be moderated and what can’t be, what ones we can close.

Discussions about what forms of communications fall under the ruling. It’s about sites that are publically available that can take comments.

ACTION ITEM: Jonathan to modify the communication policy to make it clearer and easier for LA to protect itself; post to list when it is updated; council will vote at next normal meeting.

4. Items for noting

  • LCA AV provider, have received quotes from all providers, and have selected a preferred provider. Sae Ra and Joel are going to be meeting with the preferred supplier soon to discuss details.
  • LA Website – Testing occurring Sun 14 Nov
  • Linux Conf Au Session Selection Committee Post meeting Tomorrow, anyone available? Joel and Sae Ra are going to attend, to make up for Clinton’s having a little bit of a social life.

5. Other business

6. In Camera

  • 1 item was discussed

The post Council Meeting 10th November 2021 – Minutes appeared first on Linux Australia.

Linux AustraliaCouncil Meeting 27th October 2021 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (Vice President)
  • Clinton Roy (Secretary)
  • Jonathan Woite (Council)
  • Russell Coker (Council)
  • Sae Ra Germaine (President)
  • Neil Cox (Council)

–    Russell Stuart (Treasurer)

Apologies

Meeting opened at 19:30 AEDT by Sae Ra  and quorum was achieved.

Minutes taken by Clinton Roy

2. Log of correspondence

Dave Sparks and Owen speaking for Drupal. First time for Dave at this meeting, taking over from Owen. Speakers are locked in, registrations trickling in, looking at 230/240 attendees. Looking at a $7k profit overall. Looking at in person events for next year. Did pull a Wellington venue at no cost for this year. Dave is in New Zealand. Dave will move into the chair role for the rest of his term, the end of 2022 currently. Owen staying on as events management role for his term 2023. Jonathan thanks them for the draft AGM report previously sent. Sae Ra asks if they want a thank you from the LA council at the conference.

Steve speaking on behalf of the admin team. Did a final sync of the website from the cloud  version. Needs to update some urls in the stand alone website so that it can send emails at a certain point in time.  Promises Joel that the cloudy stuff will be happening soon. Steve says there will be a few changes to the budget, only a couple of hundred dollars, not thousands. Migrating everything to the temporary mirror. From 4TB to 45TB storage.

Looking at rebuilding the gitlab instance, one for LA projects, one for others. This means project managers cannot see secrets in another project.

Patrick speaking for Joomla Australia. Close to a constitution, code of conduct etc documents. Another meeting on Tuesday, then those documents should be ready to submit to LA to become a subcommittee. Feeling the effects of Joomla 4 that was released last month. Some fine tuning with third party developers was required.

Richard, speaking for Pycon Australia: have some draft reports written up on the last event. Budget isn’t closed yet, mostly due to gifts not being sent out yet. Richard will delegate AGM report writing. Still have a considerable part of a deposit for a physical venue for next year. Can scale the booking depending on ticket sales. Trying to get another person or two onto the steering committee.

3. Items for discussion

  • Rusty wrench
  • Jonathan has drafted up a communications policy.
  • ACTION ITEM: Sae Ra to send out Rusty Wrench nominations.

4. Items for noting

  • Nil

5. Other business

  • Nil

6. In camera

2 items were discussed in camera

The post Council Meeting 27th October 2021 – Minutes appeared first on Linux Australia.

Linux AustraliaCouncil Meeting 13th October 2021 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (Vice President)
  • Clinton Roy (Secretary)
  • Jonathan Woite (Council)
  • Russell Stuart (Treasurer)
  • Sae Ra Germaine (President)
  • Neil Cox (Council)

–    Russell Coker (Council)

Apologies

Meeting opened at 19:37 AEDT by Sae Ra  and quorum was achieved.

Minutes taken by Clinton Roy

2. Log of correspondence

3. Items for discussion

  • Portfolios. Sae Ra shows her Portfolios diagram. Minor feedback is given to the diagram. Sae Ra to make some minor adjustments.
  • LCA stuff to still deal with:
    • Comments on social media, defamation law judgement. TODO: Jonathan to come up with a one sheet giving an overview of the situation.
  • SSC virtual face to face meeting. Sae Ra says it turned out to be easier than in many previous years, the program sort of fell together. Was run really well on the day. The jitsi platform wasn’t working terribly well .

4. Items for noting

  • Nil

5. Other business

  • Nil

6. In Camera

4 items were discussed

The post Council Meeting 13th October 2021 – Minutes appeared first on Linux Australia.

,

Chris SmartSwift object storage clusters with Ansible

OpenStack Swift is a most brilliant object store and I have built a few clusters over the last couple of years using SwiftStack’s product, which is really excellent. However, I need an alternative way of creating and managing clusters, that’s still inspired by SwiftStack.

Ansible seems a logical choice to me, as I need to manage packages, configure services and perform certain operations on a running cluster.

A couple of months ago I attended the Swift Ops feedback session at OpenStack PTG and said that I had been working on this. Several other attendees mentioned they are also looking at Ansible, or currently use it for bits and pieces, so hopefully this can grow into something useful!

My Ansible Swift role can currently build a Swift cluster from scratch (on CentOS 8 Stream nodes) and perform some basic operations. The idea is to define the nodes and their services (PACO) using YAML based Ansible inventory and use the role to perform specific actions. Together all of those actions let you build an entirely new cluster, or modify and manage an existing one. Over time, I plan to add new features and operational actions.

The role currently performs the following:

  • Prepare Swift nodes, including setting SELinux and ensuring SSH access
  • Add repositories and install packages (upstream RDO at the moment)
  • Configure dependent services, such as logging, rsyncd and memcached
  • Configure and enable TLS proxy endpoint for Proxy nodes
  • Configure keepalived for failover of Proxy VIPs
  • Configure Swift PACO services using templates and variables
  • Create initial account, container and object rings
  • Prepare disks on each node, format and mount to match the rings
  • Build and distribute the rings
  • Configure dispersion
  • Simple operational tasks such as:
    • Add new disks
    • Rebuild and distribute the rings
    • Reconfigure PACO services
    • Generate dispersion and replication reports

I’m working on a library for managing the rings, as it’s a bit clunky using the swift-ring-builder command in the role. So that’ll be the next focus, then onto other operations functions as well as moving from RPM packages to containers. Eventually I’d like to have a web service which can front-end the inventory and Ansible plays, providing a unified management platform.

I also provide sample inventory, playbooks and wrapper scripts on GitHub. Simply modify the inventory to suit your node and network setup and run the site.sh role wrapper script. It also optionally supports my virtual infrastructure role which means one can easily spin up a virtual, production-like multi-node Swift cluster from scratch on KVM hosts.

Running the Swift Ansible playbook

I would love any feedback and to collaborate on this with anyone, if it seems useful.

,

Russell CokerVideo Conferencing (LCA)

I’ve just done a tech check for my LCA lecture. I had initially planned to do what I had done before and use my phone for recording audio and video and my PC for other stuff. The problem is that I wanted to get an external microphone going and plugging in a USB microphone turned off the speaker in the phone (it seemed to direct audio to a non-existent USB audio output). I tried using bluetooth headphones with the USB microphone and that didn’t work. Eventually a viable option seemed to be using USB headphones on my PC with the phone for camera and microphone. Then it turned out that my phone (Huawei Mate 10 Pro) didn’t support resolutions higher than VGA with Chrome (it didn’t have the “advanced” settings menu to select resolution), this is probably an issue of Android build features. So the best option is to use a webcam on the PC, I was recommended a Logitech C922 but OfficeWorks only has a Logitech C920 which is apparently OK.

The free connection test from freeconference.com [1] is good for testing out how your browser works for videoconferencing. It tests each feature separately and is easy to run.

After buying the C920 webcam I found that it sometimes worked and sometimes caused a kernel panic like the following (partial panic log included for the benefit of people Googling this Logitech C920 problem):

[95457.805417] BUG: kernel NULL pointer dereference, address: 0000000000000000
[95457.805424] #PF: supervisor read access in kernel mode
[95457.805426] #PF: error_code(0x0000) - not-present page
[95457.805429] PGD 0 P4D 0 
[95457.805431] Oops: 0000 [#1] SMP PTI
[95457.805435] CPU: 2 PID: 75486 Comm: v4l2src0:src Not tainted 5.15.0-2-amd64 #1  Debian 5.15.5-2
[95457.805438] Hardware name: HP ProLiant ML110 Gen9/ProLiant ML110 Gen9, BIOS P99 02/17/2017
[95457.805440] RIP: 0010:usb_ifnum_to_if+0x3a/0x50 [usbcore]
...
[95457.805481] Call Trace:
[95457.805484]  
[95457.805485]  usb_hcd_alloc_bandwidth+0x23d/0x360 [usbcore]
[95457.805507]  usb_set_interface+0x127/0x350 [usbcore]
[95457.805525]  uvc_video_start_transfer+0x19c/0x4f0 [uvcvideo]
[95457.805532]  uvc_video_start_streaming+0x7b/0xd0 [uvcvideo]
[95457.805538]  uvc_start_streaming+0x2d/0xf0 [uvcvideo]
[95457.805543]  vb2_start_streaming+0x63/0x100 [videobuf2_common]
[95457.805550]  vb2_core_streamon+0x54/0xb0 [videobuf2_common]
[95457.805555]  uvc_queue_streamon+0x2a/0x40 [uvcvideo]
[95457.805560]  uvc_ioctl_streamon+0x3a/0x60 [uvcvideo]
[95457.805566]  __video_do_ioctl+0x39b/0x3d0 [videodev]

It turns out that Ubuntu Launchpad bug #1827452 has great information on this problem [2]. Apparently if the device decides it doesn’t have enough power then it will reconnect and get a different USB bus device number and this often happens when the kernel is initialising it. There’s a race condition in the kernel code in which the code to initialise the device won’t realise that the device has been detached and will dereference a NULL pointer and then mess up other things in USB device management. The end result for me is that all USB devices become unusable in this situation, commands like “lsusb” hang, and a regular shutdown/reboot hangs because it can’t kill the user session because something is blocked on USB.

One of the comments on the Launchpad bug is that a powered USB hub can alleviate the problem while a USB extension cable (which I had been using) can exacerbate it. Officeworks currently advertises only one powered USB hub, it’s described as “USB 3” but also “maximum speed 480 Mbps” (USB 2 speed). So basically they are selling a USB 2 hub for 4* the price that USB 2 hubs used to sell for.

When debugging this I used the “cheese” webcam utility program and ran it in a KVM virtual machine. The KVM parameters “-device qemu-xhci -usb -device usb-host,hostbus=1,hostaddr=2” (where 1 and 2 are replaced by the Bus and Device numbers from “lsusb”) allow the USB device to be passed through to the VM. Doing this meant that I didn’t have to reboot my PC every time a webcam test failed.

For audio I’m using the Sades Wand gaming headset I wrote about previously [3].

Francois MarierRemoving an alias/domain from a Let's Encrypt certificate managed by certbot

I recently got an error during a certbot renewal:

Challenge failed for domain echo.fmarier.org
Failed to renew certificate jabber-gw.fmarier.org with error: Some challenges have failed.
The following renewals failed:
  /etc/letsencrypt/live/jabber-gw.fmarier.org/fullchain.pem (failure)
1 renew failure(s), 0 parse failure(s)

due to the fact that I had removed the DNS entry for echo.fmarier.org.

I tried to find a way to remove that name from the certificate before renewing it, but it seems like the only way to do it is to create a new certificate without that alternative name.

First, I looked for the domains included in the certificate:

$ certbot certificates
...
  Certificate Name: jabber-gw.fmarier.org
    Serial Number: 31485424904a33fb2ab43ab174b4b146512
    Key Type: RSA
    Domains: jabber-gw.fmarier.org echo.fmarier.org fmarier.org
    Expiry Date: 2022-01-04 05:28:57+00:00 (VALID: 29 days)
    Certificate Path: /etc/letsencrypt/live/jabber-gw.fmarier.org/fullchain.pem
    Private Key Path: /etc/letsencrypt/live/jabber-gw.fmarier.org/privkey.pem

Then, deleted the existing certificate:

$ certbot delete jabber-gw.fmarier.org

and finally created a new certificate with all other names except for the obsolete one:

$ certbot certonly -d jabber-gw.fmarier.org -d fmarier.org --duplicate

,

Simon LyallAudiobooks – December 2021

How Smart Machines Think by Sean Gerrish

An introduction to Machine learning, covering advances of the last 10 years or so via stories about self-driving cars, the Netflix prize etc. 3/5

Harrier 809: Britain’s Legendary Jump Jet and the Untold Story of the Falklands War by Rowland White

Covering the Sea Harrier’s part in the Falkland’s as well as other parts of the air war like operations in Chile and Argentinian units. Well research and written. 4/5

Caroline: Little House, Revisited by Sarah Miller

A retelling of Little House on the Prairie from the perspective of Caroline Ingalls. Interesting re-reading events though an adult’s eyes. Second external review. 3/5

My Adventurous Life by Dick Smith

Autobiography by Australian Entrepreneur and Adventurer. Well packed with interesting stories of both business and other endeavors. 3/5

Nuclear Folly: A History of the Cuban Missile Crisis by Serhii Plokhy

Draws on Soviet and Ukrainian sources to give more details from the Russian side than previous books. Emphasizes the role of luck as both sides misread the other. 3/5

999 – My Life on the Frontline of the Ambulance Service by Dan Farnworth

Stories from the author’s career plus their personal struggles and advocacy for better mental-health support for Ambulance officers. 3/5

The Code Breaker: Jennifer Doudna, Gene Editing, and the Future of the Human Race by Walter Isaacson

As well as profiling Doudna it covers others in the field as well as the technology. Some early responses to the Covid19 pandemic. 3/5

Thunderball by Ian Fleming

James Bond travels to a Health Camp(!) and then to the Bahamas to investigate stolen Nuclear Weapons and Blackmail. Usual action ensues 3/5

See also: Top Audiobooks I’ve listened to

My Audiobook Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average. in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all

Share

,

Russell CokerTerrorists Inspired by Fiction

The Tom Clancy book Debt of Honor published in August 1994 first introduced the concept of a heavy passenger aircraft being used as a weapon by terrorists against a well defended building. In April 1994 there was an attempt to hijack and deliberately crash FedEx flight 705. It’s possible for a book to be changed 4 months before publication, but it seems unlikely that a significant plot point in a series of books was changed in such a small amount of time so it’s likely that Tom Clancy got the idea first. There have been other variations on that theme, such as the Yokosuka_MXY-7 Kamakazi flying bomb (known by the Allies as “Baka” which is Japanese for idiot). But Tom Clancy seemed to pioneer the idea of a commercial passenger jet being subverted for the purpose of ground attack.

7 years after Tom Clancy’s book was published the 911 hijackings happened.

The TV series Black Mirror first aired in 2011, and the first episode was about terrorists kidnapping a princess and demanding that the UK PM perform an indecent act with a pig for her release. While the plot was a little extreme (the entire series is extreme) the basic concept of sexual extortion based on terrorist acts is something that could be done in real life, and if terrorists were inspired by this they are taking longer than expected to do it.

Most democracies seem to end up with two major parties that are closely matched. Even if a government was strict about not negotiating with terrorists it seems likely that terrorists demanding that a politician perform an unusual sex act on TV would change things, supporters would be divided into groups that support and oppose negotiating. Discussions wouldn’t be as civil as when the negotiation involves money or freeing prisoners. If an election result was perceived to have been influenced by such terrorism then supporters of the side that lost would claim it to be unfair and reject the result. If the goal of terrorists was to cause chaos then that would be one way of achieving it, and they have had over 10 years to consider this possibility.

Are we overdue for a terror attack inspired by Black Mirror?

Russell CokerBig Smart TVs

Recently a relative who owned a 50″ Plasma TV asked me for advice on getting a new TV. Looking at the options all the TVs seem to be smart TVs (running Android with built in support for YouTube and Netflix) and most of them seem to be 4K resolution. 4K doesn’t provide much benefit now as most people don’t have BlueRay DVD players and discs, there aren’t a lot of 4K YouTube videos, and most streaming services don’t offer 4K resolution. But as 4K doesn’t cost much more it doesn’t make sense not to get it.

I gave my relative a list of good options from Kogan (the Australian company that has the cheapest consumer electronics) and they chose a 65″ 4K Smart TV from Kogan. That only cost $709 plus delivery which is reasonably affordable for something that will presumably last for a long time and be used by many people.

Netflix on a web browser won’t do more than FullHD resolution unless you use Edge on Windows 10. But Netflix on the smart tv has a row advertising 4K shows which indicates that 4K is supported. There are some 4K videos on YouTube but not a lot at this time.

Size

It turns out that 65″ is very big. It didn’t fit on the table that had been used for the 50″ Plasma TV.

Rtings.com has a good article about TV size vs distance [1]. According to their calculations if you want to sit 2 meters away from a TV and have a 30 degree field of view (recommended for “mixed” use) then a 45″ TV is ideal.

According to their calculations on pixel sizes, if you have a FullHD display (or the common modern case a FullHD signal displayed on a 4K monitor) that is between 1.8 and 2.5 meters away from you then a 45″ TV is the largest that will be useful. To take proper advantage of a monitor larger than 45″ at a distance of 2 meters you need a 4K signal. If you have a 4K signal then you can get best results by having a 45″ monitor less than 1.8 meters away from you. As most TV watching involves less than 3 people it shouldn’t be inconvenient to be less than 1.8 meters away from the TV.

The 65″ TV weighs 21Kg according to the specs, that isn’t a huge amount for something small, but for something a large and inconvenient as a 65″ TV it’s impossible for one person to safely move. Kogan sells 43″ TVs that weigh 6KG, that’s something that most adults could move with one hand. I think that a medium size TV that can be easily moved to a convenient location would probably give an equivalent viewing result to an extremely large TV that can’t be moved at all. I currently have a 40″ LCD TV, the only reason I have that is because a friend didn’t need it, the previous 32″ TV that I used was adequate for my needs. Most of my TV viewing is on a 28″ monitor, which I find adequate for 2 or 3 people. So I generally wouldn’t recommend a 65″ TV for anyone.

Android for TVs

Android wasn’t designed for TVs and doesn’t work that well on them. Having buttons on the remote for Netflix and YouTube is handy, but it would be nice if there were programmable buttons for other commonly used apps or a way to switch between the last few apps (like ALT-TAB on a PC).

One good feature of Android for TV is that it can display a set of rows of shows (similar to the Netflix method of displaying) where each row is from a different app. The apps I’ve installed on that TV which support the row view are Netflix, YouTube, YouTube Music, ABC iView (that’s Australian ABC), 7plus, 9now, and SBS on Demand. That’s nice, now we just need channel 10’s app to support that to have coverage for all Australian free TV stations in the Android TV interface.

Conclusion

It’s a nice TV and it generally works well. Android is OK for TV use but far from great. It is running Android version 9, maybe a newer version of Android works better on TVs.

It’s too large for reasonable people to use in a home. I’ve seen smaller TVs used for 20 people in an office in a video conference. It’s cheap enough that most people can afford it, but it’s easier and more convenient to have something smaller and lighter.

Russell CokerCuriosity Stream

I have recently signed up for the Curiosity Stream [1] documentary site, this is designed to be like Netflix but for non-fiction content only. The service costs $US15 per annum or $52US per annum for 4K (I think the 4K service was about $US120 per annum when I signed up). The extra price for 4K seems excessive, while it is in line with the bandwidth requirements a large portion of the costs of the service would be about user support and running the service reliably for which 4K makes little difference.

My aim in subscribing was to just get a service like Netflix with new documentary content as I have watched every documentary I want to watch on Netflix (I think I’ve watched over 1000 hours of Netflix documentaries). So naturally I compare the service to Netflix and I found that it doesn’t compare well. Curiosity Stream (CS) has no button to skip the intro and has a problem with using the right arrow to skip forward (seems to only work once and then I have to use the mouse), this costs me about 30 seconds for each episode which adds up when watching documentaries at 1.5* speed. The method of controlling the viewing is a little clunky, sometimes the popup menu at the bottom of the screen to control playback doesn’t disappear by itself until you mouse over it and space bar doesn’t select pause instead it selects the last action. CS allows selecting individual episodes for your watch list instead of entire series, this could be useful for some people but I just find it annoying, it might be good for classroom use. The method of searching for new shows to watch isn’t as good as the Netflix method or the way things are displayed on Android TV (which seems to have an API for multiple video providers to show a list of shows with one row per provider). Some of these things might seem OK if you haven’t used other services, but if you are used to Netflix and Amazon Prime then it will seem clunky.

The amount of content on Curiosity Stream doesn’t seem that large, I don’t know how to get a full measure of it, but when I search for things I seem to get less results than on Netflix. That could be something to do with what I’m searching for.

In terms of value for money $US15 per annum for the content that CS provides is a good deal. Netflix overall offers better value for home users having fiction as well as non-fiction in large quantities. But for documentary content $US15 per annum is pretty cheap. I recommend signing up for CS, but for most people signing up for Netflix first will be a good option.

  1. [1] https://curiositystream.com/

,

Paul WiseFLOSS Activities December 2021

Focus

This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

  • Spam: reported 166 Debian mailing list posts
  • Patches: reviewed libpst upstream patches
  • Debian packages: sponsored nsis, memtest86+
  • Debian wiki: RecentChanges for the month
  • Debian BTS usertags: changes for the month
  • Debian screenshots:

Administration

  • libpst: setup GitHub presence, migrate from hg to git, requested details from bug reporters
  • plac: cleaned up git repo anomalies
  • Debian BTS: unarchive/reopen/triage bugs for reintroduced packages: stardict, node-carto
  • Debian wiki: unblock IP addresses, approve accounts

Communication

  • Respond to queries from Debian users and contributors on the mailing lists and IRC

Sponsors

The purple-discord, python-plac, sptag, smart-open, libpst, memtest86+, oci-python-sdk work was sponsored. All other work was done on a volunteer basis.

,

Russell CokerLinks December 2021

Wired magazine has many short documentary films on YouTube, this one about How Photography is Affecting Our Brains is particularly good [1].

Matt Blaze wrote an informative blog post about Faraday cages for phones [2]. It seems that the commercial shielded bags are all pretty good while doing it yourself with aluminium foil may get similar results or may get much worse results with no obvious difference in the quality of the wrapping. Aluminium foil doesn’t protect that well and doesn’t protect consistently. A metal biscuit tin performed quite well and consistently, so that’s a cheap option for reducing signals.

Umair Haque wrote an insightful article about the single word that describes most of the problems the world faces right now [3].

Forbes has an informative article about the early days of the Ford company when they doubled wages, it proves that they didn’t do so to enable workders to afford cars but to avoid staff turnover (which is expensive) [4]. Also the Ford company had a fascistic approach to employees, controlling what they were allowed to do in their spare time if they wanted the bonus payment. The wages weren’t doubled, there was a bonus payment that would double the salary if the employee was eligible for the bonus. One thing that Forbes gets wrong is that they claim that it was only having higher pay than other companies that provided a benefit and that a higher minimum wage wouldn’t, the problem with that idea is that a higher minimum wage would discourage people from having multiple jobs and allow more families to not have the mother working (a condition for a man to get the Ford bonus was for his wife to not work).

The WSJ has an interesting article about Intel’s datacenter for running all the different configurations of CPUs that they have supported over the last 10 years for security tests [5]. My Thinkpad (which is less than 10yo) is vulnerable to one of the SPECTRE family of exploits as Intel hasn’t released microcode to fix it, getting fixed microcode out for all the systems from major vendors like Lenovo would be a good idea if they want to improve their security.

NPR has an interesting article about the correlation between support for Trump in counties of the US with lack of vaccination and Covid19 deaths [6]. No surprises, but it’s good to see the graphs.

Cory Doctorow wrote an interesting article on the lack of “slack” in the current American education system [7]. It’s not that bad in Australia but we are unfortunately moving in the American direction.

Teen Vogue has an insightful article about the problems with the focus on resilience [8], while resilience is good we should make it a higher priority to avoid putting people in situations where they need to be resiliant than on encouraging resilience.

,

David RoweMeasuring Urban HF Noise with a Loop

For the last few years I’ve been interested in the problem of urban HF noise. As a first step I’d like to characterise the problem by taking a few measurements of noise at my home, around my suburb, and a few other locations. So I need a small, portable antenna. As I’m interested in high noise levels on HF, it doesn’t have to be very efficient.

On a “quiet” day – I see 7MHz noise from my dipole at S5 on my IC7200. I think the S meter is adding up all the noise in the IF bandwidth of the radio. Hooking up my spec-an and referring this to 1 Hz bandwidth, I measure the average noise at about -120 dBm/Hz which is 54dB above thermal (-174dBm/Hz). So even a low gain antenna will receive plenty of noise without hitting the noise floor of the receiver. Ideally I’d like to see -135 dBm/Hz, which is S0 on my IC7200.

Small Loop Antenna

I decided to build a loop, as they don’t need a ground plane, and are relatively insensitive to low permeability surrounding objects (like me holding it) [2].

Using the equations from [2] I decided to use 3 turns of RG58 (about 3m) with a 0.3m diameter. The braid is the conductor, I don’t use the inner at all.

My loop is fed using a toroidal transformer [5]. I like the idea of using a transformer to ensure good balance. All three turns of the loop coax pass through the toroid (the primary), and the secondary is a bunch of turns I wind myself. Using the equations on [2], I estimated the radiation resistance Rr at just 0.006 ohms. To match this to 50 ohms means a lot of turns on the matching transformer.

I tried 50 and 100 turns but for some reason the return loss was still really poor. Also it didn’t seem to be receiving anything, and touching antenna or capacitor terminals had no effect. On a whim I reduced the number of turns and the return loss started to improve, I could see a noticeable dip of a few dB as I moved the tuning capacitor. If I touched the capacitor or the loop it would de-tune, indicating it really was starting to become an antenna!

Turns out that for small loops, the resistive loss Rl is often greater than Rr. I confirmed this by connecting the loop in series with a fixed 100pF capacitor across the spec-an input. When driven by the tracking generator I can see a dip at resonance. By measuring the bandwidth I could estimate the Q and hence the loss resistance Rl of about 2.8 ohms, much greater than Rr of 0.006 ohms.

At about 15 turns I obtained a return loss of better than 10dB – good enough. However it’s really just a match to Rl, consequently the antenna is quite inefficient, I estimate a gain of 10*log10(D*Rr/Rl) = -25dB (D=1.5, the directivity of a small loop). But that still not bad for a 30cm loop on 40m, and good enough for my mission.

The ratio of 15 turns secondary, 3 turns primary (15/3)^2 would transform 2.8 ohm into around 70 ohms, and lead to an expected return loss of 15dB, about what I’m seeing.

EMI receiver

I decided to use my FT817 as a portable receiver to log noise values as I wander about with the loop. So I added a 26dB pre-amp [3] to compensate for the negative gain of the loop and to boost the Rx level high enough to move the S-meter. The pre-amp is increasing the signal and noise together, so unlike a regular pre-amp the receiver will not be more sensitive. Just louder.

I placed the loop on a 1m tripod at the front of my yard, away from the house as that seemed to reduce the noise a bit. Using distant SSB signals, I compared the dipole to the loop over a period of a few days. I used a RTL-SDR SDR running gqrx on the 40m band, with a 7MHz bandpass filter to prevent overload. At times the levels were quite close, at other times the loop down about 10dB. Sometimes the SNR was better on the loop, other times the dipole. It was hard to get exact measurements. I guess there are differences in pattern, polarisation, and the signals received at the slightly different sites, and the power of SSB bounces around.

However it’s fair to say the loop (with it’s pre-amp) is roughly as sensitive as the dipole. If the loop detects a signal (such as noise) the dipole will also detect the same signal at roughly the same level. So the loop is a useful antenna for portable EMI measurement.

By connecting my spec-an in parallel with the loop antenna, I created a calibration curve of noise density against the S meter reading:

My FT817 S-meter seems to have a very narrow range, and is wildly different from the IC7200 S meter. However if I can measure less than S1 on the FT817 (blue, attenuator off), it looks like I will hit my target of -135dBm/Hz.

Blowing up my FT817

During some portable measurements I blew up my FT817! The fix and fault (shorted filter EMI choke and blown internal battery fuse) were exactly as described in [4]. I managed to repair the Rx but the Tx power is still low at 200mW, it’s almost like it’s folding back due to high SWR (although the SWR into the dummy load is Ok). So I’ll need to think about that fault some more. I’ve added external fuses to both the positive and negative FT817 supply leads now, as well as the Lithium battery I use for external power.

Noise measurements

I hopped on my bike and logged the noise at various sites around my suburb. They all bent the FT817 needle with the attenuator off (blue), so I used the “attenuator on” curve (red). The actual measurements are the red circles in the figure above.

So it’s not just my house, this seems to be pervasive problem in my neighbourhood. The best case was walking out into the middle of a large sports ground. I guess that makes sense, as the EMI source must be around the perimeter, and we get some inverse square law fall off. Take that far enough and we have portable operation from a country site – putting some distance between us and the cloud of urban noise.

Discussion and Conclusions

I’ve been reading about antennas this year, and can recommend “Antenna Physics” by Robert Zavrel [1]. The technical level was just right for me – between typical Ham texts and deeply mathematical tomes. This project complemented my reading, and give me some practical experience in the concepts of loss resistance (Rl) and radiation resistance (Rr).

It was pretty cool to see the return loss improve as I adjusted the matching network, and understand “why”. I really enjoy the learning experiences with Ham Radio.

I now have a portable, sorta-calibrated EMI receiver, that I can use to measure the noise levels at various sites, and compare it to my target. As a bonus the loop is directional, so I can DF noise. The range of the FT817 S-meter is a bit limited, but it’s calibrated well enough to tell me if the noise level is similar to my house or getting near my target level of -135 dBm/Hz. Nowhere in my suburb is close, and the noise is not localised to my house.

I’m quite surprised a 30cm loop can receive 7 MHz signals at all, in a quiet location it works just fine as a receive antenna.

The occasional improvement in SNR with the loop is worth looking into. Maybe it’s the location away from the house. Or the lack of near B-field noise, compared to the near E-field noise that the dipole will respond too. It’s difficult to put my dipole in that location (or move it at all), as it’s so big and there are power lines nearby. The compact size of the loop is very useful for repositioning.

Time is a factor, my noise level varies over the day as various noise sources come and go. Over the course of the day I can see at least three different noise sources, for example (i) high energy clicks 20ms apart that seem to be coming from power lines (ii) some sort of (lighting?) noise from a neighbour in the evening, plus (iii) some other noise with a 30us period. Urban HF noise is a complex problem.

Reading Further

[1] Antenna Physics – an Introduction
[2] Antenna Theory Loop Antenna Page
[3] Experimental methods in RF Design, Ch2, Fig 2.34, Class A amplifier
[4] FT817 short circuit power supply repair – the exact fault I induced in my FT-817. This post was very helpful in fixing it!
[5] VK6CS Magnetic Loop Antenna Matching

Chris SmartSetting up an Ansible dev environment

I do a lot of dev work using my Ansible virtual infrastructure role to spin up various KVM guests across multiple hosts using YAML based inventories. I’ve just put my hand up to try and help maintain the Ansible libvirt community collection, which I use in my role.

This is my attempt to start documenting how I configure my Ansible dev environment on Fedora (but much should work for other distros). These instructions are taken from a number of places, including the project’s quick start guide as well as the official installation and developing modules guides.

There are a number of versions of Ansible which are supported on a number of Python releases (including 2 and 3). Ideally we want to be able to use any combination. To do this, we can use Git to get the Ansible code and manage the release we are working on or testing against, together with virtual environments for the Python version we need.

The process is something like this:

  • Fork Ansible code in GitHub and clone
  • Checkout the version of Ansible we want to use
  • Install the version of Python we want to use (if not the host’s)
  • Create and activate a Python virtual environment
  • Activate the Ansible development environment, using the code and Python from above
  • Create a Git branch and work on the code!

Each time we change the version of Ansible or Python, we can simple switch and re-activate.

Fork Ansible repos

As we use Git, the first step is to get our own copy of the code.

Browse to the Ansible project and fork into your own namespace. I am working on community.libvirt so I have also forked this into my own namespace.

Fork a Git repo into your own namespace

We will then use the resulting forked URLs to clone the repositories in the next step, for example my Ansible clone URL over SSH is git@github.com:csmart/ansible.git (copy yours from your forked repo in GitHub).

Clone repos

OK, now we are ready to clone the repos down from your own fork and add a remote pointing to upstream to fetch updates.

Get the Ansible code

Ansible will go into the ~/ansible directory (you can change that if you prefer), which will checkout the devel branch by default. We’ll call the upstream remote, upstream and fetch updates from both remotes. Note that the URL to clone from will be different than mine (copy yours from your forked repo in GitHub), but the upstream URL will be the same.

git clone git@github.com:csmart/ansible.git ~/ansible
cd ~/ansible
git remote add upstream https://github.com/ansible/ansible.git
git remote update

Get the Ansible collection code

Next, we clone to collection repository. Note that this goes into a specific directory structure to match the default locations that Ansible will search. Again, your clone URL will be different to mine, but the upstream URL will be the same.

mkdir -p ~/ansible_collections/community/libvirt
git clone git@github.com:csmart/community.libvirt.git ~/ansible_collections/community/libvirt/
cd ~/ansible_collections/community/libvirt/
git remote add upstream https://github.com/ansible-collections/community.libvirt.git
git remote update

Some versions of Ansible will not automatically look in this location for collections, so we can also add it to our path.

export ANSIBLE_COLLECTIONS_PATH="${HOME}/.ansible/collections:/usr/share/ansible/collections:${HOME}/ansible_collections/"

Add that line to your ~/.bashrc file if you want it to be automatic.

Switching Ansible versions

Once you have the Ansible git repository, you will be in the devel branch by default. To test your module with a different major release, simply check out the corresponding branch. You can see all latest release branches like so.

cd ~/ansible
git fetch upstream
git --no-pager branch -r --list "upstream/stable*" |sort -V

For example, to test against Ansible 2.12, checkout the stable-2.12 branch.

cd ~/ansible
git checkout stable-2.12

You can also get a specific release, for example, v2.12.1, by checking out the corresponding tag into a new local branch (hence the -b option).

cd ~/ansible
git --no-pager tag --list "v*" |sort -V
git checkout -b v2.12.1

Next we will get a specific version of Python to use with Ansible.

Switching Python versions

By default, Ansible will use the version installed on the host. Fortunately, Fedora provides versions of Python which are supported upstream, so we can easily switch. This follows the pattern python${major}.${minor} for example, python2.7 and python3.9, you simple install the versions you want.

If you’re not on Fedora, your distro might do something similar, or you may need to use a tool like Anaconda.

sudo dnf install python2.7 python3.6 python3.9 python3.11

Now, we simply use that Python version exactly as the command (such as python3.9) when creating our virtual environments.

Create Python virtual environment

Now that we have the Ansible code and a number of Python interpreters, let’s prepare the Python virtual environments which live inside the clone of the Ansible Git repository.

I’m using freshly installed Python 3.9 from the Fedora host in my example and note that I’m specifying the version as a subdirectory in the virtual environment directory so that we can easily switch. Although venv is a directory inside the code, it will be ignored by Git.

cd ~/ansible
python3.9 -m venv venv/3.9

For Python 2 we need to install virtualenv and create it differently.

sudo dnf install /usr/bin/virtualenv
cd ~/ansible
virtualenv --python /usr/bin/python2.7 env/2.7

Now we can activate our Python environment for the version we want (e.g. 3.9), update pip and install Ansible requirements.

We also need to do this each time we switch versions of Python or start a new shell session.

export PYTHON_VERSION=3.9
source ~/ansible/venv/${PYTHON_VERSION}/bin/activate
python -m pip install -U pip setuptools
python -m pip install -r ~/ansible/requirements.txt

Deactivating Python virtual environment

To deactivate a Python environment, simple run the deactivate command which will return you to the default system Python.

Activate Ansible hacking environment

Now that we have the code and a version of Python we want to use, we can run the a script from Ansible which will configure your current session to run from the source version in the git repo, rather than needing a full system install.

You should do this every time you switch between versions of Ansible or Python, or starting a new shell session.

source ~/ansible/hacking/env-setup
python -m pip install -U pip setuptools
python -m pip install -r ~/ansible/requirements.txt

Working on Ansible code

Remember that we have a remote pointing upstream to get the latest changes as well as our own fork in GitHub which is where we will push any changes.

To work on the code, we need to create a branch from the relevant upstream branch. We can see all of the latest available upstream branches with the following.

cd ~/ansible
git fetch upstream
git --no-pager branch --remote --list "upstream/*" |sort -V

For example, if we’re working on a feature we probably branch from devel. We can do this with a single checkout command which will create the new branch.

git checkout -b my-branch upstream/devel

Let’s push this new branch to our fork straight away and enable tracking so that don’t need to keep specifying where to push to.

git push --set-upstream origin my-branch

When you’re ready, submit a pull request via GitHub to have your code merged into upstream Ansible.

Get latest Ansible code updates

To pull in the latest code updates into your feature branch, we use the Git rebase function. This way, any local changes get moved on top of the latest commits (rather than a merge) as though they had always been added there. For example, if you are working on my-branch which was branched from devel, we need to fetch and rebase in the latest changes from upstream/devel.

git checkout my-branch
git pull --rebase upstream devel

If you have any local commits, you will need to force push your branch to your GitHub fork as the Git history has been rewritten with the rebase and your commit hashes will have changed. This will automatically update any merge requests.

Note that the branch is already tracking to our fork, so we don’t need to specify where to push to.

git push --force

Well, I think that’s a start at least…

,

Simon LyallDonations 2021

Each year I do the majority of my Charity donations in early December (just after my birthday) spread over a few days (so as not to get my credit card suspended).

In 2021 I cut down donations to random open source projects but donated $100 each to The Software Freedom Conservancy and Software in the Public Interest.

I do a blog post about it to hopefully inspire others. See: 2020, 2019, 2018, 2017, 2016, 2015

All amounts this year are in $US unless otherwise stated

General Charities

My main donations was $750 to Givewell (to their Maximum Impact Fund). Once again I’m happy that Givewell make efficient use of money donated. I decided this year to give a higher proportion of my giving to them than last year.

Software and Internet Infrastructure Projects

$100 each to the Software Freedom Conservancy and Software in the Public Interest . Money not attached to any specific project.

$51 to the Internet Archive
$25 to Let’s Encrypt

Others including content creators

I donated $103 to Signum University to cover Corey Olsen’s Exploring the Lord of the Rings series plus other stuff I listen to that they put out.

I paid $100 to be a supporter of NZ News site The Spinoff

Patreon

I support a number of creators via Patreon

Share

,

Simon LyallAudiobooks – November 2021

The Secret Life of Groceries: The Dark Miracle of the American Supermarket by Benjamin Lorr

Several sections each looking at different aspect of the American Supermarket. From workers to suppliers to owners. Engaging. 4/5

Life Moves Pretty Fast: The lessons we learned from eighties movies (and why we don’t learn them from movies any more) by Hadley Freeman

A tour through mainstream 80s movies concentrating on one per chapter. Fun but covering serious topics too 4/5

The Boys: A Memoir of Hollywood and Family by Clint Howard and Ron Howard

Alternatively narrated by Ron and Clint about their family, growing up in Hollywood and acting. Only briefly covers events after the mid-1980s. Excellent. 4/5

Remote, Inc. : How to Thrive at Work . . . Wherever You Are by Robert C. Pozen & Alexandra Samuel

Lots of good advice for people suddenly working at home due to the pandemic. Tells you to think of yourself as a “business of one”. 3/5

The History of Spain: Land on a Crossroad by Joyce E. Salisbury

24 Lectures on Spanish History from the Stone age to the early 2000s. Interesting and easy to follow. Covers culture etc, not just kings and politics 3/5

For Your Eyes Only and other stories by Ian Fleming

Five short stories involving James Bond. Three straightforward Bond adventures and two others. I found them all very enjoyable. 3/5

See also: Top Audiobooks I’ve listened to

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average. in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all

Share

,

Simon LyallRSS Feeds for Podcasts from The Spinoff

The Spinoff is a New Zealand news website. They have several podcasts but unfortunately don’t publish the RSS feeds for these. Instead they just list the Spotify and Apples Podcasts links. So if you use something other than Spotify or the official Apple Podcasts client it is difficult to listen to them.

I’ve found the feed links and listed them here, including those shows on break or no longer produced. These are the official RSS feeds on Acast (where The Spinoff host their podcasts). All Spinoff Podcasts are listed on The Spinoff Podcasts Page .

FAQ

Q1: How do you find these feeds?
A1: I follow the “Apple” link on the podcasts page to the “Apple Podcasts” page for the podcast. Then I put that link into https://getrssfeed.com/ which tells me the RSS feed.

Q2: Where are these feeds?
A2: The Spinoff publish their podcasts via the Acast, who host it and insert ads. Acast provide RSS feeds for all their podcasts. They even provide a page for each show.

Q3: Why don’t The Spinoff publish RSS feeds?
A3: I assume they think most people use Spotify or Apple Podcasts and want to clutter their website. I’ve asked but no reply. I also suspect (based on errors they make) that some of their process is manual so it would be extra work for each episode.

Q4: What is RSS?
A4: RSS is a special webpage that lists podcast episodes (or posts in a blog or Youtube videos in a channel) that can be scanned easily by software. It allows you to see/hear updates without having to visit sites regularly to check if there are new episodes.

Q5: Can you recommend a RSS viewer?
A5: I use Newblur and are very happy with it.

Q6: Can you recommend a Podcast Client?
A6: I manually download podcasts and copy them into the Android based player called Listen Audiobook Player. I wouldn’t really recommend this workflow. Have a a look at this article on the Best podcast apps of 2021 instead. They let you put in a podcast feed and whenever a new episode is published it will be downloaded automatically to your phone.

Share

,

Simon LyallAudiobook Reviews – October 2021

Goldfinger by Ian Fleming

The 1950s Gold Standard results in wacky smuggling operations and golf. Bond is on top of things until he’s not. But he survives anyway. 3/5

Spark: How Genius Ignites, From Child Prodigies to Late Bloomers by Claudia Kalb

Profiles of 13 great achievers from Yo-Yo Ma to Isaac Newton who found their calling at progressively greater ages. Some interesting profiles across varied fields. 3/5

Shoot for the Moon by James Donovan

The story of Apollo 11 and the space race that led up to it. Some stories I hadn’t heard before and well written but in a crowded field. 3/5

The Apocalypse Factory: Plutonium and the Making of the Atomic Age by Steve Olson

The history of the creation of Plutonium and the Hanford Nuclear Site that was built to manufacture it. 4/5

Captain Cook’s Epic Voyage by Geoffrey Blainey

The story of James Cook’s first voyage and it’s significance. Also covers the contemporary voyage by Jean de Surville. A good read that kept me interested. 4/5

Just the Funny Parts … And a Few Hard Truths About Sneaking Into the Hollywood Boys’ Club by Nell Scovell

A life as a female writer, producer and director in TV. A mix of funny stories, sexist stories and career highs and lows. 4/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average. in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all

Share

,

Jan Schmidt2.5 years of Oculus Rift

Once again time has passed, and another update on Oculus Rift support feels due! As always, it feels like I’ve been busy with work and not found enough time for Rift CV1 hacking. Nevertheless, looking back over the history since I last wrote, there’s quite a lot to tell!

In general, the controller tracking is now really good most of the time. Like, wildly-swing-your-arms-and-not-lose-track levels (most of the time). The problems I’m hunting now are intermittent and hard to identify in the moment while using the headset – hence my enthusiasm over the last updates for implementing stream recording and a simulation setup. I’ll get back to that.

Outlier Detection

Since I last wrote, the tracking improvements have mostly come from identifying and rejecting incorrect measurements. That is, if I have 2 sensors active and 1 sensor says the left controller is in one place, but the 2nd sensor says it’s somewhere else, we’ll reject one of those – choosing the pose that best matches what we already know about the controller. The last known position, the gravity direction the IMU is detecting, and the last known orientation. The tracker will now also reject observations for a time if (for example) the reported orientation is outside the range we expect. The IMU gyroscope can track the orientation of a device for quite a while, so can be relied on to identify strong pose priors once we’ve integrated a few camera observations to get the yaw correct.

It works really well, but I think improving this area is still where most future refinements will come. That and avoiding incorrect pose extractions in the first place.

Plot of headset tracking – orientation and position

The above plot is a sample of headset tracking, showing the extracted poses from the computer vision vs the pose priors / tracking from the Kalman filter. As you can see, there are excursions in both position and orientation detected from the video, but these are largely ignored by the filter, producing a steadier result.

Left Touch controller tracking – orientation and position

This plot shows the left controller being tracked during a Beat Saber session. The controller tracking plot is quite different, because controllers move a lot more than the headset, and have fewer LEDs to track against. There are larger gaps here in the timeline while the vision re-acquires the device – and in those gaps you can see the Kalman filter interpolating using IMU input only (sometimes well, sometimes less so).

Improved Pose Priors

Another nice thing I did is changes in the way the search for a tracked device is made in a video frame. Before starting looking for a particular device it always now gets the latest estimate of the previous device position from the fusion filter. Previously, it would use the estimate of the device pose as it was when the camera exposure happened – but between then and the moment we start analysis more IMU observations and other camera observations might arrive and be integrated into the filter, which will have updated the estimate of where the device was in the frame.

This is the bit where I think the Kalman filter is particularly clever: Estimates of the device position at an earlier or later exposure can improve and refine the filter’s estimate of where the device was when the camera captured the frame we’re currently analysing! So clever. That mechanism (lagged state tracking) is what allows the filter to integrate past tracking observations once the analysis is done – so even if the video frame search take 150ms (for example), it will correct the filter’s estimate of where the device was 150ms in the past, which ripples through and corrects the estimate of where the device is now.

LED visibility model

To improve the identification of devices better, I measured the actual angle from which LEDs are visible (about 75 degrees off axis) and measured the size. The pose matching now has a better idea of which LEDs should be visible for a proposed orientation and what pixel size we expect them to have at a particular distance.

Better Smoothing

I fixed a bug in the output pose smoothing filter where it would glitch as you turned completely around and crossed the point where the angle jumps from +pi to -pi or vice versa.

Improved Display Distortion Correction

I got a wide-angle hi-res webcam and took photos of a checkerboard pattern through the lens of my headset, then used OpenCV and panotools to calculate new distortion and chromatic aberration parameters for the display. For me, this has greatly improved. I’m waiting to hear if that’s true for everyone, or if I’ve just fixed it for my headset.

Persistent Config Cache

Config blocks! A long time ago, I prototyped code to create a persistent OpenHMD configuration file store in ~/.config/openhmd. The rift-kalman-filter branch now uses that to store the configuration blocks that it reads from the controllers. The first time a controller is seen, it will load the JSON calibration block as before, but it will now store it in that directory – removing a multiple second radio read process on every subsequent startup.

Persistent Room Configuration

To go along with that, I have an experimental rift-room-config branch that creates a rift-room-config.json file and stores the camera positions after the first startup. I haven’t pushed that to the rift-kalman-filter branch yet, because I’m a bit worried it’ll cause surprising problems for people. If the initial estimate of the headset pose is wrong, the code will back-project the wrong positions for the cameras, which will get written to the file and cause every subsequent run of OpenHMD to generate bad tracking until the file is removed. The goal is to have a loop that monitors whether the camera positions seem stable based on the tracking reports, and to use averaging and resetting to correct them if not – or at least to warn the user that they should re-run some (non-existent) setup utility.

Video Capture + Processing

The final big ticket item was a rewrite of how the USB video frame capture thread collects pixels and passes them to the analysis threads. This now does less work in the USB thread, so misses fewer frames, and also I made it so that every frame is now searched for LEDs and blob identities tracked with motion vectors, even when no further analysis will be done on that frame. That means that when we’re running late, it better preserves LED blob identities until the analysis threads can catch up – increasing the chances of having known LEDs to directly find device positions and avoid searching. This rewrite also opened up a path to easily support JPEG decode – which is needed to support Rift Sensors connected on USB 2.0 ports.

Session Simulator

I mentioned the recording simulator continues to progress. Since the tracking problems are now getting really tricky to figure out, this tool is becoming increasingly important. So far, I have code in OpenHMD to record all video and tracking data to a .mkv file. Then, there’s a simulator tool that loads those recordings. Currently it is capable of extracting the data back out of the recording, parsing the JSON and decoding the video, and presenting it to a partially implemented simulator that then runs the same blob analysis and tracking OpenHMD does. The end goal is a Godot based visualiser for this simulation, and to be able to step back and forth through time examining what happened at critical moments so I can improve the tracking for those situations.

To make recordings, there’s the rift-debug-gstreamer-record branch of OpenHMD. If you have GStreamer and the right plugins (gst-plugins-good) installed, and you set env vars like this, each run of OpenHMD will generate a recording in the target directory (make sure the target dir exists):

export OHMD_TRACE_DIR=/home/user/openhmd-traces/
export OHMD_FULL_RECORDING=1

Up Next

The next things that are calling to me are to improve the room configuration estimation and storage as mentioned above – to detect when the poses a camera is reporting don’t make sense because it’s been bumped or moved.

I’d also like to add back in tracking of the LEDS on the back of the headset headband, to support 360 tracking. I disabled those because they cause me trouble – the headband is adjustable relative to the headset, so the LEDs don’t appear where the 3D model says they should be and that causes jitter and pose mismatches. They need special handling.

One last thing I’m finding exciting is a new person taking an interest in Rift S and starting to look at inside-out tracking for that. That’s just happened in the last few days, so not much to report yet – but I’ll be happy to have someone looking at that while I’m still busy over here in CV1 land!

As always, if you have any questions, comments or testing feedback – hit me up at thaytan@noraisin.net or on @thaytan Twitter/IRC.

Thank you to the kind people signed up as Github Sponsors for this project!

,

Simon LyallAudiobook Reviews – September 2021

The Second World War by Antony Beevor

A single volume covering the whole conflict in reasonable detail. Covered a few areas ie China that other volumes skip. Does the job. 3/5

Handprints on Hubble: An Astronaut’s Story of Invention by Kathryn D. Sullivan

Mostly an astronaut memoir with some extra material on the Hubble development and launch. Some good anecdotes and insights into the work to make the Hubble serviceable. 3/5

The Year of Less: How I Stopped Shopping, Gave Away My Belongings, and Discovered Life Is Worth More Than Anything You Can Buy in a Store by Cait Flanders

Memoir rather than self help. Okay read but not very actionable. 2/5

Space 2069: After Apollo: Back to the Moon, to Mars, and Beyond by David Whitehouse

A “future history” of Space travel for the next 50 years (to the 100th Anniversary of the moon landings). Plausible ideas and good science. 3/5

Second Best: The Amazing Untold Histories of the Greatest Runners-Up by Ben Pobjie

A series of amusing stories about people who didn’t come first. About 50% Australian examples. High jokes/minute with okay hit rate. 2/5

To Engineer Is Human: The Role of Failure in Successful Design by Henry Petroski

A very readable book on Engineering successes and failures and what can be learnt from them (and how things should be learnt). 3/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average. in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all

Share

,

Matt PalmerDiscovering AWS IAM accounts

Let’s say you’re someone who happens to discover an AWS account number, and would like to take a stab at guessing what IAM users might be valid in that account. Tricky problem, right? Not with this One Weird Trick!

In your own AWS account, create a KMS key and try to reference an ARN representing an IAM user in the other account as the principal. If the policy is accepted by PutKeyPolicy, then that IAM account exists, and if the error says “Policy contains a statement with one or more invalid principals” then the user doesn’t exist.

As an example, say you want to guess at IAM users in AWS account 111111111111. Then make sure this statement is in your key policy:

{
  "Sid": "Test existence of user",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::111111111111:user/bob"
  },
  "Action": "kms:DescribeKey",
  "Resource": "*"
}

If that policy is accepted, then the account has an IAM user named bob. Otherwise, the user doesn’t exist. Scripting this is left as an exercise for the reader.

Sadly, wildcards aren’t accepted in the username portion of the ARN, otherwise you could do some funky searching with ...:user/a*, ...:user/b*, etc. You can’t have everything; where would you put it all?

I did mention this to AWS as an account enumeration risk. They’re of the opinion that it’s a good thing you can know what users exist in random other AWS accounts. I guess that means this is a technique you can put in your toolbox safe in the knowledge it’ll work forever.

Given this is intended behaviour, I assume you don’t need to use a key policy for this, but that’s where I stumbled over it. Also, you can probably use it to enumerate roles and anything else that can be a principal, but since I don’t see as much use for that, I didn’t bother exploring it.

There you are, then. If you ever need to guess at IAM users in another AWS account, now you can!

,

Linux AustraliaCouncil Meeting 29th September 2021 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (Vice President)
  • Clinton Roy (Secretary)
  • Jonathan Woite (Council)
  • Russell Coker (Council)
  • Russell Stuart (Treasurer)
  • Sae Ra Germaine (President)

Apologies

–    Neil Cox (Council)

Meeting opened at 19:30 AEDT by Sae Ra  and quorum was achieved.

Minutes taken by Clinton Roy

2. Log of correspondence

 

3. Items for discussion

  • Defamation law? Clinton thinks we should be discussing it at least. Sae Ra has done some research. Sae Ra thinks we should update the email list policy such that we retain the right to remove any mails against our policy and widen it such that the policy can be applied across all social media. Clinton suggests we just close down comments on everything till the legal state is clearer. Sae Ra thinks we can police all of our social media platforms. TODO Jonathan to look at widening the email list policy to cover all social media.
  • Steve speaking for the admin team. Not a lot to report.
  • Owen speaking for the Drupal team. Another event coming up, a twin event to the previous one. Tickets for the August event are valid for the November event. Call for speakers is done, announcing the program first week of October. Will push registrations again, might pick up a few more. Picked up one more sponsor. Budget is balanced already, so no red flags or anything. Good feedback on first one, have adjusted the format based on that. The attendees liked the interactive sessions, but wanted slightly less. A shuffle is happening in the committee, David Sparks stepping into the chair role, go through the formal process in October, Owen will shift to an event focus for the remainder of his term. They have booked Wellington for this year, pushed back to 2022, the venue is happy to release them from the contract. Sae Ra asks for some media for the LCA report.
  • Sae Ra speaking on LCA2022: Session selection coming up on Oct 8.

AGM working bee on sunday the third, Sae Ra to send out a calendar invite.

4. Items for noting

5. Other business

The post Council Meeting 29th September 2021 – Minutes appeared first on Linux Australia.

,

Simon LyallAuckland Gondolas: Practical or a rope to nowhere?

With the cancelling of the SkyPath and the short-lived $800 million Pedestrian/Cycling bridge proposal it seems that the only options for the next harbour crossing will be a full scale tunnel or bridge.

However I came across the Youtube video where the author suggests a gondola crossing of Auckland Harbour. I did a bit of investigation myself and looked at some possible routes, the majority of which appear not to work but two might benefit from further investigation.

In the last decade urban gondola systems have become increasingly popular. They work well in difficult terrain such as steep hills, valley or river crossings. They are also relatively cheap which makes them popular in middle-income countries in the Americas.

The Medellin MetroCable in Columbia currently has 6 lines, most of these act as feeders from hill suburbs to Tram or Metro lines.

The Cablebús in Mexico City opened two lines in 2021 connecting low income transport-poor suburbs with metro stations.

Cablebús in Mexico City

Technology

The technology I am assuming is a Detachable Monocable Gondola with cars similar to the Emirates Air-Line in London, Medellin MetroCable and Mexican Cablebús.

Monocable means that there is a single moving cable. Detachable means that each car unhooks from the cable inside stations and moves slowly (or even stops) for easy boarding and unboarding. The general characteristics of these systems are:

  • 8-10 person cars ( able to carry bicycles, wheelchairs etc )
  • Speed around 20 km/h
  • Capacity 2500 people/hour in each direction.
  • Approx 1 car every 15 seconds.

Note that it is possible to increase capacity, for instance the new Medellin line P has a capacity of 4,000 people per hour in each direction.

Cars in Station ready for boarding

Options for routes

I have provided details about two possible routes:

  • Route One is a simple crossing of Auckland Harbour just east of the Auckland Harbour Bridge.
  • Route Two connects the CBD (near Elliot Street), the University of Auckland and Parnell.

Note that apart from at stations cable systems are along able to make very slight turns at each tower. I have pretty much assumed each line will be straight apart of at stations.

Route One – East of the Bridge.

The will start from a station near the current Ponsonby Cruising Club. A 50m mast 150m north, then a 700m span across the harbour to another 50m mast and then a station in the Stokes Point Reserve.

Total length is 1100m. Travel time around 5 minutes.

This route is a direct replacement of the Skypath. It would cater to cyclists and pedestrians needing to get between Stokes Point and Westhaven. The exact placement of the stations is flexible. The main difficulties are the height required to provide room for tall ships to pass underneath. The Emirates line has similar requirements to allow Thames river traffic and higher than usual towers.

Apart from some sightseers this line would mainly serve cyclists wishing to go directly between North Shore and the city. This Greater Auckland article shows the 2020 Skypath business case was estimating 4500 daily users for the Skypath a few years after it was built.

The high towers and steep climb on the Emirates Air-line

Route Two – Elliot Street to Parnell via Wellesley Street and the University

This would start with a station near the corner of Elliot Street and Wellesley Street West (possibly in front of Bledisloe House). The line would go east above Wellesley Street and over the Art Gallery, Albert Park and parts of the University to a station at the University (perhaps at the corner of Grafton Road and Symonds Street). The line would then go across Grafton Gully to a station at Parnell Train Station and then up the hill to terminate on Parnell Road. The total length is around 1700 metres.

The Elliot Street to Parnell route

There are 4 stations on this route.

  • Parnell Village (near 236 Parnell Road) is an employment & entertainment area.
  • Parnell Train Station is near Parnell and the University but down a hill from both and across busy roads from the University. It is within walking distance of the Auckland Museum.
  • The University is well served by Symonds Street buses but not by rail and is cut off from the CBD and Parnell.
  • The Elliot Street endpoint would be just 50 metres from both the Aotea Train Station and a proposed Queen Street light rail station near the Civic.
Starting Point and direction of Line to University / Parnell

The idea of this line is to cross the difficult terrain that separates Parnell, Parnell Station, the University and the CBD. There should be a lot of possible journeys between these four. It will expand the effective radius of Aotea Station, Parnell Station and a Civic light-rail station. It would also improve Parnell’s connectivity.

Cablebús line over a busy road

Travel times ( assuming 20km/h speed and 1 minute per station ) on the most likely journeys:

JourneyWalking DistanceWalking timeCable distanceGondola time
Elliot St to University800m11 min600m4 min
Elliot St to Parnell 2100m30 min1700m9 min
Parnell Station to Parnell400m6 min250m3 min
Parnell Station to University1400m20 min800m5 min

This line has the potential to be very busy. All the stops on the route would benefit from the improved inter-connectivity.

Two extensions to the line are possible. At the Parnell end the line could be extended down to a station located in the valley around St Georges Bay Road to serve the businesses (and housing) located around there.

Another possible extension of the line would be directly along Wellesley St West from Elliot Street to Victoria Park. I am unsure if traffic to/from the Victoria Park area would justify this however.

Other Routes

I had problems finding other routes that were suitable for gondolas. I’ve included a route to Birkenhead as an example of one that does not appear to be competitive with the existing bus as an example.

The majority of possible routes had various combinations of long distances (which take too long at 20km/h), low density/population at station catchment, already being covered by existing links and lack of obvious demand.

Westhaven to Birkenhead via Northcote Point and Marine Terrace.

This starts on the western side of the Harbour Bridge near a small Park/Bungy HQ. It goes over the harbour alongside the Bridge and has a station near “The Wharf”. The Line would then go across Little Shoal Bay to Little Shoal Bay Reserve. There would be a turning point there to go north-west over the gully of the Le Roys Bush Reserve to a station on the side of Birkenhead Avenue near the viewing platform.

Length 3.9km and end to end travel time of around 15 minutes.

The problem with this line is that Westhaven is not a good destination. Route One across the harbour fills a gap for cyclists but extending the line to Birkenhead doesn’t seem significantly improve the catchment (especially for non-cyclists).

Other possible routes I looked at

  • A simple crossing West of the Bridge
  • From the south-east of St Mary’s Bay to Stokes Point
  • From Wynyard Quarter to Stokes Point.
  • From St Mary’s Bay to Stokes Point and then the Akoranga Bus station
  • From Wellesley Street West to Stokes Point.
  • Between Devonport and the city
  • Above Lake Road from Devonport to Takapuna
  • Between Stokes Point and Bayswater
  • Between Akoranga Bus station and Devonport

Conclusion

Urban Gondolas do not work as a general solution to urban public transport but are suited to certain niche routes across difficult terrain.

Starting from the simple harbour crossing I looked at several cross harbour routes but most were unpromising due to lack of under-served destinations that could be easily connected. I ended up just keeping the basic harbour crossing to fill the gap for cyclists and walkers.

The Elliot Street to Parnell line on the other hand might be a good fit for a gondola. It is less than 2km long but connects 3 popular locations and an under-utilised train station.

Costs of gondolas are fairly low compared to other PT options. The two Cablebús lines in Mexico City are each around 10km long and cost $US 146m and $US 208m. Ballpark estimates might be between $NZ 100 million to $NZ 200 million for each route, hopefully closer to $100m.

Share

,

Glen TurnerThe tyranny of product names

For a long time computer manufacturers have tried to differentiate themselves and their products from their competitors with fancy names with odd capitalisation and spelling. But as an author, using these names does a disservice to the reader: how are they to know that DEC is pronounced as if it was written Dec ("deck").

It's time we pushed back, and wrote for our readers, not for corporations.

It's time to use standard English rules for these Corporate Fancy Names. Proper names begin with a capital, unlike "ciscoSystems®" (so bad that Cisco itself moved away from it). Words are separated by spaces, so "Cisco Systems". Abbreviations and acronyms are written in lower case if they are pronounced as a word, in upper case if each letter is pronounced: so "ram" and "IBM®".

So from here on in I'll be using the following:

  • Face Book. Formerly, "Facebook®".
  • Junos. Formerly JUNOS®.
  • ram. Formerly RAM.
  • Pan OS. Formerly PAN-OS®.
  • Unix. Formerly UNIX®.

I'd encourage you to try this in your own writing. It does look odd for the first time, but the result is undeniably more readable. If we are not writing to be understood by our audience then we are nothing more than an unpaid member of some corporation's marketing team.



comment count unavailable comments

,

Chris NeugebauerTalk Notes: On The Use and Misuse of Decorators

I gave the talk On The Use and Misuse of Decorators as part of PyConline AU 2021, the second in annoyingly long sequence of not-in-person PyCon AU events. Here’s some code samples that you might be interested in:

Simple @property implementation

This shows a demo of @property-style getters. Setters are left as an exercise :)


def demo_property(f):
    f.is_a_property = True
    return f


class HasProperties:

    def __getattribute__(self, name):
        ret = super().__getattribute__(name)
        if hasattr(ret, "is_a_property"):
            return ret()
        else:
            return ret

class Demo(HasProperties):

    @demo_property
    def is_a_property(self):
        return "I'm a property"

    def is_a_function(self):
        return "I'm a function"


a = Demo()
print(a.is_a_function())
print(a.is_a_property)

@run (The Scoped Block)

@run is a decorator that will run the body of the decorated function, and then store the result of that function in place of the function’s name. It makes it easier to assign the results of complex statements to a variable, and get the advantages of functions having less leaky scopes than if or loop blocks.

def run(f):
    return f()

@run
def hello_world():
    return "Hello, World!"

print(hello_world)

@apply (Multi-line stream transformers)

def apply(transformer, iterable_):

    def _applicator(f):

        return(transformer(f, iterable_))

    return _applicator

@apply(map, range(100)
def fizzbuzzed(i):
    if i % 3 == 0 and i % 5 == 0:
        return "fizzbuzz"
    if i % 3 == 0:
        return "fizz"
    elif i % 5 == 0:
        return "buzz"
    else:
        return str(i)

Builders


def html(f):
    builder = HtmlNodeBuilder("html")
    f(builder)
    return builder.build()


class HtmlNodeBuilder:
    def __init__(self, tag_name):
       self.tag_name = tag_name
       self.nodes = []

   def node(self, f):
        builder = HtmlNodeBuilder(f.__name__)
        f(builder)
        self.nodes.append(builder.build())

    def text(self, text):
        self.nodes.append(text)

    def build(self):
      nodes = "\n".join(self.nodes)
       return f"<{self.tag_name}>\n{nodes}\n</{self.tag_name}>"


@html
def document(b):
   @b.node
   def head(b):
       @b.node
       def title(b):
           b.text("Hello, World!")

   @b.node
   def body(b):
       for i in range(10, 0, -1):
           @b.node
           def p(b):
               b.text(f"{i}")

Code Registries

This is an incomplete implementation of a code registry for handling simple text processing tasks:

```python

def register(self, input, output):

def _register_code(f):
    self.registry[(input, output)] = f
    return f

return _register_code

in_type = (iterable[str], (WILDCARD, ) out_type = (Counter, (WILDCARD, frequency))

@registry.register(in_type, out_type) def count_strings(strings):

return Counter(strings)

@registry.register( (iterable[str], (WILDCARD, )), (iterable[str], (WILDCARD, lowercase)) ) def words_to_lowercase(words): …

@registry.register( (iterable[str], (WILDCARD, )), (iterable[str], (WILDCARD, no_punctuation)) ) def words_without_punctuation(words): …

def find_steps( self, input_type, input_attrs, output_type, output_attrs ):

hand_wave()

def give_me(self, input, output_type, output_attrs):

steps = self.find_steps(
    type(input), (), output_type, output_attrs
)

temp = input
for step in steps:
    temp = step(temp)

return temp

,

Michael StillLinux bridges have their MTU overwritten when you add an interface

I discovered last night that network bridges on linux have their Maximum Transmission Unit (MTU) overwritten by whatever is the MTU value of the most recent interface added to the bridge. This is bad. Very bad. Specifically this is bad because MTU matters for accurately describing the capabilities of the network path the packets will travel on, so it shouldn’t be clobbered willy nilly.

Here’s an example of the behaviour:

# ip link add egr-br-ens1f0 mtu 1500 type bridge
# ip link show dev egr-br-ens1f0
3: egr-br-ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 7e:33:1b:30:d8:00 brd ff:ff:ff:ff:ff:ff
# ip link add egr-eaa64a-o mtu 8950 type veth peer name egr-eaa64a-i
# ip link show dev egr-br-ens1f0
3: egr-br-ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 7e:33:1b:30:d8:00 brd ff:ff:ff:ff:ff:ff
# brctl addif egr-br-ens1f0 egr-eaa64a-o
# ip link show dev egr-br-ens1f0
3: egr-br-ens1f0: <BROADCAST,MULTICAST> mtu 8950 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether da:82:cf:34:13:60 brd ff:ff:ff:ff:ff:ff

So you can see here that the bridge had an MTU of 1,500 bytes. We create a veth pair with an MTU of 8,950 bytes and add it to the bridge. Suddenly the bridge’s MTU is 8,950 bytes!

Perhaps this is my fault — brctl is pretty old school. Let’s use only ip commands to configure the bridge.

# ip link add mgr-br-ens1f0 mtu 1500 type bridge
# ip link show dev mgr-br-ens1f0
6: mgr-br-ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 82:d8:df:15:40:01 brd ff:ff:ff:ff:ff:ff
# ip link add mgr-eaa64a-o mtu 8950 type veth peer name mgr-eaa64a-i
# ip link show dev mgr-br-ens1f0
6: mgr-br-ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 82:d8:df:15:40:01 brd ff:ff:ff:ff:ff:ff
# ip link set mgr-eaa64a-o master mgr-br-ens1f0
# ip link show dev mgr-br-ens1f0
6: mgr-br-ens1f0: <BROADCAST,MULTICAST> mtu 8950 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 22:55:4a:a8:19:00 brd ff:ff:ff:ff:ff:ff

The same problem occurs. Luckily, you can specify the MTU when you add an interface to a bridge, like this:

# ip link add zgr-br-ens1f0 mtu 1500 type bridge
# ip link show dev zgr-br-ens1f0
9: zgr-br-ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 7a:54:2c:04:5f:a8 brd ff:ff:ff:ff:ff:ff
# ip link add zgr-eaa64a-o mtu 8950 type veth peer name zgr-eaa64a-i
# ip link show dev zgr-br-ens1f0
9: zgr-br-ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 7a:54:2c:04:5f:a8 brd ff:ff:ff:ff:ff:ff
# ip link set zgr-eaa64a-o master zgr-br-ens1f0 mtu 1500
# ip link show dev zgr-br-ens1f0
9: zgr-br-ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ae:59:0b:a6:46:94 brd ff:ff:ff:ff:ff:ff

And that works nicely. In my case, this ended up with me writing code to lookup the MTU of the bridge I was adding the interface to, and then specifying that MTU back when adding the interface. I hope this helps someone else.

,

Simon LyallAudiobook Reviews – August 2021

Antarctica’s Lost Aviator: The Epic Adventure to Explore the Last Frontier on Earth by Jeff Maynard

Biography of Lincoln Ellsworth “an insecure man in search of a purpose” concentrating on his various polar expeditions in the 1920s and 30s culminating in an attempt to fly across antarctic. 3/5

Extra Life: A Short History of Living Longer by Steven Johnson

The story of Medical, Public health and other measures in the last 150 years that have dramatically raised life expectancy. The reality behind some well known stories. 3/5

Dr No by Ian Fleming

Bond returns to Jamaica for a soft assignment that turns out not to be soft after all. He dodges assassins and investigates a mysterious island. 3/5

Never Lost Again: The Google Mapping Revolution That Sparked New Industries and Augmented Our Reality by Bill Kilday

A insider’s account of the mapping company Keyhole that became Google Maps. Good mix of stories and analysis. 4/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average. in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all

Share

,

Jan SchmidtOpenHMD update

A while ago, I wrote a post about how to build and test my Oculus CV1 tracking code in SteamVR using the SteamVR-OpenHMD driver. I have updated those instructions and moved them to https://noraisin.net/diary/?page_id=1048 – so use those if you’d like to try things out.

The pandemic continues to sap my time for OpenHMD improvements. Since my last post, I have been working on various refinements. The biggest visible improvements are:

  • Adding velocity and acceleration API to OpenHMD.
  • Rewriting the pose transformation code that maps from the IMU-centric tracking space to the device pose needed by SteamVR / apps.

Adding velocity and acceleration reporting is needed in VR apps that support throwing things. It means that throwing objects and using gravity-grab to fetch objects works in Half-Life: Alyx, making it playable now.

The rewrite to the pose transformation code fixed problems where the rotation of controller models in VR didn’t match the rotation applied in the real world. Controllers would appear attached to the wrong part of the hand, and rotate around the wrong axis. Movements feel more natural now.

Ongoing work – record and replay

My focus going forward is on fixing glitches that are caused by tracking losses or outliers. Those problems happen when the computer vision code either fails to match what the cameras see to the device LED models, or when it matches incorrectly.

Tracking failure leads to the headset view or controllers ‘flying away’ suddenly. Incorrect matching leads to controllers jumping and jittering to the wrong pose, or swapping hands. Either condition is very annoying.

Unfortunately, as the tracking has improved the remaining problems get harder to understand and there is less low-hanging fruit for improvement. Further, when the computer vision runs at 52Hz, it’s impossible to diagnose the reasons for a glitch in real time.

I’ve built a branch of OpenHMD that uses GStreamer to record the CV1 camera video, plus IMU and tracking logs into a video file.

To go with those recordings, I’ve been working on a replay and simulation tool, that uses the Godot game engine to visualise the tracking session. The goal is to show, frame-by-frame, where OpenHMD thought the cameras, headset and controllers were at each point in the session, and to be able to step back and forth through the recording.

Right now, I’m working on the simulation portion of the replay, that will use the tracking logs to recreate all the poses.

,

Robert CollinsA moment of history

I’ve been asked more than once what it was like at the beginning of Ubuntu, before it was a company, when an email from someone I’d never heard of came into my mailbox.

We’re coming up on 20 years now since Ubuntu was founded, and I had cause to do some spelunking into IMAP archives recently… while there I took the opportunity to grab the very first email I received.

The Ubuntu long shot succeeded wildly. Of course, we liked to joke about how spammy those emails where: cold-calling a raft of Debian developers with job offers, some of them were closer to phishing attacks :). This very early one – I was the second employee (though I started at 4 days a week to transition my clients gradually) – was less so.

I think its interesting though to note how explicit a gamble this was framed as: a time limited experiment, funded for a year. As the company scaled this very rapidly became a hiring problem and the horizon had to be pushed out to 2 years to get folk to join.

And of course, while we started with arch in earnest, we rapidly hit significant usability problems, some of which were solvable with porcelain and shallow non-architectural changes, and we built initially patches, and then the bazaar VCS project to tackle those. But others were not: for instance, I recall exceeding the 32K hard link limit on ext3 due to a single long history during a VCS conversion. The sum of these challenges led us to create the bzr project, a ground up rethink of our version control needs, architecture, implementation and user-experience. While ultimately git has conquered all, bzr had – still has in fact – extremely loyal advocates, due to its laser sharp focus on usability.

Anyhow, here it is: one of the original no-name-here-yet, aka Ubuntu, introductory emails (with permission from Mark, of course). When I clicked through to the website Mark provided there was a link there to a fantastical website about a space tourist… not what I had expected to be reading in Adelaide during LCA 2004.


From: Mark Shuttleworth <xxx@xxx>
To: Robert Collins <xxx@xxx>
Date: Thu, 15 Jan 2004, 04:30

Tom Lord gave me your email address, I believe he’s
already sent you the email that I sent him so I’m sure
you have some background.

In short, I am going to fund some open source
development for a year. This is part of a new project
that I will be getting off the ground in the coming
weeks. I don’t know where it will lead, it’s flying in
the face of a stiff breeze but I think at the end of
the day it will at least fund a few very good open
source developers for a full year to work on the
projects they like most.

One of the pieces of the puzzle is high end source
code management. I’ll be looking to build an
infrastructure that will manage source code for
between 100 and 8000 open source projects (yes,
there’s a big difference between the two, I don’t know
at which end of the spectrum we will be at the end of
the year but our infrastructure will have to at least
be capable of scaling to the latter within two years)
with upwards of 2000 developers, drawing code from a
variety of sources, playing with it and spitting it
out regularly in nice packages.

Arch and Subversion seem to be the two leading
contenders for “next generation open source sccm”. I’d
be interested in your thoughts on the two of them, and
how they stack up. I’m looking to hire one person who
will lead that part of the effort. They’ll work alone
from home, and be responsible for two things. First,
extending the tool (arch or svn) in ways that help the
project. Such extensions will be released under an
open source licence, and hopefully embraced by the
tools maintainers and included in the mainline code
for the tool. And second, they will be responsible for
our large-scale implementation of SCCM, using that
tool, and building the management scripts and other
infrastructure to support such a large, and hopefully
highly automated, set of repositories.

Would you be interested in this position? What
attributes and experience do you think would make you
a great person to have on the team? What would your
salary expectation be, as a monthly figure, for a one
year contract full time?

I’m currently on your continent, well, just off it. On
Lizard Island, up North. Am headed today for Brisbane,
then on the 17th to Launceston via Melbourne. If you
happen to be on any of those stops, would you be
interested in meeting up to discuss it further?

If you’re curious you can find out a bit more about me
at www.markshuttleworth.com. This project is much
lower key than some of what you’ll find there. It’s a
very long shot indeed. But if at worst all that
happens is a bunch of open source work gets funded at
my expense I’ll feel it was money well spent.

Cheers,
Mark

=====

“Good judgement comes from experience, and often experience
comes from bad judgement” – Rita Mae Brown


,

Arjen LentzClassic McEleice and the NIST search for post-quantum crypto

I have always liked cryptography, and public-key cryptography in particularly. When Pretty Good Privacy (PGP) first came out in 1991, I not only started using it, also but looking at the documentation and the code to see how it worked. I created my own implementation in C using very small keys, just to better understand.

Cryptography has been running a race against both faster and cheaper computing power. And these days, with banking and most other aspects of our lives entirely relying on secure communications, it’s a very juicy target for bad actors.

About 5 years ago, the National (USA) Institute for Science and Technology (NIST) initiated a search for cryptographic algorithmic that should withstand a near-future world where quantum computers with a significant number of qubits are a reality. There have been a number of rounds, which mid 2020 saw round 3 and the finalists.

This submission caught my eye some time ago: Classic McEliece, and out of the four finalists it’s the only one that is not lattice-based [wikipedia link].

For Public Key Encryption and Key Exchange Mechanism, Prof Bill Buchanan thinks that the winner will be lattice-based, but I am not convinced.

Robert McEleice at his retirement in 2007

Tiny side-track, you may wonder where does the McEleice name come from? From mathematician Robert McEleice (1942-2019). McEleice developed his cryptosystem in 1978. So it’s not just named after him, he designed it. For various reasons that have nothing to do with the mathematical solidity of the ideas, it didn’t get used at the time. He’s done plenty cool other things, too. From his Caltech obituary:

He made fundamental contributions to the theory and design of channel codes for communication systems—including the interplanetary telecommunication systems that were used by the Voyager, Galileo, Mars Pathfinder, Cassini, and Mars Exploration Rover missions.

Back to lattices, there are both unknowns (aspects that have not been studied in exhaustive depth) and recent mathematical attacks, both of which create uncertainty – in the crypto sphere as well as for business and politics. Given how long it takes for crypto schemes to get widely adopted, the latter two are somewhat relevant, particularly since cyber security is a hot topic.

Lattices are definitely interesting, but given what we know so far, it is my feeling that systems based on lattices are more likely to be proven breakable than Classic McEleice, which come to this finalists’ table with 40+ years track record of in-depth analysis. Mind that all finalists are of course solid at this stage – but NIST’s thoughts on expected developments and breakthroughs is what is likely to decide the winner. NIST are not looking for shiny, they are looking for very very solid in all possible ways.

Prof Buchanan recently published implementations for the finalists, and did some benchmarks where we can directly compare them against each other.

We can see that Classic McEleice’s key generation is CPU intensive, but is that really a problem? The large size of its public key may be more of a factor (disadvantage), however the small ciphertext I think more than offsets that disadvantage.

As we’re nearing the end of the NIST process, in my opinion, fast encryption/decryption and small cyphertext, combined with the long track record of in-depth analysis, may still see Classic McEleice come out the winner.

The post Classic McEleice and the NIST search for post-quantum crypto first appeared on Lentz family blog.

,

Chris NeugebauerAdding a PurpleAir monitor to Home Assistant

Living in California, I’ve (sadly) grown accustomed to needing to keep track of our local air quality index (AQI) ratings, particularly as we live close to places where large wildfires happen every other year.

Last year, Josh and I bought a PurpleAir outdoor air quality meter, which has been great. We contribute our data to a collection of very local air quality meters, which is important, since the hilly nature of the North Bay means that the nearest government air quality ratings can be significantly different to what we experience here in Petaluma.

I recently went looking to pull my PurpleAir sensor data into my Home Assistant setup. Unfortunately, the PurpleAir API does not return the AQI metric for air quality, only the raw PM2.5/PM5/PM10 numbers. After some searching, I found a nice template sensor solution on the Home Assistant forums, which I’ve modernised by adding the AQI as a sub-sensor, and adding unique ID fields to each useful sensor, so that you can assign them to a location.

You’ll end up with sensors for raw PM2.5, the PM2.5 AQI value, the US EPA air quality category, air pressure, relative humidity and air pressure.

How to use this

First up, visit the PurpleAir Map, find the sensor you care about, click “get this widget�, and then “JSON�. That will give you the URL to set as the resource key in purpleair.yaml.

Adding the configuration

In HomeAssistant, add the following line to your configuration.yaml:

sensor: !include purpleair.yaml

and then add the following contents to purpleair.yaml


 - platform: rest
   name: 'PurpleAir'

   # Substitute in the URL of the sensor you care about.  To find the URL, go
   # to purpleair.com/map, find your sensor, click on it, click on "Get This
   # Widget" then click on "JSON".
   resource: https://www.purpleair.com/json?key={KEY_GOES_HERE}&show={SENSOR_ID}

   # Only query once a minute to avoid rate limits:
   scan_interval: 60

   # Set this sensor to be the AQI value.
   #
   # Code translated from JavaScript found at:
   # https://docs.google.com/document/d/15ijz94dXJ-YAZLi9iZ_RaBwrZ4KtYeCy08goGBwnbCU/edit#
   value_template: >
     {{ value_json["results"][0]["Label"] }}
   unit_of_measurement: ""
   # The value of the sensor can't be longer than 255 characters, but the
   # attributes can.  Store away all the data for use by the templates below.
   json_attributes:
     - results

 - platform: template
   sensors:
     purpleair_aqi:
       unique_id: 'purpleair_SENSORID_aqi_pm25'
       friendly_name: 'PurpleAir PM2.5 AQI'
       value_template: >
         {% macro calcAQI(Cp, Ih, Il, BPh, BPl) -%}
           {{ (((Ih - Il)/(BPh - BPl)) * (Cp - BPl) + Il)|round|float }}
         {%- endmacro %}
         {% if (states('sensor.purpleair_pm25')|float) > 1000 %}
           invalid
         {% elif (states('sensor.purpleair_pm25')|float) > 350.5 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 500.0, 401.0, 500.0, 350.5) }}
         {% elif (states('sensor.purpleair_pm25')|float) > 250.5 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 400.0, 301.0, 350.4, 250.5) }}
         {% elif (states('sensor.purpleair_pm25')|float) > 150.5 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 300.0, 201.0, 250.4, 150.5) }}
         {% elif (states('sensor.purpleair_pm25')|float) > 55.5 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 200.0, 151.0, 150.4, 55.5) }}
         {% elif (states('sensor.purpleair_pm25')|float) > 35.5 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 150.0, 101.0, 55.4, 35.5) }}
         {% elif (states('sensor.purpleair_pm25')|float) > 12.1 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 100.0, 51.0, 35.4, 12.1) }}
         {% elif (states('sensor.purpleair_pm25')|float) >= 0.0 %}
           {{ calcAQI((states('sensor.purpleair_pm25')|float), 50.0, 0.0, 12.0, 0.0) }}
         {% else %}
           invalid
         {% endif %}
       unit_of_measurement: "bit"
     purpleair_description:
       unique_id: 'purpleair_SENSORID_description'
       friendly_name: 'PurpleAir AQI Description'
       value_template: >
         {% if (states('sensor.purpleair_aqi')|float) >= 401.0 %}
           Hazardous
         {% elif (states('sensor.purpleair_aqi')|float) >= 301.0 %}
           Hazardous
         {% elif (states('sensor.purpleair_aqi')|float) >= 201.0 %}
           Very Unhealthy
         {% elif (states('sensor.purpleair_aqi')|float) >= 151.0 %}
           Unhealthy
         {% elif (states('sensor.purpleair_aqi')|float) >= 101.0 %}
           Unhealthy for Sensitive Groups
         {% elif (states('sensor.purpleair_aqi')|float) >= 51.0 %}
           Moderate
         {% elif (states('sensor.purpleair_aqi')|float) >= 0.0 %}
           Good
         {% else %}
           undefined
         {% endif %}
       entity_id: sensor.purpleair
     purpleair_pm25:
       unique_id: 'purpleair_SENSORID_pm25'
       friendly_name: 'PurpleAir PM 2.5'
       value_template: "{{ state_attr('sensor.purpleair','results')[0]['PM2_5Value'] }}"
       unit_of_measurement: "μg/m3"
       entity_id: sensor.purpleair
     purpleair_temp:
       unique_id: 'purpleair_SENSORID_temperature'
       friendly_name: 'PurpleAir Temperature'
       value_template: "{{ state_attr('sensor.purpleair','results')[0]['temp_f'] }}"
       unit_of_measurement: "°F"
       entity_id: sensor.purpleair
     purpleair_humidity:
       unique_id: 'purpleair_SENSORID_humidity'
       friendly_name: 'PurpleAir Humidity'
       value_template: "{{ state_attr('sensor.purpleair','results')[0]['humidity'] }}"
       unit_of_measurement: "%"
       entity_id: sensor.purpleair
     purpleair_pressure:
       unique_id: 'purpleair_SENSORID_pressure'
       friendly_name: 'PurpleAir Pressure'
       value_template: "{{ state_attr('sensor.purpleair','results')[0]['pressure'] }}"
       unit_of_measurement: "hPa"
       entity_id: sensor.purpleair

Quirks

I had difficulty getting the AQI to display as a numeric graph when I didn’t set a unit. I went with bit, and that worked just fine. 🤷�♂�

,

Stewart SmithAn Unearthly Child

So, this idea has been brewing for a while now… try and watch all of Doctor Who. All of it. All 38 seasons. Today(ish), we started. First up, from 1963 (first aired not quite when intended due to the Kennedy assassination): An Unearthly Child. The first episode of the first serial.

A lot of iconic things are there from the start: the music, the Police Box, embarrassing moments of not quite remembering what time one is in, and normal humans accidentally finding their way into the TARDIS.

I first saw this way back when a child, where they were repeated on ABC TV in Australia for some anniversary of Doctor Who (I forget which one). Well, I saw all but the first episode as the train home was delayed and stopped outside Caulfield for no reason for ages. Some things never change.

Of course, being a show from the early 1960s, there’s some rougher spots. We’re not about to have the picture of diversity, and there’s going to be casual racism and sexism. What will be interesting is noticing these things today, and contrasting with my memory of them at the time (at least for episodes I’ve seen before), and what I know of the attitudes of the time.

“This year-ometer is not calculating properly” is a very 2020 line though (technically from the second episode).

,

Jan SchmidtRift CV1 – Getting close now…

It’s been a while since my last post about tracking support for the Oculus Rift in February. There’s been big improvements since then – working really well a lot of the time. It’s gone from “If I don’t make any sudden moves, I can finish an easy Beat Saber level” to “You can’t hide from me!” quality.

Equally, there are still enough glitches and corner cases that I think I’ll still be at this a while.

Here’s a video from 3 weeks ago of (not me) playing Beat Saber on Expert+ setting showing just how good things can be now:

Beat Saber – Skunkynator playing Expert+, Mar 16 2021

Strap in. Here’s what I’ve worked on in the last 6 weeks:

Pose Matching improvements

Most of the biggest improvements have come from improving the computer vision algorithm that’s matching the observed LEDs (blobs) in the camera frames to the 3D models of the devices.

I split the brute-force search algorithm into 2 phases. It now does a first pass looking for ‘obvious’ matches. In that pass, it does a shallow graph search of blobs and their nearest few neighbours against LEDs and their nearest neighbours, looking for a match using a “Strong” match metric. A match is considered strong if expected LEDs match observed blobs to within 1.5 pixels.

Coupled with checks on the expected orientation (matching the Gravity vector detected by the IMU) and the pose prior (expected position and orientation are within predicted error bounds) this short-circuit on the search is hit a lot of the time, and often completes within 1 frame duration.

In the remaining tricky cases, where a deeper graph search is required in order to recover the pose, the initial search reduces the number of LEDs and blobs under consideration, speeding up the remaining search.

I also added an LED size model to the mix – for a candidate pose, it tries to work out how large (in pixels) each LED should appear, and use that as a bound on matching blobs to LEDs. This helps reduce mismatches as devices move further from the camera.

LED labelling

When a brute-force search for pose recovery completes, the system now knows the identity of various blobs in the camera image. One way it avoids a search next time is to transfer the labels into future camera observations using optical-flow tracking on the visible blobs.

The problem is that even sped-up the search can still take a few frame-durations to complete. Previously LED labels would be transferred from frame to frame as they arrived, but there’s now a unique ID associated with each blob that allows the labels to be transferred even several frames later once their identity is known.

IMU Gyro scale

One of the problems with reverse engineering is the guesswork around exactly what different values mean. I was looking into why the controller movement felt “swimmy” under fast motions, and one thing I found was that the interpretation of the gyroscope readings from the IMU was incorrect.

The touch controllers report IMU angular velocity readings directly as a 16-bit signed integer. Previously the code would take the reading and divide by 1024 and use the value as radians/second.

From teardowns of the controller, I know the IMU is an Invensense MPU-6500. From the datasheet, the reported value is actually in degrees per second and appears to be configured for the +/- 2000 °/s range. That yields a calculation of Gyro-rad/s = Gyro-°/s * (2000 / 32768) * (?/180) – or a divisor of 938.734.

The 1024 divisor was under-estimating rotation speed by about 10% – close enough to work until you start moving quickly.

Limited interpolation

If we don’t find a device in the camera views, the fusion filter predicts motion using the IMU readings – but that quickly becomes inaccurate. In the worst case, the controllers fly off into the distance. To avoid that, I added a limit of 500ms for ‘coasting’. If we haven’t recovered the device pose by then, the position is frozen in place and only rotation is updated until the cameras find it again.

Exponential filtering

I implemented a 1-Euro exponential smoothing filter on the output poses for each device. This is an idea from the Project Esky driver for Project North Star/Deck-X AR headsets, and almost completely eliminates jitter in the headset view and hand controllers shown to the user. The tradeoff is against introducing lag when the user moves quickly – but there are some tunables in the exponential filter to play with for minimising that. For now I’ve picked some values that seem to work reasonably.

Non-blocking radio

Communications with the touch controllers happens through USB radio command packets sent to the headset. The main use of radio commands in OpenHMD is to read the JSON configuration block for each controller that is programmed in at the factory. The configuration block provides the 3D model of LED positions as well as initial IMU bias values.

Unfortunately, reading the configuration block takes a couple of seconds on startup, and blocks everything while it’s happening. Oculus saw that problem and added a checksum in the controller firmware. You can read the checksum first and if it hasn’t changed use a local cache of the configuration block. Eventually, I’ll implement that caching mechanism for OpenHMD but in the meantime it still reads the configuration blocks on each startup.

As an interim improvement I rewrote the radio communication logic to use a state machine that is checked in the update loop – allowing radio communications to be interleaved without blocking the regularly processing of events. It still interferes a bit, but no longer causes a full multi-second stall as each hand controller turns on.

Haptic feedback

The hand controllers have haptic feedback ‘rumble’ motors that really add to the immersiveness of VR by letting you sense collisions with objects. Until now, OpenHMD hasn’t had any support for applications to trigger haptic events. I spent a bit of time looking at USB packet traces with Philipp Zabel and we figured out the radio commands to turn the rumble motors on and off.

In the Rift CV1, the haptic motors have a mode where you schedule feedback events into a ringbuffer – effectively they operate like a low frequency audio device. However, that mode was removed for the Rift S (and presumably in the Quest devices) – and deprecated for the CV1.

With that in mind, I aimed for implementing the unbuffered mode, with explicit ‘motor on + frequency + amplitude’ and ‘motor off’ commands sent as needed. Thanks to already having rewritten the radio communications to use a state machine, adding haptic commands was fairly easy.

The big question mark is around what API OpenHMD should provide for haptic feedback. I’ve implemented something simple for now, to get some discussion going. It works really well and adds hugely to the experience. That code is in the https://github.com/thaytan/OpenHMD/tree/rift-haptics branch, with a SteamVR-OpenHMD branch that uses it in https://github.com/thaytan/SteamVR-OpenHMD/tree/controller-haptics-wip

Problem areas

Unexpected tracking losses

I’d say the biggest problem right now is unexpected tracking loss and incorrect pose extractions when I’m not expecting them. Especially my right controller will suddenly glitch and start jumping around. Looking at a video of the debug feed, it’s not obvious why that’s happening:

To fix cases like those, I plan to add code to log the raw video feed and the IMU information together so that I can replay the video analysis frame-by-frame and investigate glitches systematically. Those recordings will also work as a regression suite to test future changes.

Sensor fusion efficiency

The Kalman filter I have implemented works really nicely – it does the latency compensation, predicts motion and extracts sensor biases all in one place… but it has a big downside of being quite expensive in CPU. The Unscented Kalman Filter CPU cost grows at O(n^3) with the size of the state, and the state in this case is 43 dimensional – 22 base dimensions, and 7 per latency-compensation slot. Running 1000 updates per second for the HMD and 500 for each of the hand controllers adds up quickly.

At some point, I want to find a better / cheaper approach to the problem that still provides low-latency motion predictions for the user while still providing the same benefits around latency compensation and bias extraction.

Lens Distortion

To generate a convincing illusion of objects at a distance in a headset that’s only a few centimetres deep, VR headsets use some interesting optics. The LCD/OLED panels displaying the output get distorted heavily before they hit the users eyes. What the software generates needs to compensate by applying the right inverse distortion to the output video.

Everyone that tests the CV1 notices that the distortion is not quite correct. As you look around, the world warps and shifts annoyingly. Sooner or later that needs fixing. That’s done by taking photos of calibration patterns through the headset lenses and generating a distortion model.

Camera / USB failures

The camera feeds are captured using a custom user-space UVC driver implementation that knows how to set up the special synchronisation settings of the CV1 and DK2 cameras, and then repeatedly schedules isochronous USB packet transfers to receive the video.

Occasionally, some people experience failure to re-schedule those transfers. The kernel rejects them with an out-of-memory error failing to set aside DMA memory (even though it may have been running fine for quite some time). It’s not clear why that happens – but the end result at the moment is that the USB traffic for that camera dies completely and there’ll be no more tracking from that camera until the application is restarted.

Often once it starts happening, it will keep happening until the PC is rebooted and the kernel memory state is reset.

Occluded cases

Tracking generally works well when the cameras get a clear shot of each device, but there are cases like sighting down the barrel of a gun where we expect that the user will line up the controllers in front of one another, and in front of the headset. In that case, even though we probably have a good idea where each device is, it can be hard to figure out which LEDs belong to which device.

If we already have a good tracking lock on the devices, I think it should be possible to keep tracking even down to 1 or 2 LEDs being visible – but the pose assessment code will have to be aware that’s what is happening.

Upstreaming

April 14th marks 2 years since I first branched off OpenHMD master to start working on CV1 tracking. How hard can it be, I thought? I’ll knock this over in a few months.

Since then I’ve accumulated over 300 commits on top of OpenHMD master that eventually all need upstreaming in some way.

One thing people have expressed as a prerequisite for upstreaming is to try and remove the OpenCV dependency. The tracking relies on OpenCV to do camera distortion calculations, and for their PnP implementation. It should be possible to reimplement both of those directly in OpenHMD with a bit of work – possibly using the fast LambdaTwist P3P algorithm that Philipp Zabel wrote, that I’m already using for pose extraction in the brute-force search.

Others

I’ve picked the top issues to highlight here. https://github.com/thaytan/OpenHMD/issues has a list of all the other things that are still on the radar for fixing eventually.

Other Headsets

At some point soon, I plan to put a pin in the CV1 tracking and look at adapting it to more recent inside-out headsets like the Rift S and WMR headsets. I implemented 3DOF support for the Rift S last year, but getting to full positional tracking for that and other inside-out headsets means implementing a SLAM/VIO tracking algorithm to track the headset position.

Once the headset is tracking, the code I’m developing here for CV1 to find and track controllers will hopefully transfer across – the difference with inside-out tracking is that the cameras move around with the headset. Finding the controllers in the actual video feed should work much the same.

Sponsorship

This development happens mostly in my spare time and partly as open source contribution time at work at Centricular. I am accepting funding through Github Sponsorships to help me spend more time on it – I’d really like to keep helping Linux have top-notch support for VR/AR applications. Big thanks to the people that have helped get this far.

,

Stewart Smithlibeatmydata v129

Every so often, I release a new libeatmydata. This has not happened for a long time. This is just some bug fixes, most of which have been in the Debian package for some time, I’ve just been lazy and not sat down and merged them.

git clone https://github.com/stewartsmith/libeatmydata.git

Download the source tarball from here: libeatmydata-129.tar.gz and GPG signature: libeatmydata-129.tar.gz.asc from my GPG key.

Or, feel free to grab some Fedora RPMs:

Releases published also in the usual places:

,

BlueHackersWorld bipolar day 2021

Today, 30 March, is World Bipolar Day.

Vincent van Gogh - Worn Out

Why that particular date? It’s Vincent van Gogh’s birthday (1853), and there is a fairly strong argument that the Dutch painter suffered from bipolar (among other things).

The image on the side is Vincent’s drawing “Worn Out” (from 1882), and it seems to capture the feeling rather well – whether (hypo)manic, depressed, or mixed. It’s exhausting.

Bipolar is complicated, often undiagnosed or misdiagnosed, and when only treated with anti-depressants, it can trigger the (hypo)mania – essentially dragging that person into that state near-permanently.

Have you heard of Bipolar II?

Hypo-mania is the “lesser” form of mania that distinguishes Bipolar I (the classic “manic depressive” syndrome) from Bipolar II. It’s “lesser” only in the sense that rather than someone going so hyper they may think they can fly (Bipolar I is often identified when someone in manic state gets admitted to hospital – good catch!) while with Bipolar II the hypo-mania may actually exhibit as anger. Anger in general, against nothing in particular but potentially everyone and everything around them. Or, if it’s a mixed episode, anger combined with strong negative thoughts. Either way, it does not look like classic mania. It is, however, exhausting and can be very debilitating.

Bipolar II people often present to a doctor while in depressed state, and GPs (not being psychiatrists) may not do a full diagnosis. Note that D.A.S. and similar test sheets are screening tools, they are not diagnostic. A proper diagnosis is more complex than filling in a form some questions (who would have thought!)

Call to action

If you have a diagnosis of depression, only from a GP, and are on medication for this, I would strongly recommend you also get a referral to a psychiatrist to confirm that diagnosis.

Our friends at the awesome Black Dog Institute have excellent information on bipolar, as well as a quick self-test – if that shows some likelihood of bipolar, go get that referral and follow up ASAP.

I will be writing more about the topic in the coming time.

The post World bipolar day 2021 first appeared on BlueHackers.org.

,

Michael StillManipulating Docker images without Docker installed

Recently I’ve been playing a bit more with Docker images and Docker image repositories. I had in the past written a quick hack to let me extract files from a Docker image, but I wanted to do something a little more mature than that.

For example, sometimes you want to download an image from a Docker image repository without using Docker. Naively if you had Docker, you’d do something like this:

docker pull busybox
docker save busybox

However, that assumes that you have Docker installed on the machine downloading the images, and that’s sometimes not possible for security reasons. The most obvious example I can think of is airgapped secure environments where you need to walk the data between two networks, and the unclassified network machine doesn’t allow administrator access to install Docker.

So I wrote a little tool to do image manipulation for me. The tool is called Occy Strap, is written in python, and is available on pypi. That means installing it is relatively simple:

python3 -m venv ~/virtualenvs/occystrap
. ~/virtualenvs/occystrap/bin/activate
pip install occystrap

Which doesn’t require administrator permissions. There are then a few things we can do with Occy Strap.

Downloading an image from a repository and storing as a tarball

Let’s say we want to download an image from a repository and store it as a local tarball. This is a common thing to want to do in airgapped environments for example. You could do this with docker with a docker pull; docker save. The Occy Strap equivalent is:

occystrap fetch-to-tarfile registry-1.docker.io library/busybox \
    latest busybox.tar

In this example we’re pulling from the Docker Hub (registry-1.docker.io), and are downloading busybox’s latest version into a tarball named busybox-occy.tar. This tarball can be loaded with docker load -i busybox.tar on an airgapped Docker environment.

Michael StillComplexity Arrangements for Sustained Innovation: Lessons From 3M Corporation

This is the second business paper I’ve read this week while reading along with my son’s university studies. The first is discussed here if you’re interested. This paper is better written, but more academic in its style. This ironically makes it harder to read, because its grammar style is more complicated and harder to parse.

The take aways for me from this paper is that 3M is good at encouraging serendipity and opportune moments that create innovation. This is similar to Google’s attempts to build internal peer networks and deliberate lack of structure. In 3M’s case its partially expressed as 15% time, which is similar to Google’s 20% time. Specifically, “eureka moments” cannot be planned or scheduled, but require prior engagement.

chance favors only the prepared mind — Pasteur

3M has a variety of methods for encouraging peer networks, including technology fairs, “bootlegging” (borrowing idle resources from other teams), innovation grants, and so on.

At the same time, 3M tries to keep at least a partial focus on events driving by schedules. The concept of time is important here — there is a “time to wait” (we are ahead of the market); “a time in between” (15% time); and “a time across” (several parallel efforts around related innovations to speed up the process).

The idea of “a time to wait” is quite interesting. 3M has a history of discovering things where there is no current application, but somehow corporately remembering those things so that when there are applications years later they can jump in with a solution. They embrace story telling as part of their corporate memory, as well as a way of ensuring they learn from past success and failure.

Finally, 3M is similar to Google in their deliberate flexibility with the rules. 15% time isn’t rigidly counted for example — it might be 15% a week, or 15% of a year, or more or less than that. As long as it can be justified as a good use of resources its ok.

This was a good read and I enjoyed it.

 

Michael StillA corporate system for continuous innovation: The case of Google Inc

So, one of my kids is studying some business units at university and was assigned this paper to read. I thought it looked interesting, so I gave it a read as well.

While not being particularly well written in terms of style, this is an approachable introduction to the culture and values of Google and how they play into Google’s continued ability to innovate. The paper identifies seven important attributes of the company’s culture that promote innovation, as ranked by the interviewed employees:

  • The culture is innovation oriented.
  • They put a lot of effort into selecting individuals who will fit well with the culture at hiring time.
  • Leaders are seen as performing a facilitiation role, not a directive one.
  • The organizational structure is loosely defined.
  • OKRs and aligned performance incentives.
  • A culture of organizational learning through postmortems and building internal social networks. Learning is considered a peer to peer activity that is not heavily structured.
  • External interaction — especially in the form of aggressive acquisition of skills and technologies in areas Google feels they are struggling in.

Additionally, they identify eight habits of a good leader:

  • A good coach.
  • Empoyer your team and don’t micro-manage.
  • Express interest in employees’ success and well-being.
  • Be productive and results oriented.
  • Be a good communicator and listen to your team.
  • Help employees with career development.
  • Have a clear vision and strategy for the team.
  • Have key technical skills, so you can help advise the team.

Overall, this paper is well worth the time to read. I enjoyed it and found it insightful.

,

Stewart SmithThe Apple Power Macintosh 7200/120 PC Compatible (Part 1)

So, I learned something recently: if you pick up your iPhone with eBay open on an auction bid screen in just the right way, you may accidentally click the bid button and end up buying an old computer. Totally not the worst thing ever, and certainly a creative way to make a decision.

So, not too long later, a box arrives!

In the 1990s, Apple created some pretty “interesting” computers and product line. One thing you could get is a DOS Compatibility (or PC Compatibility) card. This was a card that went into one of the expansion slots on a Mac and had something really curious on it: most of the guts of a PC.

Others have written on these cards too: https://www.engadget.com/2009-12-10-before-there-was-boot-camp-there-were-dos-compatibility-cards.html and http://www.edibleapple.com/2009/12/09/blast-from-the-past-a-look-back-at-apples-dos-compatibility-cards/. There’s also the Service Manual https://tim.id.au/laptops/apple/misc/pc_compatibility_card.pdf with some interesting details.

The machine I’d bought was an Apple Power Macintosh 7200/120 with the PC Compatible card added afterwards (so it doesn’t have the PC Compatible label on the front like some models ended up getting).

The Apple Power Macintosh 7200/120

Wikipedia has a good article on the line, noting that it was first released in August 1995, and fitting for the era, was sold as about 14 million other model numbers (okay not quite that bad, it was only a total of four model numbers for essentially the same machine). This specific model, the 7200/120 was introduced on April 22nd, 1996, and the original web page describing it from Apple is on the wayback machine.

For older Macs, Low End Mac is a good resource, and there’s a page on the 7200, and amazingly Apple still has the tech specs on their web site!

The 7200 series replaced the 7100, which was one of the original PowerPC based Macs. The big changes are using the industry standard PCI bus for its three expansion slots rather than NuBus. Rather surprisingly, NuBus was not Apple specific, but you could not call it widely adopted by successful manufacturers. Apple first used NuBus in the 1987 Macintosh II.

The PCI bus was standardized in 1992, and it’s almost certain that a successor to it is in the computer you’re using to read this. It really quite caught on as an industry standard.

The processor of the machine is a PowerPC 601. The PowerPC was an effort of IBM, Apple, and Motorola (the AIM Alliance) to create a class of processors for personal computers based on IBM’s POWER Architecture. The PowerPC 601 was the first of these processors, initially used by Apple in its Power Macintosh range. The machine I have has one running at a whopping 120Mhz. There continued to be PowerPC chips for a number of years, and IBM continued making POWER processors even after that. However, you are almost certainly not using a PowerPC derived processor in the computer you’re using to read this.

The PC Compatibility card has on it a full on legit Pentium 100 processor, and hardware for doing VGA graphics, a Sound Blaster 16 and the other things you’d usually expect of a PC from 1996. Since it’s on a PCI card though, it’s a bit different than a PC of the era. It doesn’t have any expansion slots of its own, and in fact uses up one of the three PCI slots in the Mac. It also doesn’t have its own floppy drive, or hard drive. There’s software on the Mac that will let the PC card use the Mac’s floppy drive, and part of the Mac’s hard drive for the PC!

The Pentium 100 was the first mass produced superscalar processor. You are quite likely to be using a computer with a processor related to the Pentium to read this, unless you’re using a phone or tablet, or one of the very latest Macs; in which case you’re using an ARM based processor. You likely have more ARM processors in your life than you have socks.

Basically, this computer is a bit of a hodge-podge of historical technology, some of which ended up being successful, and other things less so.

Let’s have a look inside!

So, one of the PCI slots has a Vertex Twin Turbo 128M8A video card in it. There is not much about this card on the internet. There’s a photo of one on Wikimedia Commons though. I’ll have to investigate more.

Does it work though? Yes! Here it is on my desk:

The powered on Power Mac 7200/120

Even with Microsoft Internet Explorer 4.0 that came with MacOS 8.6, you can find some places on the internet you can fetch files from, at a not too bad speed even!

More fun times with this machine to come!

,

Rusty RussellA Model for Bitcoin Soft Fork Activation

TL;DR: There should be an option, taproot=lockintrue, which allows users to set lockin-on-timeout to true. It should not be the default, though.

As stated in my previous post, we need actual consensus, not simply the appearance of consensus. I’m pretty sure we have that for taproot, but I would like a template we can use in future without endless debate each time.

  • Giving every group a chance to openly signal for (or against!) gives us the most robust assurance that we actually have consensus. Being able to signal opposition is vital, since everyone can lie anyway; making opposition difficult just reduces the reliability of the signal.
  • Developers should not activate. They’ve tried to assure themselves that there’s broad approval of the change, but that’s not really a transferable proof. We should be concerned about about future corruption, insanity, or groupthink. Moreover, even the perception that developers can set the rules will lead to attempts to influence them as Bitcoin becomes more important. As a (non-Bitcoin-core) developer I can’t think of a worse hell myself, nor do we want to attract developers who want to be influenced!
  • Miner activation is actually brilliant. It’s easy for everyone to count, and majority miner enforcement is sufficient to rely on the new rules. But its real genius is that miners are most directly vulnerable to the economic majority of users: in a fork they have to pick sides continuously knowing that if they are wrong, they will immediately suffer economically through missed opportunity cost.
  • Of course, economic users are ultimately in control. Any system which doesn’t explicitly encode that is fragile; nobody would argue that fair elections are unnecessary because if people were really dissatisfied they could always overthrow the government themselves! We should make it as easy for them to exercise this power as possible: this means not requiring them to run unvetted or home-brew modifications which will place them at more risk, so developers need to supply this option (setting it should also change the default User-Agent string, for signalling purposes). It shouldn’t be an upgrade either (which inevitably comes with other changes). Such a default-off option provides both a simple method, and a Schelling point for the lockinontimeout parameters. It also means much less chance of this power being required: “Si vis pacem, para bellum“.

This triumverate model may seem familiar, being widely used in various different governance systems. It seems the most robust to me, and is very close to what we have evolved into already. Formalizing it reduces uncertainty for any future changes, as well.

,

Rusty RussellBitcoin Consensus and Solidarity

Bitcoin’s consensus rules define what is valid, but this isn’t helpful when we’re looking at changing the rules themselves. The trend in Bitcoin has been to make such changes in an increasingly inclusive and conservative manner, but we are still feeling our way through this, and appreciating more nuance each time we do so.

To use Bitcoin, you need to remain in the supermajority of consensus on what the rules are. But you can never truly know if you are. Everyone can signal, but everyone can lie. You can’t know what software other nodes or miners are running: even expensive testing of miners by creating an invalid block only tests one possible difference, may still give a false negative, and doesn’t mean they can’t change a moment later.

This risk of being left out is heightened greatly when the rules change. This is why we need to rely on multiple mechanisms to reassure ourselves that consensus will be maintained:

  1. Developers assure themselves that the change is technically valid, positive and has broad support. The main tools for this are open communication, and time. Developers signal support by implementing the change.
  2. Users signal their support by upgrading their nodes.
  3. Miners signal their support by actually tagging their blocks.

We need actual consensus, not simply the appearance of consensus. Thus it is vital that all groups know they can express their approval or rejection, in a way they know will be heard by others. In the end, the economic supermajority of Bitcoin users can set the rules, but no other group or subgroup should have inordinate influence, nor should they appear to have such control.

The Goodwill Dividend

A Bitcoin community which has consensus and knows it is not only safest from a technical perspective: the goodwill and confidence gives us all assurance that we can make (or resist!) changes in future.

It will also help us defend against the inevitable attacks and challenges we are going to face, which may be a more important effect than any particular soft-fork feature.

,

Michael StillShaken Fist v0.4.2

Shaken Fist v0.4.2 snuck out yesterday as part of shooting this tutorial video. That’s because I really wanted to demonstrate floating IPs, which I only recently got working nicely. Overall in v0.4.2 we:

  • Improved CI for image API calls.
  • Improved upgrade CI testing.
  • Improved network state tracking.
  • Floating IPs now work, and have covering CI. shakenfist#257
  • Resolve leaks of floating IPs from both direct use and NAT gateways. shakenfist#256
  • Resolve leaks of IPManagers on network delete. shakenfist#675
  • Use system packages for ansible during install.

Michael StillStarting your first instance on Shaken Fist (a video tutorial)

As a bit of an experiment, I’ve made this quick and dirty “vlog” style tutorial video to show you how to install Shaken Fist on a single machine and boot your first instance. I demonstrate how to install, setup your first virtual network, start the instance, inspect events that the instance has experienced, and then log in.

Let me know if you think its useful.

,

Jan SchmidtRift CV1 – Testing SteamVR

Update:

This post documented an older method of building SteamVR-OpenHMD. I moved them to a page here. That version will be kept up to date for any future changes, so go there.


I’ve had a few people ask how to test my OpenHMD development branch of Rift CV1 positional tracking in SteamVR. Here’s what I do:

  • Make sure Steam + SteamVR are already installed.
  • Clone the SteamVR-OpenHMD repository:
git clone --recursive https://github.com/ChristophHaag/SteamVR-OpenHMD.git
  • Switch the internal copy of OpenHMD to the right branch:
cd subprojects/openhmd
git remote add thaytan-github https://github.com/thaytan/OpenHMD.git
git fetch thaytan-github
git checkout -b rift-kalman-filter thaytan-github/rift-kalman-filter
cd ../../
  • Use meson to build and register the SteamVR-OpenHMD binaries. You may need to install meson first (see below):
meson -Dbuildtype=release build
ninja -C build
./install_files_to_build.sh
./register.sh
  • It is important to configure in release mode, as the kalman filtering code is generally too slow for real-time in debug mode (it has to run 2000 times per second)
  • Make sure your USB devices are accessible to your user account by configuring udev. See the OpenHMD guide here: https://github.com/OpenHMD/OpenHMD/wiki/Udev-rules-list
  • Please note – only Rift sensors on USB 3.0 ports will work right now. Supporting cameras on USB 2.0 requires someone implementing JPEG format streaming and decoding.
  • It can be helpful to test OpenHMD is working by running the simple example. Check that it’s finding camera sensors at startup, and that the position seems to change when you move the headset:
./build/subprojects/openhmd/openhmd_simple_example
  • Calibrate your expectations for how well tracking is working right now! Hint: It’s very experimental 🙂
  • Start SteamVR. Hopefully it should detect your headset and the light(s) on your Rift Sensor(s) should power on.

Meson

I prefer the Meson build system here. There’s also a cmake build for SteamVR-OpenHMD you can use instead, but I haven’t tested it in a while and it sometimes breaks as I work on my development branch.

If you need to install meson, there are instructions here – https://mesonbuild.com/Getting-meson.html summarising the various methods.

I use a copy in my home directory, but you need to make sure ~/.local/bin is in your PATH

pip3 install --user meson

,

Jan SchmidtRift CV1 – Pose rejection

I spent some time this weekend implementing a couple of my ideas for improving the way the tracking code in OpenHMD filters and rejects (or accepts) possible poses when trying to match visible LEDs to the 3D models for each device.

In general, the tracking proceeds in several steps (in parallel for each of the 3 devices being tracked):

  1. Do a brute-force search to match LEDs to 3D models, then (if matched)
    1. Assign labels to each LED blob in the video frame saying what LED they are.
    2. Send an update to the fusion filter about the position / orientation of the device
  2. Then, as each video frame arrives:
    1. Use motion flow between video frames to track the movement of each visible LED
    2. Use the IMU + vision fusion filter to predict the position/orientation (pose) of each device, and calculate which LEDs are expected to be visible and where.
  3. Try and match up and refine the poses using the predicted pose prior and labelled LEDs. In the best case, the LEDs are exactly where the fusion predicts they’ll be. More often, the orientation is mostly correct, but the position has drifted and needs correcting. In the worst case, we send the frame back to step 1 and do a brute-force search to reacquire an object.

The goal is to always assign the correct LEDs to the correct device (so you don’t end up with the right controller in your left hand), and to avoid going back to the expensive brute-force search to re-acquire devices as much as possible

What I’ve been working on this week is steps 1 and 3 – initial acquisition of correct poses, and fast validation / refinement of the pose in each video frame, and I’ve implemented two new strategies for that.

Gravity Vector matching

The first new strategy is to reject candidate poses that don’t closely match the known direction of gravity for each device. I had a previous implementation of that idea which turned out to be wrong, so I’ve re-worked it and it helps a lot with device acquisition.

The IMU accelerometer and gyro can usually tell us which way up the device is (roll and pitch) but not which way they are facing (yaw). The measure for ‘known gravity’ comes from the fusion Kalman filter covariance matrix – how certain the filter is about the orientation of the device. If that variance is small this new strategy is used to reject possible poses that don’t have the same idea of gravity (while permitting rotations around the Y axis), with the filter variance as a tolerance.

Partial tracking matches

The 2nd strategy is based around tracking with fewer LED correspondences once a tracking lock is acquired. Initial acquisition of the device pose relies on some heuristics for how many LEDs must match the 3D model. The general heuristic threshold I settled on for now is that 2/3rds of the expected LEDs must be visible to acquire a cold lock.

With the new strategy, if the pose prior has a good idea where the device is and which way it’s facing, it allows matching on far fewer LED correspondences. The idea is to keep tracking a device even down to just a couple of LEDs, and hope that more become visible soon.

While this definitely seems to help, I think the approach can use more work.

Status

With these two new approaches, tracking is improved but still quite erratic. Tracking of the headset itself is quite good now and for me rarely loses tracking lock. The controllers are better, but have a tendency to “fly off my hands” unexpectedly, especially after fast motions.

I have ideas for more tracking heuristics to implement, and I expect a continuous cycle of refinement on the existing strategies and new ones for some time to come.

For now, here’s a video of me playing Beat Saber using tonight’s code. The video shows the debug stream that OpenHMD can generate via Pipewire, showing the camera feed plus overlays of device predictions, LED device assignments and tracked device positions. Red is the headset, Green is the right controller, Blue is the left controller.

Initial tracking is completely wrong – I see some things to fix there. When the controllers go offline due to inactivity, the code keeps trying to match LEDs to them for example, and then there are some things wrong with how it’s relabelling LEDs when they get incorrect assignments.

After that, there are periods of good tracking with random tracking losses on the controllers – those show the problem cases to concentrate on.

,

Michael StillBooks read in January 2021

Its been 10 years since I’ve read enough to write one of these summary posts… Which I guess means something. This month I’ve been thinking a lot about systems design and how to avoid Second Systems effect while growing a product, which guided my reading choices a fair bit. A fair bit of that reading has been in the form of blog posts and twitter threads, so I am going to start including those in these listings of things I’ve read.

Social media posts of note:

Books:

,

Jan SchmidtHitting a milestone – Beat Saber!

I hit an important OpenHMD milestone tonight – I completed a Beat Saber level using my Oculus Rift CV1!

I’ve been continuing to work on integrating Kalman filtering into OpenHMD, and on improving the computer vision that matches and tracks device LEDs. While I suspect noone will be completing Expert levels just yet, it’s working well enough that I was able to play through a complete level of Beat Saber. For a long time this has been my mental benchmark for tracking performance, and I’m really happy 🙂

Check it out:

I should admit at this point that completing this level took me multiple attempts. The tracking still has quite a tendency to lose track of controllers, or to get them confused and swap hands suddenly.

I have a list of more things to work on. See you at the next update!

Michael StillShaken Fist 0.4.1

I don’t blog about every Shaken Fist release here, but I do feel like the 0.4 release (and the subsequent minor bug fix release 0.4.1) are a pretty big deal in the life of the project.

Shaken Fist logo
We also got a cool logo during the v0.4 cycle as well.

The focus of the v0.4 series is reliability — we’ve used behaviour in the continuous integration pipeline as a proxy for that, but it should be a significant improvement in the real world as well. This has included:

  • much more extensive continuous integration coverage, including several new jobs.
  • checksumming image downloads, and retrying images where the checksum fails.
  • reworked locking.
  • etcd reliability improvements.
  • refactoring instances and networks to a new “non-volatile” object model where only immutable values are cached.
  • images now track a state much like instances and networks.
  • a reworked state model for instances, where its clearer why an instance ended up in an error state. This is documented in our developer docs.

In terms of new features, we also added:

  • a network ping API, which will emit ICMP ping packets on the network node onto your virtual network. We use this in testing to ensure instances booted and ended up online.
  • networks are now checked to ensure that they have a reasonable minimum size.
  • addition of a simple etcd backup and restore tool (sf-backup).
  • improved data upgrade of previous installations.
  • VXLAN ids are now randomized, and this has forced a new naming scheme for network interfaces and bridges.
  • we are smarter about what networks we restore on startup, and don’t restore dead networks.

We also now require python 3.8.

Overall, Shaken Fist v0.4 is a place that makes me much more comfortable to run workloads I care about on that previous releases. Its far from perfect, but we’re definitely moving in the right direction.

,

Sam WatkinsDeveloping CZ, a dialect of C that looks like Python

In my experience, the C programming language is still hard to beat, even 50 years after it was first developed (and I feel the same way about UNIX). When it comes to general-purpose utility, low-level systems programming, performance, and portability (even to tiny embedded systems), I would choose C over most modern or fashionable alternatives. In some cases, it is almost the only choice.

Many developers believe that it is difficult to write secure and reliable software in C, due to its free pointers, the lack of enforced memory integrity, and the lack of automatic memory management; however in my opinion it is possible to overcome these risks with discipline and a more secure system of libraries constructed on top of C and libc. Daniel J. Bernstein and Wietse Venema are two developers who have been able to write highly secure, stable, reliable software in C.

My other favourite language is Python. Although Python has numerous desirable features, my favourite is the light-weight syntax: in Python, block structure is indicated by indentation, and braces and semicolons are not required. Apart from the pleasure and relief of reading and writing such light and clear code, which almost appears to be executable pseudo-code, there are many other benefits. In C or JavaScript, if you omit a trailing brace somewhere in the code, or insert an extra brace somewhere, the compiler may tell you that there is a syntax error at the end of the file. These errors can be annoying to track down, and cannot occur in Python. Python not only looks better, the clear syntax helps to avoid errors.

The obvious disadvantage of Python, and other dynamic interpreted languages, is that most programs run extremely slower than C programs. This limits the scope and generality of Python. No AAA or performance-oriented video game engines are programmed in Python. The language is not suitable for low-level systems programming, such as operating system development, device drivers, filesystems, performance-critical networking servers, or real-time systems.

C is a great all-purpose language, but the code is uglier than Python code. Once upon a time, when I was experimenting with the Plan 9 operating system (which is built on C, but lacks Python), I missed Python’s syntax, so I decided to do something about it and write a little preprocessor for C. This converts from a “Pythonesque” indented syntax to regular C with the braces and semicolons. Having forked a little dialect of my own, I continued from there adding other modules and features (which might have been a mistake, but it has been fun and rewarding).

At first I called this translator Brace, because it added in the braces for me. I now call the language CZ. It sounds like “C-easy”. Ease-of-use for developers (DX) is the primary goal. CZ has all of the features of C, and translates cleanly into C, which is then compiled to machine code as normal (using any C compiler; I didn’t write one); and so CZ has the same features and performance as C, but enjoys a more pleasing syntax.

CZ is now self-hosted, in that the translator is written in the language CZ. I confess that originally I wrote most of it in Perl; I’m proficient at Perl, but I consider it to be a fairly ugly language, and overly complicated.

I intend for CZ’s new syntax to be “optional”, ideally a developer will be able to choose to use the normal C syntax when editing CZ, if they prefer it. For this, I need a tool to convert C back to CZ, which I have not fully implemented yet. I am aware that, in addition to traditionalists, some vision-impaired developers prefer to use braces and semicolons, as screen readers might not clearly indicate indentation. A C to CZ translator would of course also be valuable when porting an existing C program to CZ.

CZ has a number of useful features that are not found in standard C, but I did not go so far as C++, which language has been described as “an octopus made by nailing extra legs onto a dog”. I do not consider C to be a dog, at least not in a negative sense; but I think that C++ is not an improvement over plain C. I am creating CZ because I think that it is possible to improve on C, without losing any of its advantages or making it too complex.

One of the most interesting features I added is a simple syntax for fast, light coroutines. I based this on Simon Tatham’s approach to Coroutines in C, which may seem hacky at first glance, but is very efficient and can work very well in practice. I implemented a very fast web server with very clean code using these coroutines. The cost of switching coroutines with this method is little more than the cost of a function call.

CZ has hygienic macros. The regular cpp (C preprocessor) macros are not hygenic and many people consider them hacky and unsafe to use. My CZ macros are safe, and somewhat more powerful than standard C macros. They can be used to neatly add new program control structures. I have plans to further develop the macro system in interesting ways.

I added automatic prototype and header generation, as I do not like having to repeat myself when copying prototypes to separate header files. I added support for the UNIX #! scripting syntax, and for cached executables, which means that CZ can be used like a scripting language without having to use a separate compile or make command, but the programs are only recompiled when something has been changed.

For CZ, I invented a neat approach to portability without conditional compilation directives. Platform-specific library fragments are automatically included from directories having the name of that platform or platform-category. This can work very well in practice, and helps to avoid the nightmare of conditional compilation, feature detection, and Autotools. Using this method, I was able easily to implement portable interfaces to features such as asynchronous IO multiplexing (aka select / poll).

The CZ library includes flexible error handling wrappers, inspired by W. Richard Stevens’ wrappers in his books on Unix Network Programming. If these wrappers are used, there is no need to check return values for error codes, and this makes the code much safer, as an error cannot accidentally be ignored.

CZ has several major faults, which I intend to correct at some point. Some of the syntax is poorly thought out, and I need to revisit it. I developed a fairly rich library to go with the language, including safer data structures, IO, networking, graphics, and sound. There are many nice features, but my CZ library is more prototype than a finished product, there are major omissions, and some features are misconceived or poorly implemented. The misfeatures should be weeded out for the time-being, or moved to an experimental section of the library.

I think that a good software library should come in two parts, the essential low-level APIs with the minimum necessary functionality, and a rich set of high-level convenience functions built on top of the minimal API. I need to clearly separate these two parts in order to avoid polluting the namespaces with all sorts of nonsense!

CZ is lacking a good modern system of symbol namespaces. I can look to Python for a great example. I need to maintain compatibility with C, and avoid ugly symbol encodings. I think I can come up with something that will alleviate the need to type anything like gtk_window_set_default_size, and yet maintain compatibility with the library in question. I want all the power of C, but it should be easy to use, even for children. It should be as easy as BASIC or Processing, a child should be able to write short graphical demos and the like, without stumbling over tricky syntax or obscure compile errors.

Here is an example of a simple CZ program which plots the Mandelbrot set fractal. I think that the program is fairly clear and easy to understand, although there is still some potential to improve and clarify the code.

#!/usr/local/bin/cz --
use b
use ccomplex

Main:
	num outside = 16, ox = -0.5, oy = 0, r = 1.5
	long i, max_i = 50, rb_i = 30
	space()
	uint32_t *px = pixel()  # CONFIGURE!
	num d = 2*r/h, x0 = ox-d*w_2, y0 = oy+d*h_2
	for(y, 0, h):
		cmplx c = x0 + (y0-d*y)*I
		repeat(w):
			cmplx w = c
			for i=0; i < max_i && cabs(w) < outside; ++i
				w = w*w + c
			*px++ = i < max_i ? rainbow(i*359 / rb_i % 360) : black
			c += d

I wrote a more elaborate variant of this program, which generates images like the one shown below. There are a few tricks used: continuous colouring, rainbow colours, and plotting the logarithm of the iteration count, which makes the plot appear less busy close to the black fractal proper. I sell some T-shirts and other products with these fractal designs online.

An image from the Mandelbrot set, generated by a fairly simple CZ program.

I am interested in graph programming, and have been for three decades since I was a teenager. By graph programming, I mean programming and modelling based on mathematical graphs or diagrams. I avoid the term visual programming, because there is no necessary reason that vision impaired folks could not use a graph programming language; a graph or diagram may be perceived, understood, and manipulated without having to see it.

Mathematics is something that naturally exists, outside time and independent of our universe. We humans discover mathematics, we do not invent or create it. One of my main ideas for graph programming is to represent a mathematical (or software) model in the simplest and most natural way, using relational operators. Elementary mathematics can be reduced to just a few such operators:

+add, subtract, disjoint union, zero
×multiply, divide, cartesian product, one
^power, root, logarithm
sin, cos, sin-1, cos-1, hypot, atan2
δdifferential, integral
a set of minimal relational operators for elementary math

I think that a language and notation based on these few operators (and similar) can be considerably simpler and more expressive than conventional math or programming languages.

CZ is for me a stepping-stone toward this goal of an expressive relational graph language. It is more pleasant for me to develop software tools in CZ than in C or another language.

Thanks for reading. I wrote this article during the process of applying to join Toptal, which appears to be a freelancing portal for top developers; and in response to this article on toptal: After All These Years, the World is Still Powered by C Programming.

My CZ project has been stalled for quite some time. I foolishly became discouraged after receiving some negative feedback. I now know that honest negative feedback should be valued as an opportunity to improve, and I intend to continue the project until it lacks glaring faults, and is useful for other people. If this project or this article interests you, please contact me and let me know. It is much more enjoyable to work on a project when other people are actively interested in it!

,

Glen TurnerCompiling and installing software for the uBITX v6 QRP amateur radio transciever

The uBITX uses an Arduino internally. This article describes how to update its software.

Required hardware

The connector on the back is a Mini-B USB connector, so you'll need a "Mini-B to A" USB cable. This is not the same cable as used with older Android smartphones. The Mini-B connector was used with a lot of cameras a decade ago.

You'll also need a computer. I use a laptop with Fedora Linux installed.

Required software for software development

In Fedora all the required software is installed with sudo dnf install arduino git. Add yourself to the users and lock groups with sudo usermod -a -G users,lock $USER (on Debian-style systems use sudo usermod -a -G dialout,lock $USER). You'll need to log out and log in again for that to have an effect (if you want to see which groups you are already in, then use the id command).

Run arduino as your ordinary non-root user to create the directories used by the Arduino IDE. You can quit the IDE once it starts.

Obtain the uBITX software

$ cd ~/Arduino
$ git clone https://github.com/afarhan/ubitxv6.git ubitx_v6.1_code

Connect the uBITX to your computer

Plug in the USB cable and turn on the radio. Running dmesg will show the Arduino appearing as a "USB serial" device:

usb 1-1: new full-speed USB device number 6 using xhci_hcd
usb 1-1: New USB device found, idVendor=1a86, idProduct=7523, bcdDevice= 2.64
usb 1-1: New USB device strings: Mfr=0, Product=2, SerialNumber=0
usb 1-1: Product: USB Serial
usbcore: registered new interface driver ch341
usbserial: USB Serial support registered for ch341-uart
ch341 1-1:1.0: ch341-uart converter detected
usb 1-1: ch341-uart converter now attached to ttyUSB1

If you want more information about the USB device then use:

$ lsusb -d 1a86:7523
Bus 001 Device 006: ID 1a86:7523 QinHeng Electronics CH340 serial converter


comment count unavailable comments

,

Jan SchmidtRift CV1 – Adventures in Kalman filtering Part 2

In the last post I had started implementing an Unscented Kalman Filter for position and orientation tracking in OpenHMD. Over the Christmas break, I continued that work.

A Quick Recap

When reading below, keep in mind that the goal of the filtering code I’m writing is to combine 2 sources of information for tracking the headset and controllers.

The first piece of information is acceleration and rotation data from the IMU on each device, and the second is observations of the device position and orientation from 1 or more camera sensors.

The IMU motion data drifts quickly (at least for position tracking) and can’t tell which way the device is facing (yaw, but can detect gravity and get pitch/roll).

The camera observations can tell exactly where each device is, but arrive at a much lower rate (52Hz vs 500/1000Hz) and can take a long time to process (hundreds of milliseconds) to analyse to acquire or re-acquire a lock on the tracked device(s).

The goal is to acquire tracking lock, then use the motion data to predict the motion closely enough that we always hit the ‘fast path’ of vision analysis. The key here is closely enough – the more closely the filter can track and predict the motion of devices between camera frames, the better.

Integration in OpenHMD

When I wrote the last post, I had the filter running as a standalone application, processing motion trace data collected by instrumenting a running OpenHMD app and moving my headset and controllers around. That’s a really good way to work, because it lets me run modifications on the same data set and see what changed.

However, the motion traces were captured using the current fusion/prediction code, which frequently loses tracking lock when the devices move – leading to big gaps in the camera observations and more interpolation for the filter.

By integrating the Kalman filter into OpenHMD, the predictions are improved leading to generally much better results. Here’s one trace of me moving the headset around reasonably vigourously with no tracking loss at all.

Headset motion capture trace

If it worked this well all the time, I’d be ecstatic! The predicted position matched the observed position closely enough for every frame for the computer vision to match poses and track perfectly. Unfortunately, this doesn’t happen every time yet, and definitely not with the controllers – although I think the latter largely comes down to the current computer vision having more troubler matching controller poses. They have fewer LEDs to match against compared to the headset, and the LEDs are generally more side-on to a front-facing camera.

Taking a closer look at a portion of that trace, the drift between camera frames when the position is interpolated using the IMU readings is clear.

Headset motion capture – zoomed in view

This is really good. Most of the time, the drift between frames is within 1-2mm. The computer vision can only match the pose of the devices to within a pixel or two – so the observed jitter can also come from the pose extraction, not the filtering.

The worst tracking is again on the Z axis – distance from the camera in this case. Again, that makes sense – with a single camera matching LED blobs, distance is the most uncertain part of the extracted pose.

Losing Track

The trace above is good – the computer vision spots the headset and then the filtering + computer vision track it at all times. That isn’t always the case – the prediction goes wrong, or the computer vision fails to match (it’s definitely still far from perfect). When that happens, it needs to do a full pose search to reacquire the device, and there’s a big gap until the next pose report is available.

That looks more like this

Headset motion capture trace with tracking errors

This trace has 2 kinds of errors – gaps in the observed position timeline during full pose searches and erroneous position reports where the computer vision matched things incorrectly.

Fixing the errors in position reports will require improving the computer vision algorithm and would fix most of the plot above. Outlier rejection is one approach to investigate on that front.

Latency Compensation

There is inherent delay involved in processing of the camera observations. Every 19.2ms, the headset emits a radio signal that triggers each camera to capture a frame. At the same time, the headset and controller IR LEDS light up brightly to create the light constellation being tracked. After the frame is captured, it is delivered over USB over the next 18ms or so and then submitted for vision analysis. In the fast case where we’re already tracking the device the computer vision is complete in a millisecond or so. In the slow case, it’s much longer.

Overall, that means that there’s at least a 20ms offset between when the devices are observed and when the position information is available for use. In the plot above, this delay is ignored and position reports are fed into the filter when they are available. In the worst case, that means the filter is being told where the headset was hundreds of milliseconds earlier.

To compensate for that delay, I implemented a mechanism in the filter where it keeps extra position and orientation entries in the state that can be used to retroactively apply the position observations.

The way that works is to make a prediction of the position and orientation of the device at the moment the camera frame is captured and copy that prediction into the extra state variable. After that, it continues integrating IMU data as it becomes available while keeping the auxilliary state constant.

When a the camera frame analysis is complete, that delayed measurement is matched against the stored position and orientation prediction in the state and the error used to correct the overall filter. The cool thing is that in the intervening time, the filter covariance matrix has been building up the right correction terms to adjust the current position and orientation.

Here’s a good example of the difference:

Before: Position filtering with no latency compensation
After: Latency-compensated position reports

Notice how most of the disconnected segments have now slotted back into position in the timeline. The ones that haven’t can either be attributed to incorrect pose extraction in the compute vision, or to not having enough auxilliary state slots for all the concurrent frames.

At any given moment, there can be a camera frame being analysed, one arriving over USB, and one awaiting “long term” analysis. The filter needs to track an auxilliary state variable for each frame that we expect to get pose information from later, so I implemented a slot allocation system and multiple slots.

The downside is that each slot adds 6 variables (3 position and 3 orientation) to the covariance matrix on top of the 18 base variables. Because the covariance matrix is square, the size grows quadratically with new variables. 5 new slots means 30 new variables – leading to a 48 x 48 covariance matrix instead of 18 x 18. That is a 7-fold increase in the size of the matrix (48 x 48 = 2304 vs 18 x 18 = 324) and unfortunately about a 10x slow-down in the filter run-time.

At that point, even after some optimisation and vectorisation on the matrix operations, the filter can only run about 3x real-time, which is too slow. Using fewer slots is quicker, but allows for fewer outstanding frames. With 3 slots, the slow-down is only about 2x.

There are some other possible approaches to this problem:

  • Running the filtering delayed, only integrating IMU reports once the camera report is available. This has the disadvantage of not reporting the most up-to-date estimate of the user pose, which isn’t great for an interactive VR system.
  • Keeping around IMU reports and rewinding / replaying the filter for late camera observations. This limits the overall increase in filter CPU usage to double (since we at most replay every observation twice), but potentially with large bursts when hundreds of IMU readings need replaying.
  • It might be possible to only keep 2 “full” delayed measurement slots with both position and orientation, and to keep some position-only slots for others. The orientation of the headset tends to drift much more slowly than position does, so when there’s a big gap in the tracking it would be more important to be able to correct the position estimate. Orientation is likely to still be close to correct.
  • Further optimisation in the filter implementation. I was hoping to keep everything dependency-free, so the filter implementation uses my own naive 2D matrix code, which only implements the features needed for the filter. A more sophisticated matrix library might perform better – but it’s hard to say without doing some testing on that front.

Controllers

So far in this post, I’ve only talked about the headset tracking and not mentioned controllers. The controllers are considerably harder to track right now, but most of the blame for that is in the computer vision part. Each controller has fewer LEDs than the headset, fewer are visible at any given moment, and they often aren’t pointing at the camera front-on.

Oculus Camera view of headset and left controller.

This screenshot is a prime example. The controller is the cluster of lights at the top of the image, and the headset is lower left. The computer vision has gotten confused and thinks the controller is the ring of random blue crosses near the headset. It corrected itself a moment later, but those false readings make life very hard for the filtering.

Position tracking of left controller with lots of tracking loss.

Here’s a typical example of the controller tracking right now. There are some very promising portions of good tracking, but they are interspersed with bursts of tracking losses, and wild drifting from the computer vision giving wrong poses – leading to the filter predicting incorrect acceleration and hence cascaded tracking losses. Particularly (again) on the Z axis.

Timing Improvements

One of the problems I was looking at in my last post is variability in the arrival timing of the various USB streams (Headset reports, Controller reports, camera frames). I improved things in OpenHMD on that front, to use timestamps from the devices everywhere (removing USB timing jitter from the inter-sample time).

There are still potential problems in when IMU reports from controllers get updated in the filters vs the camera frames. That can be on the order of 2-4ms jitter. Time will tell how big a problem that will be – after the other bigger tracking problems are resolved.

Sponsorships

All the work that I’m doing implementing this positional tracking is a combination of my free time, hours contributed by my employer Centricular and contributions from people via Github Sponsorships. If you’d like to help me spend more hours on this and fewer on other paying work, I appreciate any contributions immensely!

Next Steps

The next things on my todo list are:

  • Integrate the delayed-observation processing into OpenHMD (at the moment it is only in my standalone simulator).
  • Improve the filter code structure – this is my first kalman filter and there are some implementation decisions I’d like to revisit.
  • Publish the UKF branch for other people to try.
  • Circle back to the computer vision and look at ways to improve the pose extraction and better reject outlying / erroneous poses, especially for the controllers.
  • Think more about how to best handle / schedule analysis of frames from multiple cameras. At the moment each camera operates as a separate entity, capturing frames and analysing them in threads without considering what is happening in other cameras. That means any camera that can’t see a particular device starts doing full pose searches – which might be unnecessary if another camera still has a good view of the device. Coordinating those analyses across cameras could yield better CPU consumption, and let the filter retain fewer delayed observation slots.

,

Tim SerongScope Creep

On December 22, I decided to brew an oatmeal stout (5kg Gladfield ale malt, 250g dark chocolate malt, 250g light chocolate malt, 250g dark crystal malt, 500g rolled oats, 150g rice hulls to stop the mash sticking, 25g Pride of Ringwood hops, Safale US-05 yeast). This all takes a good few hours to do the mash and the boil and everything, so while that was underway I thought it’d be a good opportunity to remove a crappy old cupboard from the laundry, so I could put our nice Miele upright freezer in there, where it’d be closer to the kitchen (the freezer is presently in a room at the other end of the house).

The cupboard was reasonably easy to rip out, but behind it was a mouldy and unexpectedly bright yellow wall with an ugly gap at the top where whoever installed it had removed the existing cornice.

Underneath the bottom half of the cupboard, I discovered not the cork tiles which cover the rest of the floor, but a layer of horrific faux-tile linoleum. Plus, more mould. No way was I going to put the freezer on top of that.

So, up came the floor covering, back to nice hardwood boards.

Of course, the sink had to come out too, to remove the flooring from under its cabinet, and that meant pulling the splashback tiles (they had ugly screw holes in them anyway from a shelf that had been bracketed up on top of them previously).

Removing the tiles meant replacing a couple of sections of wall.

Also, we still needed to be able to use the washing machine through all this, so I knocked up a temporary sink support.

New cornice went in.

The rest of the plastering was completed and a ceiling fan installed.

Waterproofing membrane was applied where new tiles will go around a new sink.

I removed the hideous old aluminium backed weather stripping from around the exterior door and plastered up the exposed groove.

We still need to paint everything, get the new sink installed, do the tiling work and install new taps.

As for the oatmeal stout, I bottled that on January 2. From a sample taken at the time, it should be excellent, but right now still needs to carbonate and mature.

Stewart SmithPhotos from Taiwan

A few years ago we went to Taiwan. I managed to capture some random bits of the city on film (and also some shots on my then phone, a Google Pixel). I find the different style of art on the streets around the world to be fascinating, and Taiwan had some good examples.

I’ve really enjoyed shooting Kodak E100VS film over the years, and some of my last rolls were shot in Taiwan. It’s a film that unfortunately is not made anymore, but at least we have a new Ektachrome to have fun with now.

Words for our time: “Where there is democracy, equality and freedom can exist; without democracy, equality and freedom are merely empty words”.

This is, of course, only a small number of the total photos I took there. I’d really recommend a trip to Taiwan, and I look forward to going back there some day.

,

Tim SerongI Have No Idea How To Debug This

On my desktop system, I’m running XFCE on openSUSE Tumbleweed. When I leave my desk, I hit the “lock screen” button, the screen goes black, and the monitors go into standby. So far so good. When I come back and mash the keyboard, everything lights up again, the screens go white, and it says:

blank: Shows nothing but a black screen
Name: tserong@HOSTNAME
Password:
Enter password to unlock; select icon to lock

So I type my password, hit ENTER, and I’m back in action. So far so good again. Except… Several times recently, when I’ve come back and mashed the keyboard, the white overlay is gone. I can see all my open windows, my mail client, web browser, terminals, everything, but the screen is still locked. If I type my password and hit ENTER, it unlocks and I can interact again, but this is where it gets really weird. All the windows have moved down a bit on the screen. For example, a terminal that was previously neatly positioned towards the bottom of the screen is now partially off the screen. So “something” crashed – whatever overlay the lock thingy put there is gone? And somehow this affected the position of all my application windows? What in the name of all that is good and holy is going on here?

Update 2020-12-21: I’ve opened boo#1180241 to track this.

,

Stewart SmithTwo Photos from Healseville Sanctuary

If you’re near Melbourne, you should go to Healseville Sanctuary and enjoy the Australian native animals. I’ve been a number of times over the years, and here’s a couple of photos from a (relatively, as in, the last couple of years) trip.

Leah trying to photograph a much too close bird
Koalas seem to always look like they’ve just woken up. I’m pretty convinced this one just had.

Stewart SmithPhotos from Adelaide

Some shots on Kodak Portra 400 from Adelaide. These would have been shot with my Nikon F80 35mm body, I think all with the 50mm lens. These are all pre-pandemic, and I haven’t gone and looked up when exactly. I’m just catching up on scanning some negatives.

,

Stewart SmithWhy you should use `nproc` and not grep /proc/cpuinfo

There’s something really quite subtle about how the nproc utility from GNU coreutils works. If you look at the man page, it’s even the very first sentence:

Print the number of processing units available to the current process, which may be less than the number of online processors.

So, what does that actually mean? Well, just because the computer some code is running on has a certain number of CPUs (and here I mean “number of hardware threads”) doesn’t necessarily mean that you can spawn a process that uses that many. What’s a simple example? Containers! Did you know that when you invoke docker to run a container, you can easily limit how much CPU the container can use? In this case, we’re looking at the --cpuset-cpus parameter, as the --cpus one works differently.

$ nproc
8

$ docker run --cpuset-cpus=0-1 --rm=true -it  amazonlinux:2
bash-4.2# nproc
2
bash-4.2# exit

$ docker run --cpuset-cpus=0-2 --rm=true -it  amazonlinux:2
bash-4.2# nproc
3

As you can see, nproc here gets the right bit of information, so if you’re wanting to do a calculation such as “Please use up to the maximum available CPUs” as a parameter to the configuration of a piece of software (such as how many threads to run), you get the right number.

But what if you use some of the other common methods?

$ /usr/bin/lscpu -p | grep -c "^[0-9]"
8
$ grep -c 'processor' /proc/cpuinfo 
8

$ docker run --cpuset-cpus=0-1 --rm=true -it  amazonlinux:2
bash-4.2# yum install -y /usr/bin/lscpu
......
bash-4.2# /usr/bin/lscpu -p | grep -c "^[0-9]"
8
bash-4.2# grep -c 'processor' /proc/cpuinfo 
8
bash-4.2# nproc
2

In this case, if you base your number of threads off grepping lscpu you take another dependency (on the util-linux package), which isn’t needed. You also get the wrong answer, as you do by grepping /proc/cpuinfo. So, what this will end up doing is just increase the number of context switches, possibly also adding a performance degradation. It’s not just in docker containers where this could be an issue of course, you can use the same mechanism that docker uses anywhere you want to control resources of a process.

Another subtle thing to watch out for is differences in /proc/cpuinfo content depending on CPU architecture. You may not think it’s an issue today, but who wants to needlessly debug something?

tl;dr: for determining “how many processes to run”: use nproc, don’t grep lscpu or /proc/cpuinfo

,

Stewart SmithPhotos from Tasmania (2017)

On the random old photos train, there’s some from spending time in Tasmania post linux.conf.au 2017 in Hobart.

All of these are Kodak E100VS film, which was no doubt a bit out of date by the time I shot it (and when they stopped making Ektachrome for a while). It was a nice surprise to be reminded of a truly wonderful Tassie trip, taken with friends, and after the excellent linux.conf.au.

,

Glen TurnerBlocking a USB device

udev can be used to block a USB device (or even an entire class of devices, such as USB storage). Add a file /etc/udev/rules.d/99-local-blacklist.rules containing:

SUBSYSTEM=="usb", ATTRS{idVendor}=="0123", ATTRS{idProduct}=="4567", ATTR{authorized}="0"


comment count unavailable comments

,

Glen TurnerConverting MPEG-TS to, well, MPEG

Digital TV uses MPEG Transport Stream, which is a container for video designed for lossy transmission, such as radio. To save CPU cycles, Personal Video Records often save the MPEG-TS stream directly to disk. The more usual MPEG is technically MPEG Program Stream, which is designed for lossless transmission, such as storage on a disk.

Since these are a container formats, it should be possible to losslessly and quickly re-code from MPEG-TS to MPEG-PS.

ffmpeg -ss "${STARTTIME}" -to "${DURATION}" -i "${FILENAME}" -ignore_unknown -map 0 -map -0:2 -c copy "${FILENAME}.mpeg"


comment count unavailable comments

,

Chris NeugebauerTalk Notes: Practicality Beats Purity: The Zen Of Python’s Escape Hatch?

I gave the talk Practicality Beats Purity: The Zen of Python’s Escape Hatch as part of PyConline AU 2020, the very online replacement for PyCon AU this year. In that talk, I included a few interesting links code samples which you may be interested in:

@apply

def apply(transform):

    def __decorator__(using_this):
        return transform(using_this)

    return __decorator__


numbers = [1, 2, 3, 4, 5]

@apply(lambda f: list(map(f, numbers)))
def squares(i):
  return i * i

print(list(squares))

# prints: [1, 4, 9, 16, 25]

Init.java

public class Init {
  public static void main(String[] args) {
    System.out.println("Hello, World!")
  }
}

@switch and @case

__NOT_A_MATCHER__ = object()
__MATCHER_SORT_KEY__ = 0

def switch(cls):

    inst = cls()
    methods = []

    for attr in dir(inst):
        method = getattr(inst, attr)
        matcher = getattr(method, "__matcher__", __NOT_A_MATCHER__)

        if matcher == __NOT_A_MATCHER__:
            continue

        methods.append(method)

    methods.sort(key = lambda i: i.__matcher_sort_key__)

    for method in methods:
        matches = method.__matcher__()
        if matches:
            return method()

    raise ValueError(f"No matcher matches value {test_value}")

def case(matcher):

    def __decorator__(f):
        global __MATCHER_SORT_KEY__

        f.__matcher__ = matcher
        f.__matcher_sort_key__ = __MATCHER_SORT_KEY__
        __MATCHER_SORT_KEY__ += 1
        return f

    return __decorator__



if __name__ == "__main__":
    for i in range(100):

        @switch
        class FizzBuzz:

            @case(lambda: i % 15 == 0)
            def fizzbuzz(self):
                return "fizzbuzz"

            @case(lambda: i % 3 == 0)
            def fizz(self):
                return "fizz"

            @case(lambda: i % 5 == 0)
            def buzz(self):
                return "buzz"

            @case(lambda: True)
            def default(self):
                return "-"

        print(f"{i} {FizzBuzz}")

,

Craig SandersFuck Grey Text

fuck grey text on white backgrounds
fuck grey text on black backgrounds
fuck thin, spindly fonts
fuck 10px text
fuck any size of anything in px
fuck font-weight 300
fuck unreadable web pages
fuck themes that implement this unreadable idiocy
fuck sites that don’t work without javascript
fuck reactjs and everything like it

thank fuck for Stylus. and uBlock Origin. and uMatrix.

Fuck Grey Text is a post from: Errata

,

Rusty Russell57 Varieties of Pyrite: Exchanges Are Now The Enemy of Bitcoin

TL;DR: exchanges are casinos and don’t want to onboard anyone into bitcoin. Avoid.

There’s a classic scam in the “crypto” space: advertize Bitcoin to get people in, then sell suckers something else entirely. Over the last few years, this bait-and-switch has become the core competency of “bitcoin” exchanges.

I recently visited the homepage of Australian exchange btcmarkets.net: what a mess. There was a list of dozens of identical-looking “cryptos”, with bitcoin second after something called “XRP”; seems like it was sorted by volume?

Incentives have driven exchanges to become casinos, and they’re doing exactly what you’d expect unregulated casinos to do. This is no place you ever want to send anyone.

Incentives For Exchanges

Exchanges make money on trading, not on buying and holding. Despite the fact that bitcoin is the only real attempt to create an open source money, scams with no future are given false equivalence, because more assets means more trading. Worse than that, they are paid directly to list new scams (the crappier, the more money they can charge!) and have recently taken the logical step of introducing and promoting their own crapcoins directly.

It’s like a gold dealer who also sells 57 varieties of pyrite, which give more margin than selling actual gold.

For a long time, I thought exchanges were merely incompetent. Most can’t even give out fresh addresses for deposits, batch their outgoing transactions, pay competent fee rates, perform RBF or use segwit.

But I misunderstood: they don’t want to sell bitcoin. They use bitcoin to get you in the door, but they want you to gamble. This matters: you’ll find subtle and not-so-subtle blockers to simply buying bitcoin on an exchange. If you send a friend off to buy their first bitcoin, they’re likely to come back with something else. That’s no accident.

Looking Deeper, It Gets Worse.

Regrettably, looking harder at specific exchanges makes the picture even bleaker.

Consider Binance: this mainland China backed exchange pretending to be a Hong Kong exchange appeared out of nowhere with fake volume and demonstrated the gullibility of the entire industry by being treated as if it were a respected member. They lost at least 40,000 bitcoin in a known hack, and they also lost all the personal information people sent them to KYC. They aggressively market their own coin. But basically, they’re just MtGox without Mark Karpales’ PHP skills or moral scruples and much better marketing.

Coinbase is more interesting: an MBA-run “bitcoin” company which really dislikes bitcoin. They got where they are by spending big on regulations compliance in the US so they could operate in (almost?) every US state. (They don’t do much to dispel the wide belief that this regulation protects their users, when in practice it seems only USD deposits have any guarantee). Their natural interest is in increasing regulation to maintain that moat, and their biggest problem is Bitcoin.

They have much more affinity for the centralized coins (Ethereum) where they can have influence and control. The anarchic nature of a genuine open source community (not to mention the developers’ oft-stated aim to improve privacy over time) is not culturally compatible with a top-down company run by the Big Dog. It’s a running joke that their CEO can’t say the word “Bitcoin”, but their recent “what will happen to cryptocurrencies in the 2020s” article is breathtaking in its boldness: innovation is mainly happening on altcoins, and they’re going to overtake bitcoin any day now. Those scaling problems which the Bitcoin developers say they don’t know how to solve? This non-technical CEO knows better.

So, don’t send anyone to an exchange, especially not a “market leading” one. Find some service that actually wants to sell them bitcoin, like CashApp or Swan Bitcoin.

,

Matt PalmerPrivate Key Redaction: UR DOIN IT RONG

Because posting private keys on the Internet is a bad idea, some people like to “redact” their private keys, so that it looks kinda-sorta like a private key, but it isn’t actually giving away anything secret. Unfortunately, due to the way that private keys are represented, it is easy to “redact” a key in such a way that it doesn’t actually redact anything at all. RSA private keys are particularly bad at this, but the problem can (potentially) apply to other keys as well.

I’ll show you a bit of “Inside Baseball” with key formats, and then demonstrate the practical implications. Finally, we’ll go through a practical worked example from an actual not-really-redacted key I recently stumbled across in my travels.

The Private Lives of Private Keys

Here is what a typical private key looks like, when you come across it:

-----BEGIN RSA PRIVATE KEY-----
MGICAQACEQCxjdTmecltJEz2PLMpS4BXAgMBAAECEDKtuwD17gpagnASq1zQTYEC
CQDVTYVsjjF7IQIJANUYZsIjRsR3AgkAkahDUXL0RSECCB78r2SnsJC9AghaOK3F
sKoELg==
-----END RSA PRIVATE KEY-----

Obviously, there’s some hidden meaning in there – computers don’t encrypt things by shouting “BEGIN RSA PRIVATE KEY!”, after all. What is between the BEGIN/END lines above is, in fact, a base64-encoded DER format ASN.1 structure representing a PKCS#1 private key.

In simple terms, it’s a list of numbers – very important numbers. The list of numbers is, in order:

  • A version number (0);
  • The “public modulus”, commonly referred to as “n”;
  • The “public exponent”, or “e” (which is almost always 65,537, for various unimportant reasons);
  • The “private exponent”, or “d”;
  • The two “private primes”, or “p” and “q”;
  • Two exponents, which are known as “dmp1” and “dmq1”; and
  • A coefficient, known as “iqmp”.

Why Is This a Problem?

The thing is, only three of those numbers are actually required in a private key. The rest, whilst useful to allow the RSA encryption and decryption to be more efficient, aren’t necessary. The three absolutely required values are e, p, and q.

Of the other numbers, most of them are at least about the same size as each of p and q. So of the total data in an RSA key, less than a quarter of the data is required. Let me show you with the above “toy” key, by breaking it down piece by piece1:

  • MGI – DER for “this is a sequence”
  • CAQ – version (0)
  • CxjdTmecltJEz2PLMpS4BXn
  • AgMBAAe
  • ECEDKtuwD17gpagnASq1zQTYd
  • ECCQDVTYVsjjF7IQp
  • IJANUYZsIjRsR3q
  • AgkAkahDUXL0RSdmp1
  • ECCB78r2SnsJC9dmq1
  • AghaOK3FsKoELg==iqmp

Remember that in order to reconstruct all of these values, all I need are e, p, and q – and e is pretty much always 65,537. So I could “redact” almost all of this key, and still give all the important, private bits of this key. Let me show you:

-----BEGIN RSA PRIVATE KEY-----
..............................................................EC
CQDVTYVsjjF7IQIJANUYZsIjRsR3....................................
........
-----END RSA PRIVATE KEY-----

Now, I doubt that anyone is going to redact a key precisely like this… but then again, this isn’t a “typical” RSA key. They usually look a lot more like this:

-----BEGIN RSA PRIVATE KEY-----
MIIEogIBAAKCAQEAu6Inch7+mWtKn+leB9uCG3MaJIxRyvC/5KTz2fR+h+GOhqj4
SZJobiVB4FrE5FgC7AnlH6qeRi9MI0s6dt5UWZ5oNIeWSaOOeNO+EJDUkSVf67wj
SNGXlSjGAkPZ0nRJiDjhuPvQmdW53hOaBLk5udxPEQbenpXAzbLJ7wH5ouLQ3nQw
HwpwDNQhF6zRO8WoscpDVThOAM+s4PS7EiK8ZR4hu2toon8Ynadlm95V45wR0VlW
zywgbkZCKa1IMrDCscB6CglQ10M3Xzya3iTzDtQxYMVqhDrA7uBYRxA0y1sER+Rb
yhEh03xz3AWemJVLCQuU06r+FABXJuY/QuAVvQIDAQABAoIBAFqwWVhzWqNUlFEO
PoCVvCEAVRZtK+tmyZj9kU87ORz8DCNR8A+/T/JM17ZUqO2lDGSBs9jGYpGRsr8s
USm69BIM2ljpX95fyzDjRu5C0jsFUYNi/7rmctmJR4s4uENcKV5J/++k5oI0Jw4L
c1ntHNWUgjK8m0UTJIlHbQq0bbAoFEcfdZxd3W+SzRG3jND3gifqKxBG04YDwloy
tu+bPV2jEih6p8tykew5OJwtJ3XsSZnqJMwcvDciVbwYNiJ6pUvGq6Z9kumOavm9
XU26m4cWipuK0URWbHWQA7SjbktqEpxsFrn5bYhJ9qXgLUh/I1+WhB2GEf3hQF5A
pDTN4oECgYEA7Kp6lE7ugFBDC09sKAhoQWrVSiFpZG4Z1gsL9z5YmZU/vZf0Su0n
9J2/k5B1GghvSwkTqpDZLXgNz8eIX0WCsS1xpzOuORSNvS1DWuzyATIG2cExuRiB
jYWIJUeCpa5p2PdlZmBrnD/hJ4oNk4oAVpf+HisfDSN7HBpN+TJfcAUCgYEAyvY7
Y4hQfHIdcfF3A9eeCGazIYbwVyfoGu70S/BZb2NoNEPymqsz7NOfwZQkL4O7R3Wl
Rm0vrWT8T5ykEUgT+2ruZVXYSQCKUOl18acbAy0eZ81wGBljZc9VWBrP1rHviVWd
OVDRZNjz6nd6ZMrJvxRa24TvxZbJMmO1cgSW1FkCgYAoWBd1WM9HiGclcnCZknVT
UYbykCeLO0mkN1Xe2/32kH7BLzox26PIC2wxF5seyPlP7Ugw92hOW/zewsD4nLze
v0R0oFa+3EYdTa4BvgqzMXgBfvGfABJ1saG32SzoWYcpuWLLxPwTMsCLIPmXgRr1
qAtl0SwF7Vp7O/C23mNukQKBgB89DOEB7xloWv3Zo27U9f7nB7UmVsGjY8cZdkJl
6O4LB9PbjXCe3ywZWmJqEbO6e83A3sJbNdZjT65VNq9uP50X1T+FmfeKfL99X2jl
RnQTsrVZWmJrLfBSnBkmb0zlMDAcHEnhFYmHFuvEnfL7f1fIoz9cU6c+0RLPY/L7
n9dpAoGAXih17mcmtnV+Ce+lBWzGWw9P4kVDSIxzGxd8gprrGKLa3Q9VuOrLdt58
++UzNUaBN6VYAe4jgxGfZfh+IaSlMouwOjDgE/qzgY8QsjBubzmABR/KWCYiRqkj
qpWCgo1FC1Gn94gh/+dW2Q8+NjYtXWNqQcjRP4AKTBnPktEvdMA=
-----END RSA PRIVATE KEY-----

People typically redact keys by deleting whole lines, and usually replacing them with [...] and the like. But only about 345 of those 1588 characters (excluding the header and footer) are required to construct the entire key. You can redact about 4/5ths of that giant blob of stuff, and your private parts (or at least, those of your key) are still left uncomfortably exposed.

But Wait! There’s More!

Remember how I said that everything in the key other than e, p, and q could be derived from those three numbers? Let’s talk about one of those numbers: n.

This is known as the “public modulus” (because, along with e, it is also present in the public key). It is very easy to calculate: n = p * q. It is also very early in the key (the second number, in fact).

Since n = p * q, it follows that q = n / p. Thus, as long as the key is intact up to p, you can derive q by simple division.

Real World Redaction

At this point, I’d like to introduce an acquaintance of mine: Mr. Johan Finn. He is the proud owner of the GitHub repo johanfinn/scripts. For a while, his repo contained a script that contained a poorly-redacted private key. He since deleted it, by making a new commit, but of course because git never really deletes anything, it’s still available.

Of course, Mr. Finn may delete the repo, or force-push a new history without that commit, so here is the redacted private key, with a bit of the surrounding shell script, for our illustrative pleasure:

#Add private key to .ssh folder
cd /home/johan/.ssh/
echo  "-----BEGIN RSA PRIVATE KEY-----
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
ÄÄÄÄÄÄÄÄÄÄÄÄÄÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅ
MIIJKgIBAAKCAgEAxEVih1JGb8gu/Fm4AZh+ZwJw/pjzzliWrg4mICFt1g7SmIE2
TCQMKABdwd11wOFKCPc/UzRH/fHuQcvWrpbOSdqev/zKff9iedKw/YygkMeIRaXB
fYELqvUAOJ8PPfDm70st9GJRhjGgo5+L3cJB2gfgeiDNHzaFvapRSU0oMGQX+kI9
ezsjDAn+0Pp+r3h/u1QpLSH4moRFGF4omNydI+3iTGB98/EzuNhRBHRNq4oBV5SG
Pq/A1bem2ninnoEaQ+OPESxYzDz3Jy9jV0W/6LvtJ844m+XX69H5fqq5dy55z6DW
sGKn78ULPVZPsYH5Y7C+CM6GAn4nYCpau0t52sqsY5epXdeYx4Dc+Wm0CjXrUDEe
Egl4loPKDxJkQqQ/MQiz6Le/UK9vEmnWn1TRXK3ekzNV4NgDfJANBQobOpwt8WVB
rbsC0ON7n680RQnl7PltK9P1AQW5vHsahkoixk/BhcwhkrkZGyDIl9g8Q/Euyoq3
eivKPLz7/rhDE7C1BzFy7v8AjC3w7i9QeHcWOZFAXo5hiDasIAkljDOsdfD4tP5/
wSO6E6pjL3kJ+RH2FCHd7ciQb+IcuXbku64ln8gab4p8jLa/mcMI+V3eWYnZ82Yu
axsa85hAe4wb60cp/rCJo7ihhDTTvGooqtTisOv2nSvCYpcW9qbL6cGjAXECAwEA
AQKCAgEAjz6wnWDP5Y9ts2FrqUZ5ooamnzpUXlpLhrbu3m5ncl4ZF5LfH+QDN0Kl
KvONmHsUhJynC/vROybSJBU4Fu4bms1DJY3C39h/L7g00qhLG7901pgWMpn3QQtU
4P49qpBii20MGhuTsmQQALtV4kB/vTgYfinoawpo67cdYmk8lqzGzzB/HKxZdNTq
s+zOfxRr7PWMo9LyVRuKLjGyYXZJ/coFaobWBi8Y96Rw5NZZRYQQXLIalC/Dhndm
AHckpstEtx2i8f6yxEUOgPvV/gD7Akn92RpqOGW0g/kYpXjGqZQy9PVHGy61sInY
HSkcOspIkJiS6WyJY9JcvJPM6ns4b84GE9qoUlWVF3RWJk1dqYCw5hz4U8LFyxsF
R6WhYiImvjxBLpab55rSqbGkzjI2z+ucDZyl1gqIv9U6qceVsgRyuqdfVN4deU22
LzO5IEDhnGdFqg9KQY7u8zm686Ejs64T1sh0y4GOmGsSg+P6nsqkdlXH8C+Cf03F
lqPFg8WQC7ojl/S8dPmkT5tcJh3BPwIWuvbtVjFOGQc8x0lb+NwK8h2Nsn6LNazS
0H90adh/IyYX4sBMokrpxAi+gMAWiyJHIHLeH2itNKtAQd3qQowbrWNswJSgJzsT
JuJ7uqRKAFkE6nCeAkuj/6KHHMPsfCAffVdyGaWqhoxmPOrnVgECggEBAOrCCwiC
XxwUgjOfOKx68siFJLfHf4vPo42LZOkAQq5aUmcWHbJVXmoxLYSczyAROopY0wd6
Dx8rqnpO7OtZsdJMeBSHbMVKoBZ77hiCQlrljcj12moFaEAButLCdZFsZW4zF/sx
kWIAaPH9vc4MvHHyvyNoB3yQRdevu57X7xGf9UxWuPil/jvdbt9toaraUT6rUBWU
GYPNKaLFsQzKsFWAzp5RGpASkhuiBJ0Qx3cfLyirjrKqTipe3o3gh/5RSHQ6VAhz
gdUG7WszNWk8FDCL6RTWzPOrbUyJo/wz1kblsL3vhV7ldEKFHeEjsDGroW2VUFlS
asAHNvM4/uYcOSECggEBANYH0427qZtLVuL97htXW9kCAT75xbMwgRskAH4nJDlZ
IggDErmzBhtrHgR+9X09iL47jr7dUcrVNPHzK/WXALFSKzXhkG/yAgmt3r14WgJ6
5y7010LlPFrzaNEyO/S4ISuBLt4cinjJsrFpoo0WI8jXeM5ddG6ncxdurKXMymY7
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::.::
:::::::::::::::::::::::::::.::::::::::::::::::::::::::::::::::::
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLlL
ÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖ
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
ÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅ
YYYYYYYYYYYYYYYYYYYYYyYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
gff0GJCOMZ65pMSy3A3cSAtjlKnb4fWzuHD5CFbusN4WhCT/tNxGNSpzvxd8GIDs
nY7exs9L230oCCpedVgcbayHCbkChEfoPzL1e1jXjgCwCTgt8GjeEFqc1gXNEaUn
O8AJ4VlR8fRszHm6yR0ZUBdY7UJddxQiYOzt0S1RLlECggEAbdcs4mZdqf3OjejJ
06oTPs9NRtAJVZlppSi7pmmAyaNpOuKWMoLPElDAQ3Q7VX26LlExLCZoPOVpdqDH
KbdmBEfTR4e11Pn9vYdu9/i6o10U4hpmf4TYKlqk10g1Sj21l8JATj/7Diey8scO
sAI1iftSg3aBSj8W7rxCxSezrENzuqw5D95a/he1cMUTB6XuravqZK5O4eR0vrxR
AvMzXk5OXrUEALUvt84u6m6XZZ0pq5XZxq74s8p/x1JvTwcpJ3jDKNEixlHfdHEZ
ZIu/xpcwD5gRfVGQamdcWvzGHZYLBFO1y5kAtL8kI9tW7WaouWVLmv99AyxdAaCB
Y5mBAQKCAQEAzU7AnorPzYndlOzkxRFtp6MGsvRBsvvqPLCyUFEXrHNV872O7tdO
GmsMZl+q+TJXw7O54FjJJvqSSS1sk68AGRirHop7VQce8U36BmI2ZX6j2SVAgIkI
9m3btCCt5rfiCatn2+Qg6HECmrCsHw6H0RbwaXS4RZUXD/k4X+sslBitOb7K+Y+N
Bacq6QxxjlIqQdKKPs4P2PNHEAey+kEJJGEQ7bTkNxCZ21kgi1Sc5L8U/IGy0BMC
PvJxssLdaWILyp3Ws8Q4RAoC5c0ZP0W2j+5NSbi3jsDFi0Y6/2GRdY1HAZX4twem
Q0NCedq1JNatP1gsb6bcnVHFDEGsj/35oQKCAQEAgmWMuSrojR/fjJzvke6Wvbox
FRnPk+6YRzuYhAP/YPxSRYyB5at++5Q1qr7QWn7NFozFIVFFT8CBU36ktWQ39MGm
cJ5SGyN9nAbbuWA6e+/u059R7QL+6f64xHRAGyLT3gOb1G0N6h7VqFT25q5Tq0rc
Lf/CvLKoudjv+sQ5GKBPT18+zxmwJ8YUWAsXUyrqoFWY/Tvo5yLxaC0W2gh3+Ppi
EDqe4RRJ3VKuKfZxHn5VLxgtBFN96Gy0+Htm5tiMKOZMYAkHiL+vrVZAX0hIEuRZ
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
-----END RSA PRIVATE KEY-----" >> id_rsa

Now, if you try to reconstruct this key by removing the “obvious” garbage lines (the ones that are all repeated characters, some of which aren’t even valid base64 characters), it still isn’t a key – at least, openssl pkey doesn’t want anything to do with it. The key is very much still in there, though, as we shall soon see.

Using a gem I wrote and a quick bit of Ruby, we can extract a complete private key. The irb session looks something like this:

>> require "derparse"
>> b64 = <<EOF
MIIJKgIBAAKCAgEAxEVih1JGb8gu/Fm4AZh+ZwJw/pjzzliWrg4mICFt1g7SmIE2
TCQMKABdwd11wOFKCPc/UzRH/fHuQcvWrpbOSdqev/zKff9iedKw/YygkMeIRaXB
fYELqvUAOJ8PPfDm70st9GJRhjGgo5+L3cJB2gfgeiDNHzaFvapRSU0oMGQX+kI9
ezsjDAn+0Pp+r3h/u1QpLSH4moRFGF4omNydI+3iTGB98/EzuNhRBHRNq4oBV5SG
Pq/A1bem2ninnoEaQ+OPESxYzDz3Jy9jV0W/6LvtJ844m+XX69H5fqq5dy55z6DW
sGKn78ULPVZPsYH5Y7C+CM6GAn4nYCpau0t52sqsY5epXdeYx4Dc+Wm0CjXrUDEe
Egl4loPKDxJkQqQ/MQiz6Le/UK9vEmnWn1TRXK3ekzNV4NgDfJANBQobOpwt8WVB
rbsC0ON7n680RQnl7PltK9P1AQW5vHsahkoixk/BhcwhkrkZGyDIl9g8Q/Euyoq3
eivKPLz7/rhDE7C1BzFy7v8AjC3w7i9QeHcWOZFAXo5hiDasIAkljDOsdfD4tP5/
wSO6E6pjL3kJ+RH2FCHd7ciQb+IcuXbku64ln8gab4p8jLa/mcMI+V3eWYnZ82Yu
axsa85hAe4wb60cp/rCJo7ihhDTTvGooqtTisOv2nSvCYpcW9qbL6cGjAXECAwEA
AQKCAgEAjz6wnWDP5Y9ts2FrqUZ5ooamnzpUXlpLhrbu3m5ncl4ZF5LfH+QDN0Kl
KvONmHsUhJynC/vROybSJBU4Fu4bms1DJY3C39h/L7g00qhLG7901pgWMpn3QQtU
4P49qpBii20MGhuTsmQQALtV4kB/vTgYfinoawpo67cdYmk8lqzGzzB/HKxZdNTq
s+zOfxRr7PWMo9LyVRuKLjGyYXZJ/coFaobWBi8Y96Rw5NZZRYQQXLIalC/Dhndm
AHckpstEtx2i8f6yxEUOgPvV/gD7Akn92RpqOGW0g/kYpXjGqZQy9PVHGy61sInY
HSkcOspIkJiS6WyJY9JcvJPM6ns4b84GE9qoUlWVF3RWJk1dqYCw5hz4U8LFyxsF
R6WhYiImvjxBLpab55rSqbGkzjI2z+ucDZyl1gqIv9U6qceVsgRyuqdfVN4deU22
LzO5IEDhnGdFqg9KQY7u8zm686Ejs64T1sh0y4GOmGsSg+P6nsqkdlXH8C+Cf03F
lqPFg8WQC7ojl/S8dPmkT5tcJh3BPwIWuvbtVjFOGQc8x0lb+NwK8h2Nsn6LNazS
0H90adh/IyYX4sBMokrpxAi+gMAWiyJHIHLeH2itNKtAQd3qQowbrWNswJSgJzsT
JuJ7uqRKAFkE6nCeAkuj/6KHHMPsfCAffVdyGaWqhoxmPOrnVgECggEBAOrCCwiC
XxwUgjOfOKx68siFJLfHf4vPo42LZOkAQq5aUmcWHbJVXmoxLYSczyAROopY0wd6
Dx8rqnpO7OtZsdJMeBSHbMVKoBZ77hiCQlrljcj12moFaEAButLCdZFsZW4zF/sx
kWIAaPH9vc4MvHHyvyNoB3yQRdevu57X7xGf9UxWuPil/jvdbt9toaraUT6rUBWU
GYPNKaLFsQzKsFWAzp5RGpASkhuiBJ0Qx3cfLyirjrKqTipe3o3gh/5RSHQ6VAhz
gdUG7WszNWk8FDCL6RTWzPOrbUyJo/wz1kblsL3vhV7ldEKFHeEjsDGroW2VUFlS
asAHNvM4/uYcOSECggEBANYH0427qZtLVuL97htXW9kCAT75xbMwgRskAH4nJDlZ
IggDErmzBhtrHgR+9X09iL47jr7dUcrVNPHzK/WXALFSKzXhkG/yAgmt3r14WgJ6
5y7010LlPFrzaNEyO/S4ISuBLt4cinjJsrFpoo0WI8jXeM5ddG6ncxdurKXMymY7
EOF
>> b64 += <<EOF
gff0GJCOMZ65pMSy3A3cSAtjlKnb4fWzuHD5CFbusN4WhCT/tNxGNSpzvxd8GIDs
nY7exs9L230oCCpedVgcbayHCbkChEfoPzL1e1jXjgCwCTgt8GjeEFqc1gXNEaUn
O8AJ4VlR8fRszHm6yR0ZUBdY7UJddxQiYOzt0S1RLlECggEAbdcs4mZdqf3OjejJ
06oTPs9NRtAJVZlppSi7pmmAyaNpOuKWMoLPElDAQ3Q7VX26LlExLCZoPOVpdqDH
KbdmBEfTR4e11Pn9vYdu9/i6o10U4hpmf4TYKlqk10g1Sj21l8JATj/7Diey8scO
sAI1iftSg3aBSj8W7rxCxSezrENzuqw5D95a/he1cMUTB6XuravqZK5O4eR0vrxR
AvMzXk5OXrUEALUvt84u6m6XZZ0pq5XZxq74s8p/x1JvTwcpJ3jDKNEixlHfdHEZ
ZIu/xpcwD5gRfVGQamdcWvzGHZYLBFO1y5kAtL8kI9tW7WaouWVLmv99AyxdAaCB
Y5mBAQKCAQEAzU7AnorPzYndlOzkxRFtp6MGsvRBsvvqPLCyUFEXrHNV872O7tdO
GmsMZl+q+TJXw7O54FjJJvqSSS1sk68AGRirHop7VQce8U36BmI2ZX6j2SVAgIkI
9m3btCCt5rfiCatn2+Qg6HECmrCsHw6H0RbwaXS4RZUXD/k4X+sslBitOb7K+Y+N
Bacq6QxxjlIqQdKKPs4P2PNHEAey+kEJJGEQ7bTkNxCZ21kgi1Sc5L8U/IGy0BMC
PvJxssLdaWILyp3Ws8Q4RAoC5c0ZP0W2j+5NSbi3jsDFi0Y6/2GRdY1HAZX4twem
Q0NCedq1JNatP1gsb6bcnVHFDEGsj/35oQKCAQEAgmWMuSrojR/fjJzvke6Wvbox
FRnPk+6YRzuYhAP/YPxSRYyB5at++5Q1qr7QWn7NFozFIVFFT8CBU36ktWQ39MGm
cJ5SGyN9nAbbuWA6e+/u059R7QL+6f64xHRAGyLT3gOb1G0N6h7VqFT25q5Tq0rc
Lf/CvLKoudjv+sQ5GKBPT18+zxmwJ8YUWAsXUyrqoFWY/Tvo5yLxaC0W2gh3+Ppi
EDqe4RRJ3VKuKfZxHn5VLxgtBFN96Gy0+Htm5tiMKOZMYAkHiL+vrVZAX0hIEuRZ
EOF
>> der = b64.unpack("m").first
>> c = DerParse.new(der).first_node.first_child
>> version = c.value
=> 0
>> c = c.next_node
>> n = c.value
=> 80071596234464993385068908004931... # (etc)
>> c = c.next_node
>> e = c.value
=> 65537
>> c = c.next_node
>> d = c.value
=> 58438813486895877116761996105770... # (etc)
>> c = c.next_node
>> p = c.value
=> 29635449580247160226960937109864... # (etc)
>> c = c.next_node
>> q = c.value
=> 27018856595256414771163410576410... # (etc)

What I’ve done, in case you don’t speak Ruby, is take the two “chunks” of plausible-looking base64 data, chuck them together into a variable named b64, unbase64 it into a variable named der, pass that into a new DerParse instance, and then walk the DER value tree until I got all the values I need.

Interestingly, the q value actually traverses the “split” in the two chunks, which means that there’s always the possibility that there are lines missing from the key. However, since p and q are supposed to be prime, we can “sanity check” them to see if corruption is likely to have occurred:

>> require "openssl"
>> OpenSSL::BN.new(p).prime?
=> true
>> OpenSSL::BN.new(q).prime?
=> true

Excellent! The chances of a corrupted file producing valid-but-incorrect prime numbers isn’t huge, so we can be fairly confident that we’ve got the “real” p and q. Now, with the help of another one of my creations we can use e, p, and q to create a fully-operational battle key:

>> require "openssl/pkey/rsa"
>> k = OpenSSL::PKey::RSA.from_factors(p, q, e)
=> #<OpenSSL::PKey::RSA:0x0000559d5903cd38>
>> k.valid?
=> true
>> k.verify(OpenSSL::Digest::SHA256.new, k.sign(OpenSSL::Digest::SHA256.new, "bob"), "bob")
=> true

… and there you have it. One fairly redacted-looking private key brought back to life by maths and far too much free time.

Sorry Mr. Finn, I hope you’re not still using that key on anything Internet-facing.

What About Other Key Types?

EC keys are very different beasts, but they have much the same problems as RSA keys. A typical EC key contains both private and public data, and the public portion is twice the size – so only about 1/3 of the data in the key is private material. It is quite plausible that you can “redact” an EC key and leave all the actually private bits exposed.

What Do We Do About It?

In short: don’t ever try and redact real private keys. For documentation purposes, just put “KEY GOES HERE” in the appropriate spot, or something like that. Store your secrets somewhere that isn’t a public (or even private!) git repo.

Generating a “dummy” private key and sticking it in there isn’t a great idea, for different reasons: people have this odd habit of reusing “demo” keys in real life. There’s no need to encourage that sort of thing.


  1. Technically the pieces aren’t 100% aligned with the underlying DER, because of how base64 works. I felt it was easier to understand if I stuck to chopping up the base64, rather than decoding into DER and then chopping up the DER. 

,

Jonathan Adamczewskif32, u32, and const

Some time ago, I wrote “floats, bits, and constant expressions” about converting floating point number into its representative ones and zeros as a C++ constant expression – constructing the IEEE 754 representation without being able to examine the bits directly.

I’ve been playing around with Rust recently, and rewrote that conversion code as a bit of a learning exercise for myself, with a thoroughly contrived set of constraints: using integer and single-precision floating point math, at compile time, without unsafe blocks, while using as few unstable features as possible.

I’ve included the listing below, for your bemusement and/or head-shaking, and you can play with the code in the Rust Playground and rust.godbolt.org

// Jonathan Adamczewski 2020-05-12
//
// Constructing the bit-representation of an IEEE 754 single precision floating 
// point number, using integer and single-precision floating point math, at 
// compile time, in rust, without unsafe blocks, while using as few unstable 
// features as I can.
//
// or "What if this silly C++ thing https://brnz.org/hbr/?p=1518 but in Rust?"


// Q. Why? What is this good for?
// A. To the best of my knowledge, this code serves no useful purpose. 
//    But I did learn a thing or two while writing it :)


// This is needed to be able to perform floating point operations in a const 
// function:
#![feature(const_fn)]


// bits_transmute(): Returns the bits representing a floating point value, by
//                   way of std::mem::transmute()
//
// For completeness (and validation), and to make it clear the fundamentally 
// unnecessary nature of the exercise :D - here's a short, straightforward, 
// library-based version. But it needs the const_transmute flag and an unsafe 
// block.
#![feature(const_transmute)]
const fn bits_transmute(f: f32) -> u32 {
  unsafe { std::mem::transmute::<f32, u32>(f) }
}



// get_if_u32(predicate:bool, if_true: u32, if_false: u32):
//   Returns if_true if predicate is true, else if_false
//
// If and match are not able to be used in const functions (at least, not 
// without #![feature(const_if_match)] - so here's a branch-free select function
// for u32s
const fn get_if_u32(predicate: bool, if_true: u32, if_false: u32) -> u32 {
  let pred_mask = (-1 * (predicate as i32)) as u32;
  let true_val = if_true & pred_mask;
  let false_val = if_false & !pred_mask;
  true_val | false_val
}

// get_if_f32(predicate, if_true, if_false):
//   Returns if_true if predicate is true, else if_false
//
// A branch-free select function for f32s.
// 
// If either is_true or is_false is NaN or an infinity, the result will be NaN,
// which is not ideal. I don't know of a better way to implement this function
// within the arbitrary limitations of this silly little side quest.
const fn get_if_f32(predicate: bool, if_true: f32, if_false: f32) -> f32 {
  // can't convert bool to f32 - but can convert bool to i32 to f32
  let pred_sel = (predicate as i32) as f32;
  let pred_not_sel = ((!predicate) as i32) as f32;
  let true_val = if_true * pred_sel;
  let false_val = if_false * pred_not_sel;
  true_val + false_val
}


// bits(): Returns the bits representing a floating point value.
const fn bits(f: f32) -> u32 {
  // the result value, initialized to a NaN value that will otherwise not be
  // produced by this function.
  let mut r = 0xffff_ffff;

  // These floation point operations (and others) cause the following error:
  //     only int, `bool` and `char` operations are stable in const fn
  // hence #![feature(const_fn)] at the top of the file
  
  // Identify special cases
  let is_zero    = f == 0_f32;
  let is_inf     = f == f32::INFINITY;
  let is_neg_inf = f == f32::NEG_INFINITY;
  let is_nan     = f != f;

  // Writing this as !(is_zero || is_inf || ...) cause the following error:
  //     Loops and conditional expressions are not stable in const fn
  // so instead write this as type coversions, and bitwise operations
  //
  // "normalish" here means that f is a normal or subnormal value
  let is_normalish = 0 == ((is_zero as u32) | (is_inf as u32) | 
                        (is_neg_inf as u32) | (is_nan as u32));

  // set the result value for each of the special cases
  r = get_if_u32(is_zero,    0,           r); // if (iz_zero)    { r = 0; }
  r = get_if_u32(is_inf,     0x7f80_0000, r); // if (is_inf)     { r = 0x7f80_0000; }
  r = get_if_u32(is_neg_inf, 0xff80_0000, r); // if (is_neg_inf) { r = 0xff80_0000; }
  r = get_if_u32(is_nan,     0x7fc0_0000, r); // if (is_nan)     { r = 0x7fc0_0000; }
 
  // It was tempting at this point to try setting f to a "normalish" placeholder 
  // value so that special cases do not have to be handled in the code that 
  // follows, like so:
  // f = get_if_f32(is_normal, f, 1_f32);
  //
  // Unfortunately, get_if_f32() returns NaN if either input is NaN or infinite.
  // Instead of switching the value, we work around the non-normalish cases 
  // later.
  //
  // (This whole function is branch-free, so all of it is executed regardless of 
  // the input value)

  // extract the sign bit
  let sign_bit  = get_if_u32(f < 0_f32,  1, 0);

  // compute the absolute value of f
  let mut abs_f = get_if_f32(f < 0_f32, -f, f);

  
  // This part is a little complicated. The algorithm is functionally the same 
  // as the C++ version linked from the top of the file.
  // 
  // Because of the various contrived constraints on thie problem, we compute 
  // the exponent and significand, rather than extract the bits directly.
  //
  // The idea is this:
  // Every finite single precision float point number can be represented as a
  // series of (at most) 24 significant digits as a 128.149 fixed point number 
  // (128: 126 exponent values >= 0, plus one for the implicit leading 1, plus 
  // one more so that the decimal point falls on a power-of-two boundary :)
  // 149: 126 negative exponent values, plus 23 for the bits of precision in the 
  // significand.)
  //
  // If we are able to scale the number such that all of the precision bits fall 
  // in the upper-most 64 bits of that fixed-point representation (while 
  // tracking our effective manipulation of the exponent), we can then 
  // predictably and simply scale that computed value back to a range than can 
  // be converted safely to a u64, count the leading zeros to determine the 
  // exact exponent, and then shift the result into position for the final u32 
  // representation.
  
  // Start with the largest possible exponent - subsequent steps will reduce 
  // this number as appropriate
  let mut exponent: u32 = 254;
  {
    // Hex float literals are really nice. I miss them.

    // The threshold is 2^87 (think: 64+23 bits) to ensure that the number will 
    // be large enough that, when scaled down by 2^64, all the precision will 
    // fit nicely in a u64
    const THRESHOLD: f32 = 154742504910672534362390528_f32; // 0x1p87f == 2^87

    // The scaling factor is 2^41 (think: 64-23 bits) to ensure that a number 
    // between 2^87 and 2^64 will not overflow in a single scaling step.
    const SCALE_UP: f32 = 2199023255552_f32; // 0x1p41f == 2^41

    // Because loops are not available (no #![feature(const_loops)], and 'if' is
    // not available (no #![feature(const_if_match)]), perform repeated branch-
    // free conditional multiplication of abs_f.

    // use a macro, because why not :D It's the most compact, simplest option I 
    // could find.
    macro_rules! maybe_scale {
      () => {{
        // care is needed: if abs_f is above the threshold, multiplying by 2^41 
        // will cause it to overflow (INFINITY) which will cause get_if_f32() to
        // return NaN, which will destroy the value in abs_f. So compute a safe 
        // scaling factor for each iteration.
        //
        // Roughly equivalent to :
        // if (abs_f < THRESHOLD) {
        //   exponent -= 41;
        //   abs_f += SCALE_UP;
        // }
        let scale = get_if_f32(abs_f < THRESHOLD, SCALE_UP,      1_f32);    
        exponent  = get_if_u32(abs_f < THRESHOLD, exponent - 41, exponent); 
        abs_f     = get_if_f32(abs_f < THRESHOLD, abs_f * scale, abs_f);
      }}
    }
    // 41 bits per iteration means up to 246 bits shifted.
    // Even the smallest subnormal value will end up in the desired range.
    maybe_scale!();  maybe_scale!();  maybe_scale!();
    maybe_scale!();  maybe_scale!();  maybe_scale!();
  }

  // Now that we know that abs_f is in the desired range (2^87 <= abs_f < 2^128)
  // scale it down to be in the range (2^23 <= _ < 2^64), and convert without 
  // loss of precision to u64.
  const INV_2_64: f32 = 5.42101086242752217003726400434970855712890625e-20_f32; // 0x1p-64f == 2^64
  let a = (abs_f * INV_2_64) as u64;

  // Count the leading zeros.
  // (C++ doesn't provide a compile-time constant function for this. It's nice 
  // that rust does :)
  let mut lz = a.leading_zeros();

  // if the number isn't normalish, lz is meaningless: we stomp it with 
  // something that will not cause problems in the computation that follows - 
  // the result of which is meaningless, and will be ignored in the end for 
  // non-normalish values.
  lz = get_if_u32(!is_normalish, 0, lz); // if (!is_normalish) { lz = 0; }

  {
    // This step accounts for subnormal numbers, where there are more leading 
    // zeros than can be accounted for in a valid exponent value, and leading 
    // zeros that must remain in the final significand.
    //
    // If lz < exponent, reduce exponent to its final correct value - lz will be
    // used to remove all of the leading zeros.
    //
    // Otherwise, clamp exponent to zero, and adjust lz to ensure that the 
    // correct number of bits will remain (after multiplying by 2^41 six times - 
    // 2^246 - there are 7 leading zeros ahead of the original subnormal's
    // computed significand of 0.sss...)
    // 
    // The following is roughly equivalent to:
    // if (lz < exponent) {
    //   exponent = exponent - lz;
    // } else {
    //   exponent = 0;
    //   lz = 7;
    // }

    // we're about to mess with lz and exponent - compute and store the relative 
    // value of the two
    let lz_is_less_than_exponent = lz < exponent;

    lz       = get_if_u32(!lz_is_less_than_exponent, 7,             lz);
    exponent = get_if_u32( lz_is_less_than_exponent, exponent - lz, 0);
  }

  // compute the final significand.
  // + 1 shifts away a leading 1-bit for normal, and 0-bit for subnormal values
  // Shifts are done in u64 (that leading bit is shifted into the void), then
  // the resulting bits are shifted back to their final resting place.
  let significand = ((a << (lz + 1)) >> (64 - 23)) as u32;

  // combine the bits
  let computed_bits = (sign_bit << 31) | (exponent << 23) | significand;

  // return the normalish result, or the non-normalish result, as appopriate
  get_if_u32(is_normalish, computed_bits, r)
}


// Compile-time validation - able to be examined in rust.godbolt.org output
pub static BITS_BIGNUM: u32 = bits(std::f32::MAX);
pub static TBITS_BIGNUM: u32 = bits_transmute(std::f32::MAX);
pub static BITS_LOWER_THAN_MIN: u32 = bits(7.0064923217e-46_f32);
pub static TBITS_LOWER_THAN_MIN: u32 = bits_transmute(7.0064923217e-46_f32);
pub static BITS_ZERO: u32 = bits(0.0f32);
pub static TBITS_ZERO: u32 = bits_transmute(0.0f32);
pub static BITS_ONE: u32 = bits(1.0f32);
pub static TBITS_ONE: u32 = bits_transmute(1.0f32);
pub static BITS_NEG_ONE: u32 = bits(-1.0f32);
pub static TBITS_NEG_ONE: u32 = bits_transmute(-1.0f32);
pub static BITS_INF: u32 = bits(std::f32::INFINITY);
pub static TBITS_INF: u32 = bits_transmute(std::f32::INFINITY);
pub static BITS_NEG_INF: u32 = bits(std::f32::NEG_INFINITY);
pub static TBITS_NEG_INF: u32 = bits_transmute(std::f32::NEG_INFINITY);
pub static BITS_NAN: u32 = bits(std::f32::NAN);
pub static TBITS_NAN: u32 = bits_transmute(std::f32::NAN);
pub static BITS_COMPUTED_NAN: u32 = bits(std::f32::INFINITY/std::f32::INFINITY);
pub static TBITS_COMPUTED_NAN: u32 = bits_transmute(std::f32::INFINITY/std::f32::INFINITY);


// Run-time validation of many more values
fn main() {
  let end: usize = 0xffff_ffff;
  let count = 9_876_543; // number of values to test
  let step = end / count;
  for u in (0..=end).step_by(step) {
      let v = u as u32;
      
      // reference
      let f = unsafe { std::mem::transmute::<u32, f32>(v) };
      
      // compute
      let c = bits(f);

      // validation
      if c != v && 
         !(f.is_nan() && c == 0x7fc0_0000) && // nans
         !(v == 0x8000_0000 && c == 0) { // negative 0
          println!("{:x?} {:x?}", v, c); 
      }
  }
}

,

Chris NeugebauerReflecting on 10 years of not having to update WordPress

Over the weekend, the boredom of COVID-19 isolation motivated me to move my personal website from WordPress on a self-managed 10-year-old virtual private server to a generated static site on a static site hosting platform with a content delivery network.

This decision was overdue. WordPress never fit my brain particularly well, and it was definitely getting to a point where I wasn’t updating my website at all (my last post was two weeks before I moved from Hobart; I’ve been living in Petaluma for more than three years now).

Settling on which website framework wasn’t a terribly difficult choice (I chose Jekyll, everyone else seems to be using it), and I’ve had friends who’ve had success moving their blogs over. The difficulty I ended up facing was that the standard exporter that everyone to move from WordPress to Jekyll uses does not expect Debian’s package layout.

Backing up a bit: I made a choice, 10 years ago, to deploy WordPress on a machine that I ran myself, using the Debian system wordpress package, a simple aptitude install wordpress away. That decision was not particularly consequential then, but it chewed up 3 hours of my time on Saturday.

Why? The exporter plugin assumes that it will be able to find all of the standard WordPress files in the usual WordPress places, and when it didn’t find that, it broke in unexpected ways. And why couldn’t it find it?

Debian makes packaging choices that prioritise all the software on a system living side-by-side with minimal difficulty. It sets strict permissions. It separates application code from configuration from user data (which in the case of WordPress, includes plugins), in a way that is consistent between applications. This choice makes it easy for Debian admins to understand how to find bits of an application. It also minimises the chance of one PHP application from clobbering another.

10 years later, the install that I had set up was still working, having survived 3-4 Debian versions, and so 3-4 new WordPress versions. I don’t recall the last time I had to think about keeping my WordPress instance secure and updated. That’s quite a good run. I’ve had a working website despite not caring about keeping it updated for at least three years.

The same decisions that meant I spent 3 hours on Saturday doing a simple WordPress export saved me a bunch of time that I didn’t incrementally spend over the course a decade. Am I even? I have no idea.

Anyway, the least I can do is provide some help to people who might run into this same problem, so here’s a 5-step howto.

How to migrate a Debian WordPress site to Jekyll

Should you find the Jekyll exporter not working on your Debian WordPress install:

  1. Use the standard WordPress export to export an XML feel of your site.
  2. Spin up a new instance of WordPress (using WordPress.com, or on a new Virtual Private Server, whatever, really).
  3. Import the exported XML feed.
  4. Install the Jekyll exporter plugin.
  5. Follow the documentation and receive a Jekyll export of your site.

Basically, the plugin works with a stock WordPress install. If you don’t have one of those, it’s easy to move it over.

,

Andrew RuthvenInstall Fedora CoreOS using FAI

I've spent the last couple of days trying to deploy Fedora CoreOS to some physical hardware/bare metal for a colleague using the official PXE installer from Fedora CoreOS. It wasn't very pleasant, and just wouldn't work reliably.

Maybe my expectations were to high, in that I thought I could use Ignition to prepare more of the system for me, as my colleague has been able to bare metal installs correctly. I just tried to use Ignition as documented.

A few interesting aspects I encountered:

  1. The PXE installer for it has a 618MB initrd file. This takes quite a while to transfer via tftp!
  2. It can't build software RAID for the main install device (and the developers have no intention of adding this), and it seems very finicky to build other RAID sets for other partitions.
  3. And, well, I just kept having problems where the built systems would hang during boot for no obvious reason.
  4. The time to do an installation was incredibly long.
  5. The initrd image is really just running coreos-installer against the nominated device.

During the night I got feed up with that process and wrote a Fully Automatic Installer (FAI) profile that'd install CoreOS instead. I can now use setup-storage from FAI using it's standard disk_config files. This allows me to build complicated disk configurations with software RAID and LVM easily.

A big bonus is that a rebuild is a lot faster, timed from typing reboot to a fresh login prompt is 10 minutes - and this is on physical hardware so includes BIOS POST and RAID controller set up, twice each.

I thought this might be of interest to other people, so the FAI profile I developed for this is located here: https://github.com/catalyst-cloud/fai-profile-fedora-coreos

FAI was initially developed to deploy Debian systems, it has since been extended to be able to install a number of other operating systems, however I think this is a good example of how easy it is to deploy non-Debian derived operating systems using FAI without having to modify FAI itself.

,

Robert CollinsStrength training from home

For the last year I’ve been incrementally moving away from lifting static weights and towards body weight based exercises, or callisthenics. I’ve been doing this for a number of reasons, including better avoidance of injury (if I collapse, the entire stack is dynamic, if a bar held above my head drops on me, most of the weight is just dead weight – ouch), accessibility during travel – most hotel gyms are very poor, and functional relevance – I literally never need to put 100 kg on my back, but I do climb stairs, for instance.

Covid-19 shutting down the gym where I train is a mild inconvenience for me as a result, because even though I don’t do it, I am able to do nearly all my workouts entirely from home. And I thought a post about this approach might be of interest to other folk newly separated from their training facilities.

I’ve gotten most of my information from a few different youtube channels:

There are many more channels out there, and I encourage you to go and look and read and find out what works for you. Those 5 are my greatest hits, if you will. I’ve bought the FitnessFAQs exercise programs to help me with my my training, and they are indeed very effective.

While you don’t need a gymnasium, you do need some equipment, particularly if you can’t go and use a local park. Exactly what you need will depend on what you choose to do – for instance, doing dips on the edge of a chair can avoid needing any equipment, but doing them with some portable parallel bars can be much easier. Similarly, doing pull ups on the edge of a door frame is doable, but doing them with a pull-up bar is much nicer on your fingers.

Depending on your existing strength you may not need bands, but I certainly did. Buying rings is optional – I love them, but they aren’t needed to have a good solid workout.

I bought parallettes for working on the planche.undefined Parallel bars for dips and rows.undefined A pull-up bar for pull-ups and chin-ups, though with the rings you can add flys, rows, face-pulls, unstable push-ups and more. The rings. And a set of 3 bands that combine for 7 different support amounts.undefinedundefined

In terms of routine, I do a upper/lower split, with 3 days on upper body, one day off, one day on lower, and the weekends off entirely. I was doing 2 days on lower body, but found I was over-training with Aikido later that same day.

On upper body days I’ll do (roughly) chin ups or pull ups, push ups, rows, dips, hollow body and arch body holds, handstands and some grip work. Today, as I write this on Sunday evening, 2 days after my last training day on Friday, I can still feel my lats and biceps from training Friday afternoon. Zero issue keeping the intensity up.

For lower body, I’ll do pistol squats, nordic drops, quad extensions, wall sits, single leg calf raises, bent leg calf raises. Again, zero issues hitting enough intensity to achieve growth / strength increases. The only issue at home is having a stable enough step to get a good heel drop for the calf raises.

If you haven’t done bodyweight training at all before, when starting, don’t assume it will be easy – even if you’re a gym junkie, our bodies are surprisingly heavy, and there’s a lot of resistance just moving them around.

Good luck, train well!

OpenSTEMOnline Teaching

The OpenSTEM® materials are ideally suited to online teaching. In these times of new challenges and requirements, there are a lot of technological possibilities. Schools and teachers are increasingly being asked to deliver material online to students. Our materials can assist with that process, especially for Humanities and Science subjects from Prep/Kindy/Foundation to Year 6. […]

The post Online Teaching first appeared on OpenSTEM Pty Ltd.

Brendan ScottCovid 19 Numbers – lag

Recording some thoughts about Covid 19 numbers.

Today’s figures

The Government says:

“As at 6.30am on 22 March 2020, there have been 1,098 confirmed cases of COVID-19 in Australia”.

The reference is https://www.health.gov.au/news/health-alerts/novel-coronavirus-2019-ncov-health-alert/coronavirus-covid-19-current-situation-and-case-numbers. However, that page is updated daily (ish), so don’t expect it to be the same if you check the reference.

Estimating Lag

If a person tests positive to the virus today, that means they were infected at some time in the past. So, what is the lag between infection and a positive test result?

Incubation Lag – about 5 days

When you are infected you don’t show symptoms immediately. Rather, there’s an incubation period before symptoms become apparent.  The time between being infected and developing symptoms varies from person to person, but most of the time a person shows symptoms after about 5 days (I recall seeing somewhere that 1 in a 1000 cases will develop symptoms after 14 days).

Presentation Lag – about 2 days

I think it’s fair to also assume that people are not presenting at testing immediately they become ill. It is probably taking them a couple of days from developing symptoms to actually get to the doctor – I read a story somewhere (have since lost the reference) about a young man who went to a party, then felt bad for days but didn’t go for a test until someone else from the party had returned a positive test.  Let’s assume there’s a mix of worried well and stoic types and call it 2 days from becoming symptomatic to seeking a test.

Referral Lag – about a day

Assuming that a GP is available straight away and recommends a test immediately, logistically there will still be most of a day taken up between deciding to see a doctor and having a test carried out.

Testing lag – about 2 days

The graph of infections “epi graph” today looks like this:

200322_new-and-cumulative-covid-19-cases-in-australia-by-notification-date_1

One thing you notice about the graph is that the new cases bars seem to increase for a couple of days, then decrease – so about 100 new cases in the last 24 hours, but almost 200 in the 24 hours before that. From the graph, the last 3 “dips” have been today (Sunday), last Thursday and last Sunday.  This seems to be happening every 3 to 4 days. I initially thought that the dips might mean fewer (or more) people presenting over weekends, but the period is inconsistent with that. I suspect, instead, that this actually means that testing is being batched.

That would mean that neither the peaks nor troughs is representative of infection surges/retreats, but is simply reflecting when tests are being processed. This seems to be a 4 day cycle, so, on average it seems that it would be about 2 days between having the test conducted and receiving a result. So a confirmed case count published today is actually showing confirmed cases as at about 2 days earlier.

Total lag

From the date someone is infected to the time that they receive a positive confirmation is about:

lag = time for symptoms to show+time to seek a test+referral time + time for the test to return a result

So, the published figures on confirmed infections are probably lagging actual infections in the community by about 10 days (5+2+1+2).

If there’s about a 10 day lag between infection and confirmation, then what a figure published today says is that about a week and a half ago there were about this many cases in the community.  So, the 22 March figure of 1098 infections is actually really a 12 March figure.

What the lag means for Physical (ie Social) Distancing

The main thing that the lag means is that if we were able to wave a magic wand today and stop all further infections, we would continue to record new infections for about 10 days (and the tail for longer). In practical terms, implementing physical distancing measures will not show any effect on new cases for about a week and a half. That’s because today there are infected people who are yet to be tested.

The silver lining to that is that the physical distancing measures that have been gaining prominence since 15 March should start to show up in the daily case numbers from the middle of the coming week, possibly offset by overseas entrants rushing to make the 20 March entry deadline.

Estimating Actual Infections as at Today

How many people are infected, but unconfirmed as at today? To estimate actual infections you’d need to have some idea of the rate at which infections are increasing. For example, if infections increased by 10% per day for 10 days, then you’d multiply the most recent figure by 1.1 raised to the power of 10 (ie about 2.5).  Unfortunately, the daily rate of increase (see table on the wiki page) has varied a fair bit (from 20% to 27%) over the most recent 10 days of data (that is, over the 10 days prior to 12 March, since the 22 March figures roughly correspond to 12 March infections) and there’s no guarantee that since that time the daily increase in infections will have remained stable, particularly in light of the implementation of physical distancing measures. At 23.5% per day, the factor is about 8.

There aren’t any reliable figures we can use to estimate the rate of infection during the current lag period (ie from 12 March to 22 March). This is because the vast majority of cases have not been from unexplained community transmission. Most of the cases are from people who have been overseas in the previous fortnight and they’re the cohort that has been most significantly impacted by recent physical distancing measures. From 15 March, they have been required to self isolate and from 20 March most of their entry into the country has stopped.  So I’d expect a surge in numbers up to about 30 March – ie reflecting infections in the cohort of people rushing to get into the country before the borders closed followed by a flattening. With the lag factor above, you’ll need to wait until 1 April or thereabouts to know for sure.

Note:

This post is just about accounting for the time lag between becoming infected and receiving a positive test result. It assumes, for example, that everyone who is infected seeks a test, and that everyone who is infected and seeks a test is, in fact, tested. As at today, neither of these things is true.

,

OpenSTEMCOVID-19 (of course)

We thought it timely to review a few facts and observations, relying on published medical papers (or those submitted for peer review) and reliable sources.

The post COVID-19 (of course) first appeared on OpenSTEM Pty Ltd.

,

Clinton Roylca2020 ReWatch 2020-02-02

As I was an organiser of the conference this year, I didn’t get to see many talks, fortunately many of the talks were recorded, so i get to watch the conference well after the fact.

Conference Opening

That white balance on the lectern slides is indeed bad, I really should get around to adding this as a suggestion on the logos documentation. (With some help, I put up all the lectern covers, it was therapeutic and rush free).

I actually think there was a lot of information in this introduction. Perhaps too much?

OpenZFS and Linux

A nice update on where zfs is these days.

Dev/Ops relationships, status: It’s Complicated

A bit of  a war story about production systems, leading to a moment of empathy.

Samba 2020: Why are we still in the 1980s for authentication?

There are a lot of old security standards that are showing there age, there are a lot of modern security standards, but which to choose?

Tyranny of the Clock

A very interesting problem solving adventure, with a few nuggets of interesting information about tools and techniques.

Configuration Is (riskier than?) Code

Because configuration files are parsed by a program, and the program changes how it runs depending on the contents of that configuration file, every program that parses configuration files is basically an interpreter, and thus every configuration file is basically a program. So, configuation is code, and we should be treating configuration like we do code, e.g. revision control, commenting, testing, review.

Easy Geo-Redundant Handover + Failover with MARS + systemd

Using a local process organiser to handle a cluster, interesting, not something I’d really promote. Not the best video cutting in this video, lots of time with the speaker pointing to his slides offscreen.

 

,

Robert Collins2019 in the rearview

2019 was a very busy year for us. I hadn’t realised how busy it was until I sat down to write this post. There’s also some moderately heavy stuff in here – if you have topics that trigger you, perhaps make sure you have spoons before reading.

We had all the usual stuff. Movies – my top two were Alita and Abominable though the Laundromat and Ford v Ferrari were both excellent and moving pieces. I introduced Cynthia to Teppanyaki and she fell in love with having egg roll thrown at her face hole.

When Cynthia started school we dropped gymnastics due to the time overload – we wanted some downtime for her to process after school, and with violin having started that year she was just looking so tired after a full day of school we felt it was best not to have anything on. Then last year we added in a specific learning tutor to help with the things that she approaches differently to the other kids in her class, giving 2 days a week of extra curricular activity after we moved swimming to the weekends.

At the end of last year she was finally chipper and with it most days after school, and she had been begging to get into more stuff, so we all got together and negotiated drama class and Aikido.

The drama school we picked, HSPA, is pretty amazing. Cynthia adored her first teacher there, and while upset at a change when they rearranged classes slightly, is again fully engaged and thrilled with her time there. Part of the class is putting on a full scale production – they did a version of the Happy Prince near the end of term 3 – and every student gets a part, with the ability for the older students to audition for more parts. On the other hand she tells me tonight that she wants to quit. So shrug, who knows :).

I last did martial arts when I took Aikido with sensei Darren Friend at Aikido Yoshinkai NSW back in Sydney, in the late 2000’s. And there was quite a bit less of me then. Cynthia had been begging to take a martial art for about 4 years, and we’d said that when she was old enough, we’d sign her up, so this year we both signed up for Aikido at the Rangiora Aikido Dojo. The Rangiora dojo is part of the NZ organisation Aikido Shinryukan which is part of the larger Aikikai style, which is quite different, yet the same, as the Yoshinkai Aikido that I had been learning. There have been quite a few moments where I have had to go back to something core – such as my stance – and unlearn it, to learn the Aikikai technique. Cynthia has found the group learning dynamic a bit challenging – she finds the explanations – needed when there are twenty kids of a range of ages and a range of experience – from new intakes each term through to ones that have been doing it for 5 or so years – get boring, and I can see her just switch off. Then she misses the actual new bit of information she didn’t have previously :(. Which then frustrates her. But she absolutely loves doing it, and she’s made a couple of friends there (everyone is positive and friendly, but there are some girls that like to play with her after the kids lesson). I have gotten over the body disconnect and awkwardness and things are starting to flow, I’m starting to be able to reason about things without just freezing in overload all the time, so that’s not bad after a year. However, the extra weight is making my forward rolls super super awkward. I can backward roll easily, with moderately good form; forward rolls though my upper body strength is far from what’s needed to support my weight through the start of the roll – my arm just collapses – so I’m in a sort of limbo – if I get the moment just right I can just start the contact on the shoulder; but if I get the moment slightly wrong, it hurts quite badly. And since I don’t want large scale injuries, doing the higher rolls is very unnerving for me. I suspect its 90% psychological, but am not sure how to get from where I am to having confidence in my technique, other than rinse-and-repeat. My hip isn’t affecting training much, and sensei Chris seems to genuinely like training with Cynthia and I, which is very nice: we feel welcomed and included in the community.

Speaking of my hip – earlier this year something ripped cartilage in my right hip – ended up having to have an MRI scan – and those machines sound exactly like a dot matrix printer – to diagnose it. Interestingly, having the MRI improved my symptoms, but we are sadly in hurry-up-and-wait mode. Before the MRI, I’d wake up at night with some soreness, and my right knee bent, foot on the bed, then sleepily let my leg collapse sideways to the right – and suddenly be awake in screaming agony as the joint opened up with every nerve at its disposal. When the MRI was done, they pumped the joint full of local anaesthetic for two purposes – one is to get a clean read on the joint, and the second is so that they can distinguish between referred surrounding pain, vs pain from the joint itself. It is to be expected with a joint issue that the local will make things feel better (duh), for up to a day or so while the local dissipates. The expression on the specialists face when I told him that I had had a permanent improvement trackable to the MRI date was priceless. Now, when I wake up with joint pain, and my leg sleepily falls back to the side, its only mildly uncomfortable, and I readjust without being brought to screaming awakeness. Similarly, early in Aikido training many activities would trigger pain, and now there’s only a couple of things that do. In another 12 or so months if the joint hasn’t fully healed, I’ll need to investigate options such as stem cells (which the specialist was negative about) or steroids (which he was more negative about) or surgery (which he was even more negative about). My theory about the improvement is that the cartilage that was ripped was sitting badly and the inflation for the MRI allowed it to settle back into the appropriate place (and perhaps start healing better). I’m told that reducing inflammation systematically is a good option. Turmeric time.

Sadly Cynthia has had some issues at school – she doesn’t fit the average mould and while wide spread bullying doesn’t seem to be a thing, there is enough of it, and she receives enough of it that its impacted her happiness more than a little – this blows up in school and at home as well. We’ve been trying a few things to improve this – helping her understand why folk behave badly, what to do in the moment (e.g. this video), but also that anything that goes beyond speech is assault and she needs to report that to us or teachers no matter what.

We’ve also had some remarkably awful interactions with another family at the school. We thought we had a friendly relationship, but I managed to trigger a complete meltdown of the relationship – not by doing anything objectively wrong, but because we had (unknown to me) different folkways, and some perfectly routine and normal behaviour turned out to be stressful and upsetting to them, and then they didn’t discuss it with us at all until it had brewed up in their heads into a big mess… and its still not resolved (and may not ever be: they are avoiding us both).

I weighed in at 110kg this morning. Jan the 4th 2019 I was 130.7kg. Feb 1 2018 I was 115.2kg. This year I peaked at 135.4kg, and got down to 108.7kg before Christmas food set in. That’s pretty happy making all things considered. Last year I was diagnosed with Coitus headaches and though I didn’t know it the medicine I was put on has a known side effect of weight gain. And it did – I had put it down to ongoing failure to manage my diet properly, but once my weight loss doctor gave me an alternative prescription for the headaches, I was able to start losing weight immediately. Sadly, though the weight gain through 2018 was effortless, losing the weight through 2019 was not. Doable, but not effortless. I saw a neurologist for the headaches when they recurred in 2019, and got a much more informative readout on them, how to treat and so on – basically the headaches can be thought of as an instability in the system, and the medicines goal is to stabilise things, and once stable for a decent period, we can attempt to remove the crutch. Often that’s successful, sometimes not, sometimes its successful on a second or third time. Sometimes you’re stuck with it forever. I’ve been eating a keto / LCHF diet – not super strict keto, though Jonie would like me to be on that, I don’t have the will power most of the time – there’s a local truck stop that sells killer hotdogs. And I simply adore them.

I started this year working for one of the largest companies on the planet – VMware. I left there in February and wrote a separate post about that. I followed that job with nearly the polar opposite – a startup working on a blockchain content distribution system. I wrote about that too. Changing jobs is hard in lots of ways – for instance I usually make friendships at my jobs, and those suffer some when you disappear to a new context – not everyone makes connections with you outside of the job context. Then there’s the somewhat non-rational emotional impact of not being in paid employment. The puritans have a lot to answer for. I’m there again, looking for work (and hey, if you’re going to be at Linux.conf.au (Gold Coast Australia January 13-17) I’ll be giving a presentation about some of the interesting things I got up to in the last job interregnum I had.

My feet have been giving me trouble for a couple of years now. My podiatrist is reasonably happy with my progress – and I can certainly walk further than I could – I even did some running earlier in the year, until I got shin splints. However, I seem to have hyper sensitive soles, so she can’t correct my pro-nation until we fix that, which at least for now means a 5 minute session where I touch my feet, someone else does, then something smooth then something rough – called “sensory massage”.

In 2017 and 2018 I injured myself at the gym, and in 2019 I wanted to avoid that, so I sought out ways to reduce injury. Moving away from machines was a big part of that; more focus on technique another part. But perhaps the largest part was moving from lifting dead weight to focusing on body weight exercises – callisthenics. This shifts from a dead weight to control when things go wrong, to an active weight, which can help deal with whatever has happened. So far at least, this has been pretty successful – although I’ve had minor issues – I managed to inflame the fatty pad the olecranon displaces when your elbow locks out – I’m nearly entirely transitioned to a weights-free program – hand stands, pistol squats, push ups, dead hangs and so on. My upper body strength needs to come along some before we can really go places though… and we’re probably going to max out the hamstring curl machine (at least for regular two-leg curls) before my core is strong enough to do a Nordic drop.

Lynne has been worried about injuring herself with weight lifting at the gym for some time now, but recently saw my physio – Ben Cameron at Pegasus PhysioSouth – who is excellent, and he suggested that she could have less chronic back pain if she took weights back up again. She’s recently told me that I’m allowed one ‘told you so’ about this, since she found herself in a spot where previously she would have put herself in a poor lifting position, but the weight training gave her a better option and she intuitively used it, avoiding pain. So that’s a good thing – complicated because of her bodies complicated history, but an excellent trainer and physio team are making progress.

Earlier this year she had a hell of a fright, with a regular eye checkup getting referred into a ‘you are going blind; maybe tomorrow, maybe within 10 years’ nightmare scenario. Fortunately a second opinion got a specialist who probably knows the same amount but was willing to communicate it with actual words… Lynne has a condition which diabetes (type I or II) can affect, and she has a vein that can alter state somewhat arbitrarily but will probably only degrade slowly, particularly if Lynne’s diet is managed as she has been doing.

Diet wise, Lynne also has been losing some weight but this is complicated by her chronic idiopathic pancreatitis. That’s code for ‘it keeps happening and we don’t know why’ pancreatitis. We’ve consulted a specialist in the North Island who comes highly recommended by Lynne’s GP, who said that rapid weight loss is a little known but possible cause of pancreatitis – and that fits the timelines involved. So Lynne needs to lose weight to manage the onset of type II diabetes. But not to fast, to avoid pancreatitis, which will hasten the onset of type II diabetes. Aiee. Slow but steady – she’s working with the same doctor I am for that, and a similar diet, though lower on the fats as she has no gall… bladder.

In April our kitchen waste pipe started chronically blocking, and investigation with a drain robot revealed a slump in the pipe. Ground penetrating radar reveal an anomaly under the garage… and this escalated. We’re going to have to move out of the house for a week while half the house’s carpets are lifted, grout is pumped into the foundations to tighten it all back up again – and hopefully they don’t over pump it – and then it all gets replaced. Oh, and it looks like the drive will be replaced again, to fix the slumped pipe permanently. It will be lovely when done but right now we’re facing a wall of disruption and argh.

Around September I think, we managed to have a gas poisoning scare – our gas hob was left on and triggered a fireball which fortunately only scared Lynne rather than flambéing her face. We did however not know how much exposure we’d had to the LPG, nor to partially combusted gas – which produces toxic CO as a by-product, so there was a trip into the hospital for observation with Cynthia, with Lynne opting out. Lynne and Cynthia had had plenty of the basic symptoms – headaches, dizziness and so on at the the time, but after waiting for 2 hours in the ER queue that had faded. Le sigh. The hospital, bless their cotton socks don’t have the necessary equipment to diagnose CO poisoning without a pretty invasive blood test, but still took Cynthia’s vitals using methods (manual observation and a infra-red reader) that are confounded by the carboxyhemoglobin that forms from the CO that has been inhaled. Pretty unimpressed – our GP was livid. (This is one recommended protocol). Oh, and our gas hob when we got checked out – as we were not sure if we had left it on, or it had been misbehaving, turned out to have never been safe, got decertified and the pipe cut at the regulator. So we’re cooking on a portable induction hob for now.

When we moved to Rangiora I was travelling a lot more, Christchurch itself had poorer air quality than Rangiora, and our financial base was a lot smaller. Now, Rangiora’s population has gone up nearly double (13k to 19k conservatively – and that’s ignoring the surrounds that use Rangiora as a base), we have more to work with, the air situation in Christchurch has improved massively, and even a busy years travel is less than I was doing before Cynthia came along. We’re looking at moving – we’re not sure where yet; maybe more country, maybe more city.

One lovely bright spot over the last few years has been reconnecting with friends from school, largely on Facebook – some of whom I had forgotten that I knew back at school – I had a little clique but was not very aware of the wider school population in hindsight (this was more than a little embarrassing to me, as I didn’t want to blurt out “who are you?!”) – and others whom I had not :). Some of these reconnections are just light touch person-X exists and cares somewhat – and that’s cool. One in particular has grown into a deeper friendship than we had back as schoolkids, and I am happy and grateful that that has happened.

Our cats are fat and happy. Well mostly. Baggy is fat and stressed and spraying his displeasure everywhere whenever the stress gets too much :(. Cynthia calls him Mr Widdlepants. The rest of the time he cuddles and purrs and is generally happy with life. Dibbler and Kitten-of-the-wild are relatively fine with everything.

Cynthia’s violin is coming along well. She did a small performance for her classroom (with her teacher) and wowed them. I’ve been inspired to start practising trumpet again. After 27 years of decay my skills are decidedly rusty, but they are coming along. Finding arrangements for violin + trumpet is a bit challenging, and my sight-reading-with-transposition struggles to cope, but we make do. Lynne is muttering about getting a clarinet or drum-kit and joining in.

So, 2019. Whew. I hope yours was less stressful and had as many or more bright points than ours. Onwards to 2020.

,

BlueHackersBlueHackers crowd-funding free psychology services at LCA and other conferences

BlueHackers has in the past arranged for a free counsellor/psychologist at several conferences (LCA, OSDC). Given the popularity and great reception of this service, we want to make this a regular thing and try to get this service available at every conference possible – well, at least Australian open source and related events.

Right now we’re trying to arrange for the service to be available at LCA2020 at the Gold Coast, we have excellent local psychologists already, and the LCA organisers are working on some of the logistical aspects.

Meanwhile, we need to get the funds organised. Fortunately this has never been a problem with BlueHackers, people know this is important stuff. We can make a real difference.

Unfortunately BlueHackers hasn’t yet completed its transition from OSDClub project to Linux Australia subcommittee, so this fundraiser is running in my personal name. Well, you know who I (Arjen) am, so I hope you’re ok all with that.

We have a little over a week until LCA2020 starts, let’s make this happen! Thanks. You can donate via MyCause.

The post BlueHackers crowd-funding free psychology services at LCA and other conferences first appeared on BlueHackers.org.

,

Robert CollinsA Cachecash retrospective

In June 2019 I started a new role as a software engineer at a startup called Cachecash. Today is probably the last day of payroll there, and as is my usual practice, I’m going to reflect back on my time there. Less commonly, I’m going to do so in public, as we’re about to open the code (yay), and its not a mega-corporation with everything shuttered up (also yay).

Framing

This is intended to be a blameless reflection on what has transpired. Blameless doesn’t mean inaccurate; but it means placing the focus on the process and system, not on the particular actor that happened to be wearing the hat at the time a particular event happened. Sometimes the system is defined by the actors, and in that case – well, I’ll let you draw your own conclusions if you encounter that case.

A retrospective that we can’t learn from is useless. Worse than useless, because it takes time to write and time to read and that time is lost to us forever. So if a thing is a particular way, it is going to get said. Not to be mean, but because false niceness will waste everyone’s time. Mine and my ex-colleagues whose time I respect. And yours, if you are still reading this.

What was Cachecash

Cachecash was a startup – still is in a very technical sense, corporation law being what it is. But it is still a couple of code bases – and a nascent open source project (which will hopefully continue) – built to operationalise and productise this research paper that the Cachecash founders wrote.

What it isn’t anymore is a company investing significant amounts of time and money in the form of engineering in making code, to make those code bases better.

Cachecash was also a team of people. That obviously changed over time, but at the time I write this it is:

  • Ghada
  • Justin
  • Kevin
  • Marcus
  • Petar
  • Robert
  • Scott

And we’re all pretty fantastic, if you ask me :).

Technical overview

The CAPNet paper that I linked above doesn’t describe a product. What it describes is a system that permits paying caches (think squid/varnish etc) for transmitting content to clients, while also detecting attempts by such caches to claim payment when they haven’t transmitted, or attempting to collude with a client to pretend to overtransmit and get paid that way. A classic incentives-aligned scheme.

Note that there is no blockchain involved at this layer.

The blockchain was added into this core system as a way to build a federated marketplace – the idea was that the blockchain provided a suitable substrate for negotiating the purchase and sale of contracts that would be audited using the CAPNet accounting system, the payments could be micropayments back onto the blockchain, and so on – we’d avoid the regular financial system, and we wouldn’t be building a fragile central system that would prevent other companies also participating.

Miners would mine coins, publishers would buy coins then place them in escrow as a promise to pay caches to deliver content to clients, and a client would deliver proof of delivery back to the cache which would then claim payment from the publisher.

Technical Challenges

There were a few things that turned up as significant issues. In no particular order:

The protocol

The protocol itself adds additional round trips to multiple peers – in its ‘normal’ configuration the client ends up running (web- for browers) GRPC connections to 5 endpoints (with all the normal windowing concerns, but potentially over QUIC), and then gets chunks of content in batches (concurrently) from 4 of the peers, runs a small crypto brute force operation on the combined result, and then moves onto the next group of content. This should be sounding suspiciously like TCP – it is basically a window management problem, and it has exactly the same performance management problems – fast start, maximum window size, how far to reduce it when problems are suffered. But accentuated: those 4 cache peers can all suffer their own independent noise problems, or be hostile. But also, they can also suffer correlated problems: they might all be in the same datacentre, or be all run by a hostile actor, or the client might be on a hostile WiFi link, or the client’s OS/browser might be hostile. Lets just say that there is a long, rich road for optimising this new protocol to make it fast, robust, reliable. Much as we have taken many years to make HTTP into QUIC, drawing upon techniques like forward error correction rather than retries – similar techniques will need to be applied to give this protocol similar performance characteristics. And evolving the protocol while maintaining the security properties is a complicated task, with three actors involved, who may collude in various ways.

An early performance analysis I did on the go code implementation showed that the brute forcing work was a bottleneck because while the time (once optimise) per second was entirely modest for any small amount of data, the delay added per window element acts as a brake on performance for high capacity low latency links. For a 1Gbps 25ms RTT link I estimated a need for 8 cores doing crypto brute forcing on the client.

JS

Cachecash is essentially implementing a new network protocol. There are some great hooks these days in browsers, and one can hook in and provide streams to things like video players to let them get one segment of video. However, for downloading an entire file – for instance, if one is downloading a full video, it is not so easy. This bug, open for 2 years now, is the standards based way to do it. Even so non-standards based way to do it involves buffering the entire content in memory, oh and reflecting everything through a static github service worker. (You of course host such a static page yourself, but then the whole idea of this federated distributed system breaks down a little).

Our initial JS implementation was getting under 512KBps with all-local servers – part of that was the bandwidth delay product issue mentioned above. Moving to getting chunks of content from each cache concurrently using futures improved that up to 512KBps, but thats still shocking for a system we want to be able to compete with the likes of Youtube, Cloudflare and Akamai.

One of the hot spots turned out to be calculating SHA-256 values – the CAPNet algorithm calculates thousands (it’s tunable, but 8k in the set I was analysing) of independent SHA’s per chunk of received data. This is a problem – in browser SHA routines, even the recent native hosted ones – are slow per SHA. They are not slow per byte. Most folk want to make a small number of SHA calculations. Maybe thousands in total. Not tens of thousands per MB of data received….. So we wrote an implementation of the core crypto routines in Rust WASM, which took our performance locally up to 2MBps in Firefox and 6MBps in Chromium.

It is also possible we’d show up as crypto-JS at that point and be blacklisted as malware!

Blockchain

Having chosen to involve a block chain in the stack we had to deal with that complexity. We chose to take bitcoin’s good bits and run with those rather than either running a sidechain, trying to fit new transaction types into bitcoin itself, or trying to shoehorn our particular model into e.g. Ethereum. This turned out to be a fairly large amount of work : not the core chain itself – cloning the parts of bitcoin that we wanted was very quick. But then layering on the changes that we needed, to start dealing with escrows and negotiating parameters between components and so forth. And some of the operational challenges below turned up here as well even just in developer test setups (in particular endpoint discovery).

Operational Challenges

The operational model was pretty interesting. The basic idea was that eventually there would be this big distributed system, a bit-coin like set of miners etc, and we’d be one actor in that ecosystem running some subset of the components, but that until then we’d be running:

  • A centralised ledger
  • Centralised random number generation for the micropayment system
  • Centralised deployment and operations for the cache fleet
  • Software update / vetting for the publisher fleet
  • Software update / publishing for the JS library
  • Some number of seed caches
  • Demo publishers to show things worked
  • Metrics, traces, chain explorer, centralised logging

We had most of this live and running in some fashion for most of the time I was there – we evolved it and improved it a number of times as we iterated on things. Where appropriate we chose open source components like Jaeger, Prometheus and Elasticsearch. We also added policy layers on top of them to provide rate limiting and anti-spoofing facilities. We deployed stuff in AWS, with EKS, and there were glitches and things to workaround but generally only a tiny amount of time went into that part of it. I think I spent a day on actual operations a month, or thereabouts.

Other parties were then expected to bring along additional caches to expand the network, additional publishers to expand the content accessible via the network, and clients to use the network.

Ensuring a process run by a third party is network reachable by a browser over HTTPS is a surprisingly non-simple problem. We partly simplified it by mandating that they run a docker container that we supplied, but there’s still the chance that they are running behind a firewall with asymmetric ingress. And after that we still need a domain name for their endpoint. You can give every cache a CNAME in a dedicated subdomain – say using their public key as the subdomain, so that only that cache can issue requests to update their endpoint information in DNS. It is all solvable, but doing it so that the amount of customer interaction and handholding is reduced to the bare minimum is important: a user with a fleet of 1000 machines doesn’t want to talk to us 1000 times, and we don’t want to talk to them either. But this was a another bit of this-isn’t-really-distributed-is-it grit in the distributed-ointment.

Adoption Challenges

ISPs with large fleets of machines are in principle happy to sell capacity on them in return for money – yay. But we have no revenue stream at the moment, so they aren’t really incentivised to put effort in, it becomes a matter of principle, not a fiscal “this is 10x better for my business” imperative. And right now, its 10x slower than HTTP. Or more.

Content owners with large amounts of content being delivered without a CDN would like a radically cheaper CDN. Except – we’re not actually radically cheaper on a cost structure basis. Current CDN’s are expensive for their expensive 2nd and third generation products because no-one offers what they offer – seamless in-request edge computing. But that ISP that is contributing a cache to the fleet is going to want the cache paid for, and thats the same cost structure as existing CDNs – who often have a free entry tier. We might have been able to make our network cheaper eventually, but I’m just not sure about the radically cheaper bit.

Content owners who would like a CDN marketplace where the CDN caches are competing with each other – driving costs down – rather than than the CDN operators competing – would absolutely love us. But I rather suspect that those owners want more sophisticated offerings. To be clear, I wasn’t on the customer development team, and didn’t get much in the way of customer development briefings. But things like edge computing workers, where completely custom code can run in the CDN network, adjacent to ones user, are much more powerful offerings than simple static content shipping offerings, and offered by all major CDN’s. These are trusted services – the CAPNet paper doesn’t solve the problem of running edge code and providing proof that it was run. Enarx might go some, or even a long way way to running such code in an untrusted context, but providing a proof that it was run – so that running it can become a mining or mining-like operation is a whole other question. Without such an answer, an edge computing network starts to depend on trusting the caches behaviour a lot more all over again – the network has no proof of execution to depend on.

Rapid adjustment – load spikes – is another possible use case, but the use of the blockchain to negotiate escrows actually seemed to work against our ability to offer that. Akami define load spike in a time frame faster than many block chains can decide that a transaction has actually been accepted. Offchain transactions are of course a known thing in the block chain space but again that becomes additional engineering.

Our use of a new network protocol – for all that it was layered on standard web technology – made it harder for potential content owners to adopt our technology. Rather than “we have 200 local proxies that will deliver content to your users, just generate a url of the form X.Y.Z”, our solution is “we do not trust the 200 local proxies that we have, so you need to run complicated JS in your browser/phone app etc” to verify that the proxies are actually doing their job. This is better in some ways – precisely because we don’t trust those proxies, but it also increases both the runtime cost of using the service, the integration cost adopting the service, and complexity of debugging issues receiving content via the service.

What did we learn?

It is said that “A startup is an organization formed to search for a repeatable and scalable business model.” What did we uncover in our search? What can we take away going forward?

In principle we have a classic two sided market – people with excess capacity close to users want to sell it, and people with excess demand for their content want to buy delivery capacity.

The baseline market is saturated. The market as a whole is on its third or perhaps fourth (depending on how you define things) major iteration of functionality.

Content delivery purchasers are ok with trusting their suppliers : any supply chain fraud happening in this space at the moment is so small no-one is talking about it that I heard about.

Some of the things we were doing don’t seem to have been important to the customers we talked to – I don’t have a great read on this, but in particular, the blockchain aspect seems to have been more important to our long term vision than to the 2-sided market place that we perceived. It would be fascinating to me to validate that somehow – would cache capacity suppliers be willing to trust us enough to sell capacity to us with just the auditing mechanism, without the blockchain? Would content providers be happy buying credit from us rather than from a neutral exchange?

What did I learn?

I think in hindsight my startup muscles were atrophied – it had been some years since Canonical and it took a few months to start really thinking lean-startup again on a personal basis. That’s ok, because I was hired to build systems. But its not great, because I can do better. So number one: think lean-startup and really step up to help with learning and validation.

I levelled up my Go lang skills. That was really nice – Kevin has deep knowledge there, and though I’ve written Go before I didn’t have a good appreciation for style or aesthetics, or why. I do now. Where before I’d say ‘I’m happy to dive in but its not a language I feel I really know’, I am now happy to say that I know Go. More to learn – there always is – but in a good place.

I did a similar thing to my JS skills, but not to the same degree. Having climbed fairly deeply into the JS client – which is written in Typescript, converted its bundling system to webpack to work better with Rust-WASM, and so on. Its still not my go-to place, but I’m much more comfortable there now.

And of course playing with Rust-WASM was pure delight. Markus and I are both Rust afficionados, and having a genuine reason to write some Rust code for work was just delightful. Finding this bug was just a bonus :).

It was also really really nice being back in a truely individual contributor role for a while. I really enjoyed being able to just fix bugs and get on with things while I got my bearings. I’ve ended up doing a bit more leadership – refining of requirements, translating between idea-and-specification and the like recently, but still about 80% of time has been able to be sit-down-and-code, and that really is a pleasant holiday.

What am I going to change?

I’m certainly going to get a new job :). If you’re hiring, hit me up. (If you don’t have my details already, linkedin is probably best).

I’m think there the core thing I need to do is more alignment of the day to day work I’m doing with needs of customer development : I don’t want to take on or take over the customer development role – that will often be done best in person with a customer for startups, and I’m happy remote – but the more I can connect what I’m trying to achieve with what will get the customers to pay us, the more successful any business I’m working in will be. This may be a case for non-vanity metrics, or talking more with the customer-development team, or – well, I don’t know exactly what it will look like until I see the context I end up in, but I think more connection will be important.

And I think the second major thing is to find a better balance between individual contribution and leadership. I love individual contribution, it is perhaps the least stressful and most Zen place to be. But it is also the least effective unless the project has exactly one team member. My most impactful and successful roles have been leadership roles, but the pure leadership role with no individual contribution slowly killed me inside. Pure individual contribution has been like I imagine crack to be, and perhaps just as toxic in the long term.

,

Robert CollinsRust and distributions

Daniel wrote a lovely blog post about Rust’s ability to be included in distributions, both as a language that you can get via the distribution, and as the language that components of the distribution are being written in.

I think this is a great goal to raise and I have just a few thoughts and quibbles. First I want to acknowledge and agree with him on the Rust community, its so very nice, and he is doing a great thing as rustup lead; I wish I had more time to put in, I have more things I want to contribute to rustup. I’ll try to get back to the meetings soon.

On trust

I completely agree about the need for the crates index improvement : without those we cannot have a mirror network, and thats a significant issue for offline users and slow-region users.

On curlsh though

It isn’t the worst possible thing, for all that its “untrusted bootstrapping”, the actual thing downloaded is https secured etc, and so is the rustup binary itself. Put another way, I think the horror is more perceptual than analyzed risk. Someone that trusts Verisign etc enough to download the Debian installer enough over it, has exactly the same risk as someone trusting Verisign enough to download rustup at that point in time.

Cross signing curlsh that with per-distro keys or something seems pretty ridiculous to me, since the root of trust is still that first download; unless you’re wandering up to someone who has bootstrapped their compiler by hand (to avoid reflections-on-trust attacks), to get an installer, to build a system, to then do reproducible builds, to check that other systems are actually safe… aieeee.

I think its easier to package the curl|sh shell script in Debian itself perhaps? apt install get-rustup; then if / when rustup becomes packaged the user instructions don’t change but the root of trust would, as get-rustup would be updated to not download rustup, but to trigger a different package install, and so forth.

I don’t think its desirable though, to have distribution forks of the contents that rustup manages – Debian+Redhat+Suse+… builds of nightly rust with all the things failing or not, and so on – I don’t see who that would help. And if we don’t have that then the root of trust would still not be shifted under the GPG keychain – it would still be the HTTPS infrastructure for downloading rust toolchains + the integrity of the rustup toolchain builds themselves. Making rustup, which currently shares that trust, have a different trust root, seems pointless.

On duplication of dependencies

I think Debian needs to become more inclusive here, not Rustup. Debian has spent; pauses, counts, yes, DECADES, rejecting multiple entire ecosystems because of a prejuidiced view about what the Right Way to manage dependencies is. And they are not right in a universal sense. They were right in an engineering sense: given constraints (builds are expensive, bandwidth is expensive, disk is expensive), they are right. But those are not universal constraints, and seeking to impose those constraints on Java and Node – its been an unmitigated disaster. It hasn’t made those upstreams better, or more secure, or systematically fixed problems for users. I have another post on this so rather than repeating I’m going to stop here :).

I think Rust has – like those languages – made the crucial, maintainer and engineering efficiency important choice to embrace enabling incremental change across libraries, with the consequence that dependencies don’t shift atomically, and sure, this is basically incompatible with Debian packaging world view which says that point and patch releases of libraries are not distinct packages, and thus the shared libs for these things all coexist in the same file on disk. Boom! Crash!

I assert that it is entirely possible to come up with a reasonable design for managing a respository of software that doesn’t make this conflation, would allow actual point and patch releases of exist as they are for the languages that have this characteristic, and be amenable to automation, auditing and reporting for security issues. E.g. Modernise Debian to cope with this fundamentally different language design decision… which would make Java and Node and Rust work so very much better.

Alternatively, if Debian doesn’t want to make it possible to natively support languages that have made this choice, Debian could:

  • ship static-but-for-system-libs builds
  • not include things written in rust
  • ask things written in rust to converge their dependencies again and again and again (and only update them when the transitive dependencies across the entire distro have converged)

I have a horrible suspicion about which Debian will choose to do :(. The blinkers / echo chamber are so very strong in that community.

For Windows

We got to parity with Linux for IO for non-McAfee users, but I guess there are a lot of them out there; we probably need to keep pushing on tweaking it until it work better for them too; perhaps autodetect McAfee and switch to minimal? I agree that making Windows users – like I am these days – feel tier one, would be nice :). Maybe a survey of user experience would be a good starting point.

Shared libraries

Perhaps generating versioned symbols automatically and building many versions of the crate and then munging them together? But I’d also like to point here again that the whole focus on shared libraries is a bit of a distribution blind spot, and looking at the vast amount of distribution of software occuring in app stores and their model, suggests different ways of dealing with these things. See also the fairly specific suggestion I make about the packaging system in Debian that is the root of the problem in my entirely humble view.

Bonus

John Goerzen posted an entirely different thing recently, but in it he discusses programs that don’t properly honour terminfo. Sadly I happen to know that large chunks of the Rust ecosystem assume that everything is ANSI these days, and it certainly sounds like, at least for John, that isn’t true. So thats another way in which Rust could be more inclusive – use these things that have been built, rather than being modern and new age and reinventing the 95% match.

,

Julien GoodwinSome thoughts on Storytelling as an engineering teaching tool

Every week at work on Wednesday afternoons we have the SRE ops review, a relaxed two hour affair where SREs (& friends of, not all of whom are engineers) share interesting tidbits that have happened over the last week or so, this might be a great success, an outage, a weird case, or even a thorny unsolved problem. Usually these relate to a service the speaker is oncall for, or perhaps a dependency or customer service, but we also discuss major incidents both internal & external. Sometimes a recent issue will remind one of the old-guard (of which I am very much now a part) of a grand old story and we share those too.

Often the discussion continues well into the evening as we decant to one of the local pubs for dinner & beer, sometimes chatting away until closing time (probably quite regularly actually, but I'm normally long gone).

It was at one of these nights at the pub two months ago (sorry!), that we ended up chatting about storytelling as a teaching tool, and a colleague asked an excellent question, that at the time I didn't have a ready answer for, but I've been slowly pondering, and decided to focus on over an upcoming trip.

As I start to write the first draft of this post I've just settled in for cruise on my first international trip in over six months[1], popping over to Singapore for the Melbourne Cup weekend, and whilst I'd intended this to be a holiday, I'm so terrible at actually having a holiday[2] that I've ended up booking two sessions of storytelling time, where I present the history of Google's production networks (for those of you reading this who are current of former engineering Googlers, similar to Traffic 101). It's with this perspective of planning, and having run those sessions that I'm going to try and answer the question that I was asked.

Or at least, I'm going to split up the question I was asked and answer each part.

"What makes storytelling good"

On its own this is hard to answer, there are aspects that can help, such as good presentation skills (ideally keeping to spoken word, but simple graphs, diagrams & possibly photos can help), but a good story can be told in a dry technical monotone and still be a good story. That said, as with the rest of these items charisma helps.

"What makes storytelling interesting"

In short, a hook or connection to the audience, for a lot of my infrastructure related outage stories I have enough context with the audience to be able to tie the impact back in a way that resonates with a person. For larger disparate groups shared languages & context help ensure that I'm not just explaining to one person.

In these recent sessions one was with a group of people who work in our Singapore data centre, in that session I focused primarily on the history & evolution of our data centre fabrics, giving them context to understand why some of the (at face level) stranger design decisions have been made that way.

The second session was primarily people involved in the deployment side of our backbone networks, and so I focused more on the backbones, again linking with knowledge the group already had.

"What makes storytelling entertaining"

Entertaining storytelling is a matter of style, skills and charisma, and while many people can prepare (possibly with help) an entertaining talk, the ability to tell an entertaining story off the cuff is more of a skill, luckily for me, one I seem to do ok with. Two things that can work well are dropping in surprises, and where relevant some level of self-deprecation, however both need to be done very carefully.

Surprises can work very well when telling a story chronologically "I assumed X because Y, <five minutes of waffling>, so it turned out I hadn't proved Y like I thought, so it wasn't X, it was Z", they can help the audience to understand why a problem wasn't solved so easily, and explaining "traps for young players" as Dave Jones (of the EEVblog) likes to say can themselves be really helpful learning elements. Dropping surprises that weren't surprises to the story's protagonist generally only works if it's as a punchline of a joke, and even then it often doesn't.

Self-deprecation is an element that I've often used in the past, however more recently I've called others out on using it, and have been trying to reduce it myself, depending on the audience you might appear as a bumbling success or stupid, when the reality may be that nobody understood the situation properly, even if someone should have. In the ops review style of storytelling, it can also lead to a less experienced audience feeling much less confident in general than they should, which itself can harm productivity and careers.

If the audience already had relevant experience (presenting a classic SRE issue to other SREs for example, a network issue to network engineers, etc.) then audience interaction can work very well for engagement. "So the latency graph for database queries was going up and to the right, what would you look at?" This is also similar to one of the ways to run a "wheel of misfortune" outage simulation.

"What makes storytelling useful & informative at the same time"

In the same way as interest, to make storytelling useful & informative for the audience involves consideration for the audience, as a presenter if you know the audience, at least in broad strokes this helps. As I mentioned above, when I presented my talk to a group of datacenter-focused people I focused on the DC elements, connecting history to the current incarnations; when I presented to a group of more general networking folk a few days later, I focused more on the backbones and other elements they'd encountered.

Don't assume that a story will stick wholesale, just leaving a few keywords, or even just a vague memory with a few key words they can go digging for can make all the difference in the world. Repetition works too, sharing many interesting stories that share the same moral (for an example, one of the ops review classics is demonstrations about how lack of exponential backoff can make recovery from outages hard), hearing this over dozens of different stories over weeks (or months, or years...) it eventually seeps in as something to not even question having been demonstrated as such an obvious foundation of good systems.

When I'm speaking to an internal audience I'm happy if they simply remember that I (or my team) exist and might be worth reaching out to in future if they have questions.

Lastly, storytelling is a skill you need to practice, whether a keynote presentation in front of a few thousand people, or just telling tall takes to some mates at the pub practice helps, and eventually many of the elements I've mentioned above become almost automatic. As can probably be seen from this post I could do with some more practice on the written side.

1: As I write these words I'm aboard a Qantas A380 (QF1) flying towards Singapore, the book I'm currently reading, of all things about mechanical precision ("Exactly: How Precision Engineers Created the Modern World" or as it has been retitled for paperback "The Perfectionists"), has a chapter themed around QF32, the Qantas A380 that notoriously had to return to Singapore after an uncontained engine failure. Both the ATSB report on the incident and the captain Richard de Crespigny's book QF32 are worth reading. I remember I burned though QF32 one (very early) morning when I was stuck in GlobalSwitch Sydney waiting for approval to repatch a fibre, one of the few times I've actually dealt with the physical side of Google's production networks, and to date the only time the fact I live just a block from that facility has been used at all sensibly.

2: To date, I don't think I've ever actually had a holiday that wasn't organised by family, or attached to some conference, event or work travel I'm attending. This trip is probably the closest I've ever managed (roughly equal to my burnout trip to Hawaii in 2014), and even then I've ruined it by turning two of the three weekdays into work. I'm much better at taking breaks that simply involve not leaving home or popping back to stay with family in Melbourne.

,

Tim SerongNetwork Maintenance

To my intense amazement, it seems that NBN Co have finally done sufficient capacity expansion on our local fixed wireless tower to actually resolve the evening congestion issues we’ve been having for the past couple of years. Where previously we’d been getting 22-23Mbps during the day and more like 2-3Mbps (or worse) during the evenings, we’re now back to 22-23Mbps all the time, and the status lights on the NTD remain a pleasing green, rather than alternating between green and amber. This is how things were way back at the start, six years ago.

We received an email from iiNet in early July advising us of the pending improvements. It said:

Your NBN™ Wireless service offers maximum internet speeds of 25Mbps downland and 5Mbps upload.

NBN Co have identified that your service is connected to a Wireless cell that is currently experiencing congestion, with estimated typical evening speeds of 3~6 Mbps. This congestion means that activities like browsing, streaming or gaming might have been and could continue to be slower than promised, especially when multiple people or devices are using the internet at the same time.

NBN Co estimates that capacity upgrades to improve the speed congestion will be completed by Dec-19.

At the time we were given the option of moving to a lower speed plan with a $10 refund because we weren’t getting the advertised speed, or to wait it out on our current plan. We chose the latter, because if we’d downgraded, that would have reduced our speed during the day, when everything was otherwise fine.

We did not receive any notification from iiNet of exactly when works would commence, nor was I ever able to find any indication of planned maintenance on iiNet’s status page. Instead, I’ve come to rely on notifications from my neighbour, who’s with activ8me. He receives helpful emails like this:

This is a courtesy email from Activ8me, Letting you know NBN will be performing Fixed Wireless Network capacity work in your area that might affect your connectivity to the internet. This activity is critical to the maintenance and optimisation of the network. The approximate dates of this maintenance/upgrade work will be:

Impacted location: Neika, TAS & Downstream Sites & Upstream Sites
NBN estimates interruption 1 (Listed Below) will occur between:
Start: 24/09/19 7:00AM End: 24/09/19 8:00PM
NBN estimates interruption 2 (Listed Below) will occur between:
Start: 25/09/19 7:00AM End: 25/09/19 8:00PM
NBN estimates interruption 3 (Listed Below) will occur between:
Start: 01/10/19 7:00AM End: 01/10/19 8:00PM
NBN estimates interruption 4 (Listed Below) will occur between:
Start: 02/10/19 7:00AM End: 02/10/19 8:00PM
NBN estimates interruption 5 (Listed Below) will occur between:
Start: 03/10/19 7:00AM End: 03/10/19 8:00PM
NBN estimates interruption 6 (Listed Below) will occur between:
Start: 04/10/19 7:00AM End: 04/10/19 8:00PM
NBN estimates interruption 7 (Listed Below) will occur between:
Start: 05/10/19 7:00AM End: 05/10/19 8:00PM
NBN estimates interruption 8 (Listed Below) will occur between:
Start: 06/10/19 7:00AM End: 06/10/19 8:00PM

Change start
24/09/2019 07:00 Australian Eastern Standard Time

Change end
06/10/2019 20:00 Australian Eastern Daylight Time

This is expected to improve your service with us however, occasional loss of internet connectivity may be experienced during the maintenance/upgrade work.
Please note that the upgrades are performed by NBN Co and Activ8me has no control over them.
Thank you for your understanding in this matter, and your patience for if it does affect your service. We appreciate it.

The astute observer will note that this is pretty close to two weeks of scheduled maintenance. Sure enough, my neighbour and I (and presumably everyone else in the area) enjoyed major outages almost every weekday during that period, which is not ideal when you work from home. But, like I said at the start, they did finally get the job done.

Interestingly, according to activ8me, there is yet more NBN maintenance scheduled from 21 October 07:00 ’til 27 October 21:00, then again from 28 October 07:00 ’til 3 November 21:00 (i.e. another two whole weeks). The only scheduled upgrade I could find listed on iiNet’s status page is CM-177373, starting “in 13 days” with a duration of 6 hours, so possibly not the same thing.

Based on the above, I am convinced that there is some problem with iiNet’s status page not correctly reporting NBN incidents, but of course I have no idea whether this is NBN Co not telling iiNet, iiNet not listening to NBN Co, or if it’s just that the status web page is busted.

,

Robert CollinsWant me to work with you?

Reach out to me – I’m currently looking for something interesting to do. https://www.linkedin.com/in/rbtcollins/ and https://twitter.com/rbtcollins are good ways to grab me if you don’t already have my details.

Should you reach out to me? Maybe :). First, a little retrospective.

Three years ago, I wrote the following when reflecting on what I wanted to be doing:

Priorities (roughly ordered most to least important):

  • Keep living in Rangiora (family)
  • Up to moderate travel requirements – 4 trips a year + LCA/PyCon
  • Significant autonomy (not at the expense of doing the right thing for the company, just I work best with the illusion of free will 🙂 )
  • Be doing something that matters
    • -> Being open source is one way to this, but not the only one
  • Something cutting edge would be awesome
    • -> Rust / Haskell / High performance requirements / scale / ….
  • Salary

How well did that work for me? Pretty good. I had a good satisfying job at VMware for 3 years, met some wonderful people, achieved some very cool things. And those priorities above were broadly achieved.
The one niggle that stands out was this – Did the things we were doing matter? Certainly there was no social impact – VMware isn’t a non-profit, being right at the core of capitalism as it is. There was direct connection and impact with the team, the staff we worked with and the users of the products… but it is just a bit hard to feel really connected through that though: VMware is a very large company and there are many layers between users and developers.

We were quite early adopters of Kubernetes, which allowed me to deepen my Go knowledge and experience some more fun with AWS scale operations. I had many interesting discussions about the relative strengths of Python Go and Rust and Java with colleagues there. (Hi Geoffrey).

Company culture is very important to me, and VMware has a fantastically supportive culture. One of the most supportive companies I’ve been in, bar none. It isn’t a truely remote-organised company though: rather its a bunch of offices that talk to each other, which I think is sad. True remote-first offers so much more engagement.

I enjoy building things to solve problems. I’ve either directly built, or shaped what is built, in all my most impactful and successful roles. Solving a problem once by hand is fine; solving it for years to come by creating a tool is far more powerful.

I seem to veer into toolmaking very often: giving other people the ability to solve their problems takes the power of a tool and multiplies it even further.

It should be no surprise then that I very much enjoy reading white papers like the original Dapper and Map-reduce ones, LinkedIn’s Kafka or for more recent fodder the Facebook Akkio paper. Excellent synthesis and toolmaking applied at industrial scale. I read those things and I want to be a part of the creation of those sorts of systems.

I was fortunate enough to take some time to go back to university part-time, which though logistically challenging is something I want to see through.

Thus I think my new roughly ordered (descending) list of priorities needs to be something like this:

  • Keep living in Rangiora (family)
  • Up to moderate travel requirements – 4 team-meeting trips a year + 2 conferences
  • Significant autonomy (not at the expense of doing the right thing for the company, just I work best with the illusion of free will 🙂 )
  • Be doing something that matters
    • Be working directly on a problem / system that has problems
  • Something cutting edge would be awesome
    • Rust / Haskell / High performance requirements / scale / ….
  • A generative (Westrum definition) + supportive company culture
  • Remote-first or at least very remote familiar environment
  • Support my part time study / self improvement initiative
  • Salary

,

Clinton RoyRestricted Sleep Regime

Since moving down to Melbourne my poor sleep has started up again. It’s really hard to say what the main factor driving this is. My doctor down here has put me onto a drug free way of trying to improve my sleep, and I think I kind of like it, while it’s no silver bullet, it is something I can go back to if I’m having trouble with my sleep, without having to get a prescription.

The basic idea is to maximise sleep efficiency. If you’re only getting n hours sleep a night, only spend n hours  a night in bed. This forces you to stay up and go to bed rather late for a few nights. Hopefully, being tired will help you sleep through the night in one large segment. Once you’ve successfully slept through the night a few times, relax your bed time by say fifteen minutes, and get used to that. Slowly over time, you increase the amount of sleep you’re getting, while keeping your efficiency high.

,

OpenSTEMElection Activity Bundle

With the upcoming federal election, many teachers want to do some related activities in class – and we have the materials ready for you! To make selecting suitable resources a bit easier, we have an Election Activity Bundle containing everything you need, available for just $9.90. Did you know that the secret ballot is an Australian […]

The post Election Activity Bundle first appeared on OpenSTEM Pty Ltd.

,

Julien GoodwinBuilding new pods for the Spectracom 8140 using modern components

I've mentioned a bunch of times on the time-nuts list that I'm quite fond of the Spectracom 8140 system for frequency distribution. For those not familiar with it, it's simply running a 10MHz signal against a 12v DC power feed so that line-powered pods can tap off the reference frequency and use it as an input to either a buffer (10MHz output pods), decimation logic (1MHz, 100kHz etc.), or a full synthesizer (Versa-pods).

It was only in October last year that I got a house frequency standard going using an old Efratom FRK-LN which now provides the reference; I'd use a GPSDO, but I live in a ground floor apartment without a usable sky view, this of course makes it hard to test some of the GPS projects I'm doing. Despite living in a tiny apartment I have test equipment in two main places, so the 8140 is a great solution to allow me to lock all of them to the house standard.


(The rubidium is in the chunky aluminium chassis underneath the 8140)

Another benefit of the 8140 is that many modern pieces of equipment (such as my [HP/Agilent/]Keysight oscilloscope) have a single connector for reference frequency in/out, and should the external frequency ever go away it will switch back to its internal reference, but also send that back out the connector, which could lead to other devices sharing the same signal switching to it. The easy way to avoid that is to use a dedicated port from a distribution amplifier for each device like this, which works well enough until you have this situation in multiple locations.

As previously mentioned the 8140 system uses pods to add outputs, while these pods are still available quite cheaply used on eBay (as of this writing, for as low as US$8, but ~US$25/pod has been common for a while), recently the cost of shipping to Australia has gone up to the point I started to plan making my own.

By making my own pods I also get to add features that the original pods didn't have[1], I started with a quad-output pod with optional internal line termination. This allows me to have feeds for multiple devices with the annoying behaviour I mentioned earlier. The enclosure is a Pomona model 4656, with the board designed to slot in, and offer pads for the BNC pins to solder to for easy assembly.



This pod uses a Linear Technologies (now Analog Devices) LTC6957 buffer for the input stage replacing a discrete transistor & logic gate combined input stage in the original devices. The most notable change is that this stage works reliably down to -30dBm input (possibly further, couldn't test beyond that), whereas the original pods stop working right around -20dBm.

As it turns out, although it can handle lower input signal levels, in other ways including power usage it seems very similar. One notable downside is the chip tops out at 4v absolute maximum input, so a separate regulator is used just to feed this chip. The main regulator has also been changed from a 7805 to an LD1117 variant.

On this version the output stage is the same TI 74S140 dual 4-input NAND gate as was used on the original pods, just in SOIC form factor.

As with the next board there is one error on the board, the wire loop that forms the ground connection was intended to fit a U-type pin header, however the footprint I used on the boards was just too tight to allow the pins through, so I've used some thin bus wire instead.



The second major variant I designed was a combo version, allowing sine & square outputs by just switching a jumper, or isolated[2] or line-regenerator (8040TA from Spectracom) versions with a simple sub-board containing just an inductor (TA) or 1:1 transformer (isolated).



This is the second revision of that board, where the 74S140 has been replaced by a modern TI 74LVC1G17 buffer. This version of the pod, set for sine output, uses almost exactly 30mA of current (since both the old & new pods use linear supplies that's the most sensible unit), whereas the original pods are right around 33mA. The empty pods at the bottom-left are simply placeholders for 2 100 ohm resistors to add 50 ohm line termination if desired.

The board fits into the Pomona 2390 "Size A" enclosures, or for the isolated version the Pomona 3239 "Size B". This is the reason the BNC connectors have to be extended to reach the board, on the isolated boxes the BNC pins reach much deeper into the enclosure.

If the jumpers were removed, plus the smaller buffer it should be easy to fit a pod into the Pomona "Miniature" boxes too.



I was also due to create some new personal businesscards, so I arranged the circuit down to a single layer (the only jumper is the requirement to connect both ground pins on the connectors) and merged it with some text converted to KiCad footprints to make a nice card on some 0.6mm PCBs. The paper on that photo is covering the link to the build instructions, which weren't written at the time (they're *mostly* done now, I may update this post with the link later).

Finally, while I was out travelling at the start of April my new (to me) HP 4395A arrived so I've finally got some spectrum output. The output is very similar between the original and my version, with the major notable difference being that my version is 10dB worse at the third harmonic. I lack the equipment (and understanding) to properly measure phase noise, but if anyone in AU/NZ wants to volunteer their time & equipment for an afternoon I'd love an excuse for a field trip.



Spectrum with input sourced from my house rubidium (natively a 5MHz unit) via my 8140 line. Note that despite saying "ExtRef" the analyzer is synced to its internal 10811 (which is an optional unit, and uses an external jumper, hence the display note.



Spectrum with input sourced from the analyzer's own 10811, and power from the DC bias generator also from the analyzer.


1: Or at least I didn't think they had, I've since found out that there was a multi output pod, and one is currently in the post heading to me.
2: An option on the standard Spectracom pods, albeit a rare one.

,

Jonathan AdamczewskiNavigation Mesh and Sunset Overdrive

Navigation mesh encodes where in the game world an agent can stand, and where it can go. (here “agent” means bot, actor, enemy, NPC, etc)

At runtime, the main thing navigation mesh is used for is to find paths between points using an algorithm like A*: https://en.wikipedia.org/wiki/A*_search_algorithm

In Insomniac’s engine, navigation mesh is made of triangles. Triangle edge midpoints define a connected graph for pathfinding purposes.

In addition to triangles, we have off-mesh links (“Custom Nav Clues” in Insomniac parlance) that describe movement that isn’t across the ground. These are used to represent any kind of off-mesh connection – could be jumping over a car or railing, climbing up to a rooftop, climbing down a ladder, etc. Exactly what it means for a particular type of bot is handled by clue markup and game code.

These links are placed by artists and designers in the game environment, and included in prefabs for commonly used bot-traversable objects in the world, like railings and cars.

Navigation mesh makes a certain operations much, much simpler than it would be if done by trying to reason about render or physics geometry.

Our game work is made up of a lot of small objects, which are each typically made from many triangles.

Using render or physics geometry to answer the question “can this bot stand here” hundreds of times every frame is not scalable. (Sunset Overdrive had 33ms frames. That’s not a lot of time.)

It’s much faster to ask: is there navigation mesh where this bot is

Navigation mesh is relatively sparse and simple, so the question can be answered quickly. We pre-compute bounding volumes for navmesh, to make answering that question even faster, and if a bot was standing on navmesh last frame, it’s even less work to reason about where they are this frame.

In addition to path-finding, navmesh can be useful to quickly and safely limit movement in a single direction. We sweep lines across navmesh to find boundaries to clamp bot movement. For example, a bot animating through a somersault will have its movement through the world clamped to the edge of navmesh, rather than rolling off into who-knows-what.

(If you’re making a game where you want bots to be able to freely somersault in any direction, you can ignore the navmesh ðŸ˜�)

Building navmesh requires a complete view of the static world. The generated mesh is only correct when it accounts for all objects: interactions between objects affect the generated mesh in ways that are not easy (or fast) to reason about independently.

Intersecting objects can become obstructions to movement. Or they can form new surfaces that an agent can stand upon. You can’t really tell what it means to an agent until you mash it all together.

To do as little work as possible at runtime, we required *all* of the static objects to be loaded at one time to pre-build mesh for Sunset City.

We keep that pre-built navmesh loading during the game at all times. For the final version of the game (with both of the areas added via DLC) this required ~55MB memory.

We use Recast https://github.com/recastnavigation/recastnavigation to generate the triangle mesh, and (mostly for historical reasons) repack this into our own custom format.

Sunset Overdrive had two meshes: one for “normal” humanoid-sized bots (2m tall, 0.5m radius)

and one for “large” bots (4.5m tall, 1.35m radius)

Both meshes are generated as 16x16m tiles, and use a cell size of 0.125m when rasterizing collision geometry.

There were a few tools used in Sunset Overdrive to add some sense of dynamism to the static environment:

For pathfinding and bot-steering, we have runtime systems to control bot movement around dynamic obstacles.

For custom nav clues, we keep track of whether they are in use, to make it less likely that multiple bots are jumping over the same thing at the same time. This can help fan-out groups of bots, forcing them to take distinctly different paths.

Since Sunset Overdrive, we’ve added a dynamic obstruction system based on Detour https://github.com/recastnavigation/recastnavigation to temporarily cut holes in navmesh for larger impermanent obstacles like stopped cars or temporary structures.

We also have a way to mark-up areas of navmesh so that they can be toggled in a controlled fashion from script. It’s less flexible than the dyanamic obstruction system – but it is very fast: toggling flags for tris rather than retriangulation.

I spoke about Sunset Overdrive at the AI Summit a few years back – my slide deck is here:
Sunset City Express: Improving the NavMesh Pipeline in Sunset Overdrive

I can also highly recommend @AdamNoonchester‘s talk from GDC 2015:
AI in the Awesomepocalypse – Creating the Enemies of Sunset Overdrive

Here’s some navigation mesh, using the default in-engine debug draw (click for larger version)

What are we looking at? This is a top-down orthographic view of a location in the middle of Sunset City.

The different colors indicate different islands of navigation mesh – groups of triangles that are reachable from other islands via custom nav clues.
Bright sections are where sections of navmesh overlap in the X-Z plane.

There are multiple visualization modes for navmesh.

Usually, this is displayed over some in-game geometry – it exists to debug/understand the data in game and editor. Depending on what the world looks like, some colors are easier to read than others. (click for larger versions)




The second image shows the individual triangles – adjacent triangles do not reliably have different colors. And there is stable color selection as the camera moves, almost ðŸ˜�

Also, if you squint, you can make out the 16x16m tile boundaries, so you can get a sense of scale.

Here’s a map of the entirety of Sunset City:

“The Mystery of the Mooil Rig” DLC area:

“Dawn of the Rise of the Fallen Machine” DLC area:

Referencing the comments from up-thread, these maps represent the places where agents can be. Additionally, there is connectivity information – we have visualization for that as well.

This image has a few extra in-engine annotations, and some that I added:

The purple lines represent custom nav clues – one line in each direction that is connected.

Also marked are some railings with clues placed at regular intervals, a car with clues crisscrossing it, and moored boats with clues that allow enemies to chase the player.

Also in this image are very faint lines on the mesh that show connectivity between triangles. When a bot is failing to navigate, it can be useful to visualize the connectivity that the mesh thinks it has :)

The radio tower where the fight with Fizzie takes place:

The roller coaster:

The roller coaster tracks are one single, continuous and complete island of navmesh.

Navigation mesh doesn’t line up neatly with collision geometry, or render geometry. To make it easier to see, we draw it offset +0.5m up in world-space, so that it’s likely to be above the geometry it has been generated for. (A while ago, I wrote a full-screen post effect that drew onto rendered geometry based on proximity to navmesh. I thought it was pretty cool, and it was nicely unambiguous & imho easier to read – but I never finished it, it bitrot, and I never got back to it alas.)

Since shipping Sunset Overdrive, we added support for keeping smaller pieces of navmesh in memory – they’re now loaded in 128x128m parts, along with the rest of the open world.

@despair‘s recent technical postmortem has a little more on how this works:
‘Marvel’s Spider-Man’: A Technical Postmortem

Even so, we still load it all of an open world region to build the navmesh: the asset pipeline doesn’t provide information that is needed to generate navmesh for sub-regions efficiently & correctly, so it’s all-or-nothing. (I have ideas on how to improve this. One day…)

Let me know if you have any questions – preferably via twitter @twoscomplement

 

This post was originally a twitter thread:

,

Tim SerongHerringback

It occurs to me that I never wrote up the end result of the support ticket I opened with iiNet after discovering significant evening packet loss on our fixed wireless NBN connection in August 2017.

The whole saga took about a month. I was asked to run a battery of tests (ping, traceroute, file download and speedtest, from a laptop plugged directly into the NTD) three times a day for three days, then send all the results in so that a fault could be lodged. I did this, but somehow there was a delay in the results being communicated, so that by the time someone actually looked at them, they were considered stale, and I had to run the whole set of tests all over again. It’s a good thing I work from home, because otherwise there’s no way it would be possible to spend half an hour three times a day running tests like this. Having finally demonstrated significant evening slowdowns, a fault was lodged, and eventually NBN Co admitted that there was congestion in the evenings.

We have investigated and the cell which this user is connected to experiences high utilisation during busy periods. This means that the speed of this service is likely to be reduced, particularly in the evening when more people are using the internet.

nbn constantly monitors the fixed wireless network for sites which require capacity expansion and we aim to upgrade site capacity before congestion occurs, however sometimes demand exceeds expectations, resulting in a site becoming congested.

This site is scheduled for capacity expansion in Quarter 4, 2017 which should result in improved performance for users on the site. While we endeavour to upgrade sites on their scheduled date, it is possible for the date to change.

I wasn’t especially happy with that reply after a support experience that lasted for a month, but some time in October that year, the evening packet loss became less, and the window of time where we experienced congestion shrank. So I guess they did do some sort of capacity expansion.

It’s been mostly the same since then, i.e. slower in the evenings than during the day, but, well, it could be worse than it is. There was one glitch in November or December 2018 (poor speed / connection issues again, but this time during the day) which resulted in iiNet sending out a new router, but I don’t have a record of this, because it was a couple of hours of phone support that for some reason never appeared in the list of tickets in the iiNet toolbox, and even if it had, once a ticket is closed, it’s impossible to click it to view the details of what actually happened. It’s just a subject line, status and last modified date.

Fast forward to Monday March 25 2019 – a day with a severe weather warning for damaging winds – and I woke up to 34% packet loss, ping times all over the place (32-494ms), continual disconnections from IRC and a complete inability to use a VPN connection I need for work. I did the power-cycle-everything dance to no avail. I contemplated a phone call to support, then tethered my laptop to my phone instead in order to get a decent connection, and decided to wait it out, confident that the issue had already been reported by someone else after chatting to my neighbour.

hideous-packet-loss-march-2019

Tuesday morning it was still horribly broken, so I unplugged the router from the NTD, plugged a laptop straight in, and started running ping, traceroute and speed tests. Having done that I called support and went through the whole story (massive packet loss, unusable connection). They asked me to run speed tests again, almost all of which failed immediately with a latency error. The one that did complete showed about 8Mbps down, compared to the usual ~20Mbps during the day. So iiNet lodged a fault, and said there was an appointment available on Thursday for someone to come out. I said fine, thank you, and plugged the router back in to the NTD.

Curiously, very shortly after this, everything suddenly went back to normal. If I was a deeply suspicious person, I’d imagine that because I’d just given the MAC address of my router to support, this enabled someone to reset something that was broken at the other end, and fix my connection. But nobody ever told me that anything like this happened; instead I received a phone call the next day to say that the “speed issue” I had reported was just regular congestion and that the tower was scheduled for an upgrade later in the year. I thanked them for the call, then pointed out that the symptoms of this particular issue were completely different to regular congestion and that I was sure that something had actually been broken, but I was left with the impression that this particular feedback would be summarily ignored.

I’m still convinced something was broken, and got fixed. I’d be utterly unsurprised if there had been some problem with the tower on the Sunday night, given the strong winds, and it took ’til mid-Tuesday to get it sorted. But we’ll never know, because NBN Co don’t publish information about congestion, scheduled upgrades, faults and outages anywhere the general public can see it. I’m not even sure they make this information consistently available to retail ISPs. My neighbour, who’s with a different ISP, sent me a notice that says there’ll be maintenance/upgrades occurring on April 18, then again from April 23-25. There’s nothing about this on iiNet’s status page when I enter my address.

There was one time in the past few years though, when there was an outage that impacted me, and it was listed on iiNet’s status page. It said “customers in the area of Herringback may be affected”. I initially didn’t realise that meant me, as I’d never heard for a suburb, region, or area called Herringback. Turns out it’s the name of the mountain our NBN tower is on.

,

Robert CollinsContinuous Delivery and software distributors

Back in 2010 the continuous delivery meme was just grabbing traction. Today its extremely well established… except in F/LOSS projects.

I want that to change, so I’m going to try and really bring together a technical view on how that could work – which may require multiple blog posts – and if it gets traction I’ll put my fingers where my thoughts are and get into specifics with any project that wants to do this.

This is however merely a worked model today: it may be possible to do things quite differently, and I welcome all discussion about the topic!

tl;dr

Pick a service discovery mechanism (e.g. environment variables), write two small APIs – one for flag delivery, with streaming updates, and one for telemetry, with an optional aggressive data hiding proxy, then use those to feed enough data to drive a true CI/CD cycle back to upstream open source projects.

Who is in?

Background

(This assumes you know what C/D is – if you don’t, go read the link above, maybe wikipedia etc, then come back.)

Consider a typical SaaS C/D pipeline:

git -> build -> test -> deploy

Here all stages are owned by the one organisation. Once deployed, the build is usable by users – its basically the simplest pipeline around.

Now consider a typical on-premise C/D pipeline:

git -> build -> test -> expose -> install

Here the last stage, the install stage, takes place in the users context, but it may be under the control of the create, or it may be under the control of the user. For instance, Google play updates on an Android phone: when one selects ‘Update Now’, the install phase is triggered. Leaving the phone running with power and Wi-Fi will trigger it automatically, and security updates can be pushed anytime. Continuing the use of Google Play as an example, the expose step here is an API call to upload precompiled packages, so while there are three parties, the distributor – Google – isn’t performing any software development activities (they do gatekeep, but not develop).

Where it gets awkward is when there are multiple parties doing development in the pipeline.

Distributing and C/D

Lets consider an OpenStack cloud underlay circa 2015: an operating system, OpenStack itself, some configuration management tool (or tools), a log egress tool, a metrics egress handler, hardware mgmt vendor binaries. And lets say we’re working on something reasonably standalone. Say horizon.

OpenStack for most users is something obtained from a vendor. E.g. Cisco or Canonical or RedHat. And the model here is that the vendor is responsible for what the user receives; so security fixes – in particular embargoed security fixes – cannot be published publically and the slowly propogate. They must reach users very quickly. Often, ideally, before the public publication.

Now we have something like this:

upstream ends with distribution, then vendor does an on-prem pipeline


Can we not just say ‘the end of the C/D pipeline is a .tar.gz of horizon at the distribute step? Then every organisation can make their own decisions?

Maybe…

Why C/D?

  • Lower risk upgrades (smaller changes that can be reasoned about better; incremental enablement of new implementations to limit blast radius, decoupling shipping and enablement of new features)
  • Faster delivery of new features (less time dealing with failed upgrades == more time available to work on new features; finished features spend less time in inventory before benefiting users).
  • Better code hygiene (the same disciplines needed to make C/D safe also make more aggressive refactoring and tidiness changes safer to do, so it gets done more often).

1. If the upstream C/D pipeline stops at a tar.gz file, the lower-risk upgrade benefit is reduced or lost: the pipeline isn’t able to actually push all the to installation, and thus we cannot tell when a particular upgrade workaround is no longer needed.

But Robert, that is the vendors problem!

I wish it was: in OpenStack so many vendors had the same problem they created shared branches to work on it, then asked for shared time from the project to perform C/I on those branches. The benefit is only realise when the developer who is responsible for creating the issue can fix it, and can be sure that the fix has been delivered; this means either knowing that every install will install transiently every intermediary version, or that they will keep every workaround for every issue for some minimum time period; or that there will be a pipeline that can actually deliver the software.

2. .tar.gz files are not installed and running systems. A key characteristic of a C/D pipeline is that is exercises the installation and execution of software; the ability to run a component up is quite tightly coupled to the component itself, for all the the ‘this is a process’ interface is very general, the specific ‘this is server X’ or ‘this is CLI utility Y’ interfaces are very concrete. Perhaps a container based approach, where a much narrower interface in many ways can be defined, could be used to mitigate this aspect. Then even if different vendors use different config tools to do last mile config, the dev cycle knows that configuration and execution works. We need to make sure that we don’t separate the teams and their products though: the pipeline upstream must only test code that is relevant to upstream – and downstream likewise. We may be able to find a balance here, but I think more work articulating what that looks like it needed.

3. it will break the feedback cycle if the running metrics are not receive upstream; yes we need to be careful of privacy aspects, but basic telemetry: the upgrade worked, the upgrade failed, here is a crash dump – these are the tools for sifting through failure at scale, and a number of open source projects like firefox, Ubuntu and chromium have adopted them, with great success. Notably all three have direct delivery models: their preference is to own the relationship with the user and gather such telemetry directly.

C/D and technical debt

Sidebar: ignoring public APIs and external dependencies, because they form the contract that installations and end users interact with, which we can reasonably expect to be quite sticky, the rest of a system should be entirely up to the maintainers right? Refactor the DB; Switch frameworks, switch languages. Cleanup classes and so on. With microservices there is a grey area: APIs that other microservices use which are not publically supported.

The grey area is crucial, because it is where development drag comes in: anything internal to the system can be refactored in a single commit, or in a series of small commits that is rolled up into one, or variations on this theme.

But some aspect that another discrete component depends upon, with its own delivery cycle: that cannot be fixed, and unless it was built with the same care public APIs were, it may well have poor scaling or performance characteristics that making fixing it very important.

Given two C/D’d components A and B, where A wants to remove some private API B uses, A cannot delete that API from its git repo until all B’s everywhere that receive A via C/D have been deployed with a version that does not use the private API.

That is, old versions of B place technical debt on A across the interfaces of A that they use. And this actually applies to public interfaces too – even if they are more sticky, we can expect the components of an ecosystem to update to newer APIs that are cheaper to serve, and laggards hold performance back, keep stale code alive in the codebase for longer and so on.

This places a secondary requirement on the telemetry: we need to be able to tell whether the fleet is upgraded or not.

So what does a working model look like?

I think we need a different diagram than the pipeline; the pipeline talks about the things most folk doing an API or some such project will have directly in hand, but its not actually the full story. The full story is rounded out with two additional features. Feature flags and telemetry. And since we want to protect our users, and distributors probably will simply refuse to provide insights onto actual users, lets assume a near-zero-trust model around both.

Feature flags

As I discussed in my previous blog post, feature flags can be used for fairly arbitrary purposes, but in this situation, where trust is limited, I think we need to identify the crucial C/D enabling use cases, and design for them.

I think that those can be reduce to soft launches – decoupling activating new code paths from getting them shipped out onto machines, and kill switches – killing off flawed / faulty code paths when they start failing in advance of a massive cascade failure; which we can implement with essentially the same thing: some identifier for a code path and then a percentage of the deployed base to enable it on. If we define this API with efficient streaming updates and a consistent service discovery mechanism for the flag API, then this could be replicated by vendors and other distributors or even each user, and pull the feature API data downstream in near real time.

Telemetry

The difficulty with telemetry APIs is that they can egress anything. OTOH this is open source code, so malicious telemetry would be visible. But we can structure it to make it harder to violate privacy.

What does the C/D cycle need from telemetry, and what privacy do we need to preserve?

This very much needs discussion with stakeholders, but at a first approximation: the C/D cycle depends on knowing what versions are out there and whether they are working. It depends on known what feature flags have actually activated in the running versions. It doesn’t depend on absolute numbers of either feature flags or versions

Using Google Play again as an example, there is prior art – https://support.google.com/firebase/answer/6317485 – but I want to think truely minimally, because the goal I have is to enable C/D in situations with vastly different trust levels than Google play has. However, perhaps this isn’t enough, perhaps we do need generic events and the ability to get deeper telemetry to enable confidence.

That said, let us sketch what an API document for that might look like:

project:
version:
health:
flags:
- name:
  value:

If that was reported by every deployed instance of a project, once per hour, maybe with a dependencies version list added to deal with variation in builds, it would trivially reveal the cardinality of reporters. Many reporters won’t care (for instance QA testbeds). Many will.

If we aggregate through a cardinality hiding proxy, then that vector is addressed – something like this:

- project:
  version:
  weight:
  health:
  flags:
  - name:
    value:
- project: ...

Because this data is really only best effort, such a proxy could be backed by memcache or even just an in-memory store, depending on what degree of ‘cloud-nativeness’ we want to offer. It would receive accurate data, then deduplicate to get relative weights, round those to (say) 5% as a minimum to avoid disclosing too much about long tail situations (and yes, the sum of 100 1% reports would exceed 100 :)), and then push that up.

Open Questions

  • Should library projects report, or are they only used in the context of an application/service?
    • How can we help library projects answer questions like ‘has every user stopped using feature Y so that we can finally remove it’ ?
  • Would this be enough to get rid of the fixation on using stable branches everyone seems to have?
    • If not why not?
  • What have I forgotten?

,

Robert CollinsFeature flags

Feature toggles, feature flags – they’ve been written about a lot already (use a search engine :)), yet I feel like writing a post about them. Why? I’ve been personally involved in two from-scratch implementations, and it may be interesting for folk to read about that.

I say that lots has been written; http://featureflags.io/ (which appears to be a bit of an astroturf site for LaunchDarkly 😉 ) nevertheless has gathered a bunch of links to literature as well as a number of SDKs and the like; there are *other* FFaaS offerings than LaunchDarkly; I have no idea which I would use for my next project at this point – but hopefully you’ll have some tools to reason about that at the end of this piece.

I’m going to entirely skip over the motivation (go read those other pieces), other than to say that the evidence is in, trunk based development is better.



Humble, J. and Kim, G., 2018. Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations. IT Revolution.

A feature flag is a very simple thing: it is a value controlled outside of your development cycle that in turn controls the behaviour of your code. There are dozens of ways to implement that. hash-defines and compile time flags have been used for a very long time, so long that we don’t even think of them as feature flags, but they are are. So are configuration options in configuration files in the broadest possible sense. The difference is largely in focus, and where the same system meets all parties needs, I think its entirely fine to use just the one system – that is what we did for Launchpad, and it worked quite well I think – as far as I know it hasn’t been changed. Specifically in Launchpad the Zope runtime config is regular ZCML files on disk, and feature flags are complementary to that (but see the profiling example below).

Configuration tends to be thought of as “choosing behaviour after the system is compiled and before the process is started” – e.g. creating files on disk. But this is not always the case – some enterprise systems are notoriously flexible with database managed configuration rulesets which no-one can figure out – and we don’t want to create that situation.

Lets generalise things a little – a flag could be configured over the lifetime of the binary (compile flag), execution (runtime flag/config file or one-time evaluation of some dynamic system), time(dynamically reconfigured from changed config files or some dynamic system), or configured based on user/team/organisation, URL path (of a web request naturally :P), and generally any other thing that could be utilised in making a decision about whether to conditionally perform some code. It can also be useful to be able to randomly bucket some fraction of checks (e.g. 1/3 of all requests will go down this code path).. but do it consistently for the same browser.

Depending on what sort of system you are building, some of those sorts of scopes will be more or less important to you – for instance, if you are shipping on-premise software, you may well want to be turning unreleased features entirely off in the binary. If you are shipping a web API, doing soft launches with population rollouts and feature kill switches may be your priority.

Similarly, if you have an existing microservices architecture, having a feature flags aaS API is probably much more important (so that your different microservices can collaborate on in-progress features!) than if you have a monolithic DB where you put all your data today.

Ultimately you will end up with something that looks roughly like a key-value store: get_flag_value(flagname, context) -> value. Somewhere separate to your code base you will have a configuration store where you put rules that define how that key-value interface comes to a given value.

There are a few key properties that I consider beneficial in a feature flag systems:

  • Graceful degradation
  • Permissionless / (or alternatively namespaced)
  • Loosely typed
  • Observable
  • Centralised
  • Dynamic

Graceful Degradation

Feature flags will be consulted from all over the place – browser code, templates, DB mapper, data exporters, test harnesses etc. If the flag system itself is degraded, you need the systems behaviour to remain graceful, rather than stopping catastrophically. This often requires multiple different considerations; for instance, having sensible defaults for your flags (choose a default that is ok, change the meaning of defaults as what is ‘ok’ changes over time), having caching layers to deal with internet flakiness or API blips back to your flag store, making sure you have memory limits on local caches to prevent leaks and so forth. Different sorts of flag implementations have different failure modes : an API based flag system will be quite different to one stored in the same DB the rest of your code is using, which will be different to a process-startup command line option flag system.

A second dimension where things can go wrong is dealing with missing or unexpected flags. Remember that your system changes over time: a new flag in the code base won’t exist in the database until after the rollout, and when a flag is deleted from the codebase, it may still be in your database. Worse, if you have multiple instances running of a service, you may have different code all examining the same flag at the same time, so operations like ‘we are changing the meaning of a flag’ won’t take place atomically.

Permissions

Flags have dual audiences; one part is pure dev: make it possible to keep integration costs and risks low by merging fully integrated code on a continual basis without activating not-yet-ready (or released!) codepaths. The second part is pure operations: use flags to control access to dark launches, demo new features, killswitch parts of the site during attack mitigation, target debug features to staff and so forth.

Your developers need some way to add and remove the flags needed in their inner loop of development. Lifetimes of a few days for some flags.

Whoever is doing operations on prod though, may need some stronger guarantees – particularly they may need some controls over who can enable what flags. e.g. if you have a high control environment then team A shouldn’t be able to influence team B’s flags. One way is to namespace the flags and only permit configuration for the namespace a developer’s team(s) owns. Another way is to only have trusted individuals be able to set flags – but this obviously adds friction to processes.

Loose Typing

Some systems model the type of each flag: is it boolean, numeric, string etc. I think this is a poor idea mainly because it tends to interact poorly with the ephemeral nature of each deployment of a code base. If build X defines flag Y as boolean, and build X+1 defines it as string, the configuration store has to interact with both at the same time during rollouts, and do so gracefully. One way is to treat everything as a string and cast it to the desired type just in time, with failures being treated as default.

Observable

Make sure that when a user reports crazy weird behaviour, that you can figure out what value they had for what flags. For instance, in Launchpad we put them in the HTML.

Centralised

Having all your flags in one system lets you write generic tooling – such as ‘what flags are enabled in QA but not production’, or ‘what flags are set but have not been queried in the last month’. It is well worth the effort to build a single centralised system (or consume one such thing) and then use it everywhere. Writing adapters to different runtimes is relatively low overhead compared to rummaging through N different config systems because you can’t remember which one is running which platform.

Scope things in the system with a top level tenant / project style construct (any FFaaS will have this I’m sure :)).

Dynamic

There may be some parts of the system that cannot apply some flags rapidly, but generally speaking the less poking around that needs to be done to make something take effect the better. So build or buy a dynamic system, and if you want a ‘only on process restart’ model for some bits of it, just consult the dynamic system at the relevant time (e.g. during k8s Deployment object creation, or process startup, or …). But then everywhere else, you can react just-in-time; and even make the system itself be script driven.

The Launchpad feature flag system

I was the architect for Launchpad when the flag system was added. Martin Pool wanted to help accelerate feature development on Launchpad, and we’d all become aware of the feature flag style things hip groups like YouTube were doing; so he wrote a LEP: https://dev.launchpad.net/LEP/FeatureFlags , pushed that through our process and then turned it into code and docs (and once the first bits landed folk started using and contributing to it). Here’s a patch I wrote using the system to allow me to turn on Python profiling remotely. Here’s one added by William Grant to allow working around a crash in a packaging tool.

Launchpad has a monolithic data store, with bulk data federated out to various disk stores, but all relational data in one schema; we didn’t see much benefit in pushing for a dedicated API per se at that time – it can always be added later, as the design was deliberately minimal. The flags implementation is all in-process as a result, though there may be a JS thunk at this point – I haven’t gone looking. Permissions are done through trusted staff members, it is loosely typed and has an audit log for tracking changes.

The other one

The other one I was involved in was at VMware a couple years ago now; its in-house, but some interesting anecdotes I can share. The thinking on feature flags when I started the discussion was that they were strictly configuration file settings – I was still finding my feet with the in-house Xenon framework at the time (I think this was week 3 ? 🙂 so I whipped up an API specification and a colleague (Tyler Curtis) turned that into a draft engine; it wasn’t the most beautiful thing but it was still going strong and being enhanced by the team when I left earlier this year. The initial implementation had a REST API and a very basic set of scopes. That lasted about 18 months before tenant based scopes were needed and added. I had designed it with the intent of adding multi-arm bandit selection down the track, but we didn’t make the time to develop that capability, which is a bit sad.

Comparing that API with LaunchDarkly I see that they do support A/B trials but don’t have multivariate tests live yet, which suggests that they are still very limited in that space. I think there is room for some very simple home grown work in this area to pay off nicely for Symphony (the project codename the flags system was written for).

Should you run your own?

It would be very unusual to have PII or customer data in the flag configuration store; and you shouldn’t have access control lists in there either (in LP we did allow turning on code by group, which is somewhat similar). Point is, that the very worst thing that can happen if someone else controls your feature flags and is malicious is actually not very bad. So as far as aaS vendor trust goes, not a lot of trust is needed to be pretty comfortable using one.

But, if you’re in a particularly high-trust environment, or you have no internet access, running your own may be super important, and then yeah, do it :). They aren’t big complex systems, even with multi-arm bandit logic added in (the difficulty there is the logic, not the processing).

Or if you think the prices being charged by the incumbents are ridiculous. Actually, perhaps hit me up and we’ll make a startup and do this right…

Should you build your own?

A trivial flag system + persistence could be as little as a few days work. Less if you grab an existing bolt-on for your framework. If you have multiple services, or teams, or languages.. expect that to become the gift that keeps on giving as you have to consolidate and converge across your organisation indefinitely. If you have the resources – great, not a problem.


I think most people will be better off taking one of the existing open source flag systems – perhaps https://unleash.github.io/ – and using it; even if it is more complex than a system tightly fitted to your needs, the benefit of having one that is a true API from the start will pay for itself the very first time you split a project, or want to report what features are on in dev and off in prod, not to mention multiple existing language bindings etc.

,

Glen TurnerJupyter notebook and R

This has become substantially simpler in Fedora 29:

sudo dnf install notebook R-IRKernel R-IRdisplay


comment count unavailable comments

,

Glen TurnerRipping language-learning CDs

It might be tempting to use MP3's variable bit rate for encoding ripped foreign languages CDs. With the large periods of silence that would seem to make a lot of sense. But you lose the ability to rewind to an exact millisecond, which turns out to be essential as you want to hear a particular phrase a handful of times. So use CBR -- constant bit rate -- encoding, and at a high bit rate like 160kbps.



comment count unavailable comments

,

Glen Turnerwpa_supplicant update trades off interoperation for security

In case you want to choose a different security compromise, the update has a nice summary:

wpasupplicant (2:2.6-19) unstable; urgency=medium

  With this release, wpasupplicant no longer respects the system
  default minimum TLS version, defaulting to TLSv1.0, not TLSv1.2. If
  you're sure you will never connect to EAP networks requiring anything less
  than 1.2, add this to your wpasupplicant configuration:

    tls_disable_tlsv1_0=1
    tls_disable_tlsv1_1=1

  wpasupplicant also defaults to a security level 1, instead of the system
  default 2. Should you need to change that, change this setting in your
  wpasupplicant configuration:

    openssl_ciphers=DEFAULT@SECLEVEL=2

  Unlike wpasupplicant, hostapd still respects system defaults.

 -- Andrej Shadura <…@debian.org>  Sat, 15 Dec 2018 14:22:18 +0100


comment count unavailable comments

,

Glen TurnerFinding git credentials in libsecret

To find passwords in libsecret you need to know what attributes to search for. These are often set by some shim but not documented. The attributes tend to vary by shim.

For git's libsecret shim the attributes are: protocol, server, user.

A worked example, the account gdt on git.example.org:

$ secret-tool search --all 'protocol' 'https' 'server' 'git.example.org' 'user' 'gdt'
[/org/freedesktop/secrets/collection/login/123]
label = Git: https://git.example.org/
secret = CvKxlezMsSDuR7piMBTzREJ7l8WL1T
created = 2019-02-01 10:20:34
modified = 2019-02-01 10:20:34
schema = org.gnome.keyring.NetworkPassword
attribute.protocol = https
attribute.server = git.example.org
attribute.user = gdt

Note that the "label" is mere documentation, it's the "attribute" entries which matter.



comment count unavailable comments

,

Tim SerongDistributed Storage is Easier Now: Usability from Ceph Luminous to Nautilus

On January 21, 2019 I presented Distributed Storage is Easier Now: Usability from Ceph Luminous to Nautilus at the linux.conf.au 2019 Systems Administration Miniconf. Thanks to the incredible Next Day Video crew, the video was online the next day, and you can watch it here:

If you’d rather read than watch, the meat of the talk follows, but before we get to that I have two important announcements:

  1. Cephalocon 2019 is coming up on May 19-20, in Barcelona, Spain. The CFP is open until Friday February 1, so time is rapidly running out for submissions. Get onto it.
  2. If you’re able to make it to FOSDEM on February 2-3, there’s a whole Software Defined Storage Developer Room thing going on, with loads of excellent content including What’s new in Ceph Nautilus – project status update and preview of the coming release and Managing and Monitoring Ceph with the Ceph Manager Dashboard, which will cover rather more than I was able to here.

Back to the talk. At linux.conf.au 2018, Sage Weil presented “Making distributed storage easy: usability in Ceph Luminous and beyond”. What follows is somewhat of a sequel to that talk, covering the changes we’ve made in the meantime, and what’s still coming down the track. If you’re not familiar with Ceph, you should probably check out A Gentle Introduction to Ceph before proceeding. In brief though, Ceph provides object, block and file storage in a single, horizontally scalable cluster, with no single points of failure. It’s Free and Open Source software, it runs on commodity hardware, and it tries to be self-managing wherever possible, so it notices when disks fail, and replicates data elsewhere. It does background scrubbing, and it tries to balance data evenly across the cluster. But you do still need to actually administer it.

This leads to one of the first points Sage made this time last year: Ceph is Hard. Status display and logs were traditionally difficult to parse visually, there were (and still are) lots of configuration options, tricky authentication setup, and it was difficult to figure out the number of placement groups to use (which is really an internal detail of how Ceph shards data across the cluster, and ideally nobody should need to worry about it). Also, you had to do everything with a CLI, unless you had a third-party GUI.

I’d like to be able to flip this point to the past tense, because a bunch of those things were already fixed in the Luminous release in August 2017; status display and logs were cleaned up, a balancer module was added to help ensure data is spread more evenly, crush device classes were added to differentiate between HDDs and SSDs, a new in-tree web dashboard was added (although it was read-only, so just cluster status display, no admin tasks), plus a bunch of other stuff.

But we can’t go all the way to saying “Ceph was hard”, because that might imply that everything is now easy. So until we reach that frabjous day, I’m just going to say that Ceph is easier now, and it will continue to get easier in future.

At linux.conf.au in January 2018, we were half way through the Mimic development cycle, and at the time the major usability enhancements planned included:

  • Centralised configuration management
  • Slick deployment in Kubernetes with Rook
  • A vastly improved dashboard based on ceph-mgr and openATTIC
  • Placement Group merging

We got some of that stuff done for Mimic, which was released in June 2018, and more of it is coming in the Nautilus release, which is due out very soon.

In terms of usability improvements, Mimic gave us a new dashboard, inspired by and derived from openATTIC. This dashboard includes all the features of the Luminous dashboard, plus username/password authentication, SSL/TLS support, RBD and RGW management, and a configuration settings browser. Mimic also brought the ability to store and manage configuration options centrally on the MONs, which means we no longer need to set options in /etc/ceph/ceph.conf, replicate that across the cluster, and restart whatever daemons were affected. Instead, you can run `ceph config set ...` to make configuration changes. For initial cluster bootstrap, you can even use DNS SRV records rather than specifying MON hosts in the ceph.conf file.

As I mentioned, the Nautilus release is due out really soon, and will include a bunch more good stuff:

  • PG autoscaling:
  • More dashboard enhancements, including:
    • Multiple users/roles, also single sign on via SAML
    • Internationalisation and localisation
    • iSCSI and NFS Ganesha management
    • Embedded Grafana dashboards
    • The ability to mark OSDs up/down/in/out, and trigger scrubs/deep scrubs
    • Storage pool management
    • A configuration settings editor which actually tells you what the configuration settings mean, and do
    • Embedded Grafana dashboards
    • To see what this all looks like, check out Ceph Manager Dashboard Screenshots as of 2019-01-17
  • Blinky lights, that being the ability to turn on or off the ident and fault LEDs for the disk(s) backing a given OSD, so you can find the damn things in your DC.
  • Orchestrator module(s)

Blinky lights, and some of the dashboard functionality (notably configuring iSCSI gateways and NFS Ganesha) means that Ceph needs to be able to talk to whatever tool it was that deployed the cluster, which leads to the final big thing I want to talk about for the Nautilus release, which is the Orchestrator modules.

There’s a bunch of ways to deploy Ceph, and your deployment tool will always know more about your environment, and have more power to do things than Ceph itself will, but if you’re managing Ceph, through the inbuilt dashboard and CLI tools, there’s things you want to be able to do as a Ceph admin, that Ceph itself can’t do. Ceph can’t deploy a new MDS, or RGW, or NFS Ganesha host. Ceph can’t deploy new OSDs by itself. Ceph can’t blink the lights on a disk on some host if Ceph itself has somehow failed, but the host is still up. For these things, you rely on your deployment tool, whatever it is. So Nautilus will include Orchestrator modules for Ansible, DeepSea/Salt, and Rook/Kubernetes, which allow the Ceph management tools to call out to your deployment tool as necessary to have it perform those tasks. This is the bit I’m working on at the moment.

Beyond Nautilus, Octopus is the next release, due in a bit more than nine months, and on the usability front I know we can expect more dashboard and more orchestrator functionality, but before that, we have the Software Defined Storage Developer Room at FOSDEM on February 2-3 and Cephalocon 2019 on May 19-20. Hopefully some of you reading this will be able to attend 🙂

Update 2019-02-04: Check out Sage’s What’s new in Ceph Nautilus FOSDEM talk for much more detail on what’s coming up in Nautilus and beyond.

,

Julien GoodwinTransport security for BGP, AKA BGP-STARTTLS, a proposal

Several days ago, inspired in part by an old work mail thread being resurrected I sent this image as a tweet captioned "The state of BGP transport security.":



The context of the image for those not familiar with it is this image about noSQL databases.

This triggered a bunch of discussion, with multiple people saying "so how would *you* do it", and we'll get to that (or for the TL;DR skip to the bottom), but first some background.

The tweet is a reference to the BGP protocol the Internet uses to exchange routing data between (and within) networks. This protocol (unless inside a separate container) is never encrypted, and can only be authenticated (in practice) by a TCP option known as TCP-MD5 (standardised in RFC2385). The BGP protocol itself has no native encryption or authentication support. Since routing data can often be inferred by the packets going across a link anyway, this has lead to this not being a priority to fix.

Transport authentication & encryption is a distinct issue from validation of the routing data transported by BGP, an area already being worked on by the various RPKI projects, eventually transport authentication may be able to benefit from some of the work done by those groups.

TCP-MD5 is quite limited, and while generally supported by all major BGP implementations it has one major limitation that makes it particularly annoying, in that it takes a single key, making key migration difficult (and in many otherwise sensible topologies, impossible without impact). Being a TCP option is also a pain, increasing fragility.

At the time of its introduction TCP-MD5 gave two main benefits the first was to have some basic authentication beyond the basic protocol (for which the closest element in the protocol is the validation of peer-as in the OPEN message, and a mismatch will helpfully tell you who the far side was looking for), plus making it harder to interfere with the TCP session, which on many of the TCP implementations of the day was easier than it should have been. Time, however has marched on, and protection against session interference from non-MITM is no longer needed, the major silent MITM case of Internet Exchanges using hubs is long obsolete, plus, in part due to the pain associated in changing keys many networks have a "default" key they will use when turning up a peering session, these keys are often so well known for major networks that they've often been shared on public mailing lists, eliminating what little security benefit TCP-MD5 still brings.

This has been known to be a problem for many years, and the proposed replacement TCP-AO (The TCP Authentication Option) was standardised in 2010 as RFC5925, however, to the best of my knowledge eight years later no mainstream BGP implementation supports it, and as it too is a TCP option, not only does it still has many of the downsides of MD5, but major OS kernels are much less likely to implement new ones (indeed, an OS TCP maintainer commenting such on the thread I mentioned above is what kicked off my thinking).

TCP, the wire format, is in practice unchangeable. This is one of the major reasons for QUIC, the TCP replacement protocol soon to be standardised as HTTP/3, so for universal deployment any proposal that requires changes to TCP is a non-starter.

Any solution must be implementable & deployable.
  • Implementable - BGP implementations must be able to implement it, and do so correctly, ideally with a minimum of effort.
  • Deployable - Networks need to be able to deploy it, when authentication issues occur error messages should be no worse than with standard BGP (this is an area most TCP-MD5 implementations fail at, of those I've used JunOS is a notable exception, Linux required kernel changes for it to even be *possible* to debug)


Ideally any security-critical code should already exist in a standardised form, with multiple widely-used implementations.

Fortunately for us, that exists in the form of TLS. IPSEC, while it exists, fails the deployable tests, as almost anyone who's ever had the misfortune of trying to get IPSEC working between different organisations using different implementations can attest, sure it can usually be made to work, but nowhere near as easily as TLS.

Discussions about the use of TLS for this purpose have happened before, but always quickly run into the problem of how certificates for this should be managed, and that is still an open problem, potentially the work on RPKI may eventually provide a solution here, but until that time we have a workable alternative in the form of TLS-PSK (first standardised in RFC4279), a set of TLS modes that allow the use of pre-shared keys instead of certificates (for those wondering, not only does this still exist in TLS1.3 it's in a better form). For a variety of reasons, not the least the lack of real-time clocks in many routers that may not be able to reach an NTP server until BGP is established, PSK modes are still more deployable than certificate verification today. One key benefit for TLS-PSK is it supports multiple keys to allow migration to a new key in a significantly less impactful manner.

The most obvious way to support BGP-in-TLS would simply be to listen on a new port (as is done for HTTPS for example), however there's a few reasons why I don't think such a method is deployable for BGP, primarily due to the need to update control-plane ACLs, a task that in large networks is often distant from the folk working with BGP, and in small networks may not be understood by any current staff (a situation not dissimilar to the state of TCP). Another option would simply be to use protocol multiplexing and do a TLS negotiation if a TLS hello is received, or unencrypted BGP for a BGP OPEN, this would violate the general "principal of least astonishment", and would be harder for operators to debug.

Instead I propose a design similar to that used by SMTP (where it is known as STARTTLS), during early protocol negotiation support for TLS is signalled using a zero-length capability in the BGP OPEN, the endpoints do a TLS negotiation, and then the base protocol continues inside the new TLS tunnel. Since this negotiation happens during the BGP OPEN, it does mean that other data included in the OPEN leaks. Primarily this is the ASN, but also the various other capabilities supported by the implementation (which could identify the implementation), I suggest that if TLS is required information in the initial OPEN not be validated, and standard reserved ASN be sent instead, and any other capabilities not strictly required not sent, with a fresh OPEN containing all normal information sent inside the TLS session.

Migration from TCP-MD5 is key point, however not one I can find any good answer for. Some implementations already allow TCP-MD5 to be optional, and that would allow an easy upgrade, however such support is rare, and unlikely to be more widely supported.

On that topic, allowing TLS to be optional in a consistent manner is particularly helpful, and something that I believe SHOULD be supported to allow cases like currently unauthenticated public peering sessions to be migrated to TLS with minimal impact. Allowing this does open the possibility of a downgrade attack, and make more plausible attacks causing state machine confusions (implementation believes it's talking on a TLS-secured session when it isn't).

What do we lose from TCP-MD5? Some performance, whilst this is not likely to be an issue for most situations, it is likely not an option for anyone still trying to run IPv4 full tables on a Cisco Catalyst 6500 with Sup720. We do also lose the TCP manipulation prevention aspects, however these days those are (IMO) of minimal value in practice. There's also the costs of implementations needing to include a TLS implementations, and whilst nearly every system will have one (at the very least for SSH) it may not already be linked to the routing protocol implementation.

Lastly, my apologies to anyone who has proposed this before, but my neither I nor my early reviewers were aware of such a proposal. Should such a proposal already exist, meeting the goals of implementable & deployable it may be sensible to pick that up instead.

The IETF is often said to work on "rough consensus and running code", for this proposal here's what I believe a minimal *actual* demonstration of consensus with code would be:
  • Two BGP implementations, not derived from the same source.
  • Using two TLS implementations, not derived from the same source.
  • Running on two kernels (at the very least, Linux & FreeBSD)


The TL;DR version:
  • Using a zero-length BGP capability in the BGP OPEN message implementations advertise their support for TLS
    • TLS version MUST be at least 1.3
    • If TLS is required, the AS field in the OPEN MAY be set to a well-known value to prevent information leakage, and other capabilities MAY be removed, however implementations MUST NOT require the TLS capability be the first, last or only capability in the OPEN
    • If TLS is optional, which MUST NOT be default behaviour), the OPEN MUST be (other than the capability) be the same as a session configured for no encryption
  • After the TCP client receives a conformation of TLS support from the TCP server's OPEN message, a TLS handshake begins
    • To make this deployable TLS-PSK MUST be supported, although exact configuration is TBD.
    • Authentication-only variants of TLS (ex RFC4785) REALLY SHOULD NOT be supported.
    • Standard certificate-based verification MAY be supported, and if supported MUST validate use client certificates, validating both. However, how roots of trust would work for this has not been investigated.
  • Once the TCP handshake completes the BGP state starts over with the client sending a new OPEN
    • Signalling the TLS capability in this OPEN is invalid and MUST be rejected
  • (From here, everything is unchanged from normal BGP)


Magic numbers for development:
  • Capability: (to be referred to as EXPERIMENTAL-STARTTLS) 219
  • ASN (for avoiding data leaks in OPEN messages): 123456
    • Yes this means also sending 4-byte capability. Every implementation that might possibly implement this already supports 4-byte ASNs.


The key words "MUST (BUT WE KNOW YOU WON'T)", "SHOULD CONSIDER", "REALLY SHOULD NOT", "OUGHT TO", "WOULD PROBABLY", "MAY WISH TO", "COULD", "POSSIBLE", and "MIGHT" in this document are to be interpreted as described in RFC 6919.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119.

,

Matt Palmerpwnedkeys: who has the keys to *your* kingdom?

pwnedkeys.com logo

I am extremely pleased to announce the public release of pwnedkeys.com – a database of compromised asymmetric encryption keys. I hope this will become the go-to resource for anyone interested in avoiding the re-use of known-insecure keys. If you have a need, or a desire, to check whether a key you’re using, or being asked to accept, is potentially in the hands of an adversary, I would encourage you to take a look.

Pwnage... EVERYWHERE

By now, most people in the IT industry are aware of the potential weaknesses of passwords, especially short or re-used passwords. Using a password which is too short (or, more technically, with “insufficient entropy”) leaves us open to brute force attacks, while re-using the same password on multiple sites invites a credential stuffing attack.

It is rare, however, that anyone thinks about the “quality” of RSA or ECC keys that we use with the same degree of caution. There are so many possible keys, all of which are “high quality” (and thus not subject to “brute force”), that we don’t imagine that anyone could ever compromise a private key except by actually taking a copy of it off our hard drives.

There is a unique risk with the use of asymmetric cryptography, though. Every time you want someone to encrypt something to you, or verify a signature you’ve created, you need to tell them your public key. While someone can’t calculate your private key from your public key, the public key does have enough information in it to be able to identify your private key, if someone ever comes across it.

So what?

smashed window

The risk here is that, in many cases, a public key truly is public. Every time your browser connects to a HTTPS-protected website, the web server sends a copy of the site’s public key (embedded in the SSL certificate). Similarly, when you connect to an SSH server, you get the server’s public key as part of the connection process. Some services provide a way for anyone to query a user’s public keys.

Once someone has your public key, it can act like an “index” into a database of private keys that they might already have. This is only a problem, of course, if someone happens to have your private key in their stash. The bad news is that there are a lot of private keys already out there, that have either been compromised by various means (accident or malice), or perhaps generated by a weak RNG.

When you’re generating keys, you usually don’t have to worry. The chances of accidentally generating a key that someone else already has is as close to zero as makes no difference. Where you need to be worried is when you’re accepting public keys from other people. Unlike a “weak” password, you can’t tell a known-compromised key just by looking at it. Even if you saw the private key, it would look just as secure as any other key. You cannot know whether a public key you’re being asked to accept is associated with a known-compromised private key. Or you couldn’t, until pwnedkeys.com came along.

The solution!

The purpose of pwnedkeys.com is to try and collect every private key that’s ever gotten “out there” into the public, and warn people off using them ever again. Don’t think that people don’t re-use these compromised keys, either. One of the “Debian weak keys” was used in an SSL certificate that was issued in 2016, some eight years after the vulnerability was made public!

My hope is that pwnedkeys.com will come to be seen as a worthwhile resource for anyone who accepts public keys, and wants to know that they’re not signing themselves up for a security breach in the future.

,

Matt PalmerFalsehoods Programmers Believe About Pagination

The world needs it, so I may as well write it.

  • The number of items on a page is fixed for all time.
  • The number of items on a page is fixed for one user.
  • The number of items on a page is fixed for one result set.
  • The pages are only browsed in one direction.
  • No item will be added to the result set during retrieval.
  • No item will be removed from the result set during retrieval.
  • Item sort order is stable.
  • Only one page of results will be retrieved at one time.
  • Pages will be retrieved in order.
  • Pages will be retrieved in a timely manner.
  • No problem will result from two different users seeing different pagination of the same items at about the same time. (From @ronburk)

,

OpenSTEMSchool-wide Understanding Our World® implementations

Are you considering implementing our integrated HASS+Science program, but getting a tad confused by the pricing?  Our subscription model didn’t not provide a So nowstraightforward calculation for a whole school or year-level.  However, it generally works out to $4.40 (inc.GST) per student.  So now we’re providing this as an option directly: implement our integrated HASS+Science program […]

The post School-wide Understanding Our World® implementations first appeared on OpenSTEM Pty Ltd.

,

BlueHackersEntrepreneurs’ Mental Health and Well-being Survey

Jamie Pride has partnered with Swinburne University and Dr Bronwyn Eager to conduct the largest mental health and well-being survey of Australian entrepreneurs and founders. This survey will take approx 5 minutes to complete. Can you also please spread the word and share this via your networks!

Getting current and relevant Australian data is extremely important! The findings of this study will contribute to the literature on mental health and well-being in entrepreneurs, and that this will potentially lead to future improvements in the prevention and treatment of psychological distress.

Jamie is extremely passionate about this cause! Your help is greatly appreciated.

The post Entrepreneurs’ Mental Health and Well-being Survey first appeared on BlueHackers.org.

,

Peter LieverdinkDark Doodad

It's been a while since I did a blog, so after twiddling the way the front page of the site displays, it's time to post a new one. The attached photo is of my favourite dark nebula, "The Dark Doodad". What looks like a long thin nebula is apparently a sheet over 40 light years wide that we happen to be seeing edge-on. On the left you can see a few dark tendrils that are par of the coal sack nebula. The Dark Doodad This is one of the first images created from a stack of subs I took using AstroDSLR. Each exposure is 2 minutes and I stacked 20 of them. My polar alignment was pretty decent, I think!

,

OpenSTEMChildren in Singapore will no longer be ranked by exam results. Here’s why | World Economic Forum

https://www.weforum.org/agenda/2018/10/singapore-has-abolished-school-exam-rankings-here-s-why The island nation is changing its educational focus to encourage school children to develop the life skills they will need when they enter the world of work.

The post Children in Singapore will no longer be ranked by exam results. Here’s why | World Economic Forum first appeared on OpenSTEM Pty Ltd.

,

OpenSTEMHelping Migrants to Australia

The end of the school year is fast approaching with the third term either over or about to end and the start of the fourth term looming ahead. There never seems to be enough time in the last term with making sure students have met all their learning outcomes for the year and with final […]

The post Helping Migrants to Australia first appeared on OpenSTEM Pty Ltd.

,

OpenSTEMOur interwoven ancestry

In 2008 a new group of human ancestors – the Denisovans, were defined on the basis of a single finger knuckle (phalanx) bone discovered in Denisova cave in the Altai mountains of Siberia. A molar tooth, found at Denisova cave earlier (in 2000) was determined to be of the same group. Since then extensive work […]

The post Our interwoven ancestry first appeared on OpenSTEM Pty Ltd.

,

Clinton RoyMoving to Melbourne

Now that the paperwork has finally all been dealt with, I can announce that I’ll be moving down to Melbourne to take up a position with the Australian Synchrotron, basically a super duper x-ray machine used for research of all types. My official position is a >in< Senior Scientific Software Engineer <out> I’ll be moving down to Melbourne shortly, staying with friends (you remember that offer you made, months ago?) until I find a rental near Monash Uni, Clayton.

I will be leaving behind Humbug, the computer group that basically opened up my entire career, and The Edge, SLQ, my home-away-from-home study. I do hope to be able to find replacements for these down south.

I’m looking at having a small farewell nearby soon.

A shout out to Netbox Blue for supplying all my packing boxes. Allll of them.

OpenSTEMThis Week in Australian History

The end of August and beginning of September is traditionally linked to the beginning of Spring in Australia, although the change in seasons is experienced in different ways in different parts of the country and was marked in locally appropriate ways by Aboriginal people. As a uniquely Australian celebration of Spring, National Wattle Day, celebrated […]

The post This Week in Australian History first appeared on OpenSTEM Pty Ltd.

,

BlueHackersVale Janet Hawtin Reid

Janet Hawtin ReidJanet Hawtin Reid (@lucychili) sadly passed away last week.

A mutual friend called me a earlier in the week to tell me, for which I’m very grateful.  We both appreciate that BlueHackers doesn’t ever want to be a news channel, so I waited writing about it here until other friends, just like me, would have also had a chance to hear via more direct and personal channels. I think that’s the way these things should flow.

knitted Moomin troll by Janet Hawtin ReidI knew Janet as a thoughtful person, with strong opinions particularly on openness and inclusion.  And as an artist and generally creative individual,  a lover of nature.  In recent years I’ve also seen her produce the most awesome knitted Moomins.

Short diversion as I have an extra connection with the Moomin stories by Tove Jansson: they have a character called My, after whom Monty Widenius’ eldest daughter is named, which in turn is how MySQL got named.  I used to work for MySQL AB, and I’ve known that My since she was a little smurf (she’s an adult now).

I’m not sure exactly when I met Janet, but it must have been around 2004 when I first visited Adelaide for Linux.conf.au.  It was then also that Open Source Industry Australia (OSIA) was founded, for which Janet designed the logo.  She may well have been present at the founding meeting in Adelaide’s CBD, too.  OSIA logo - by Janet Hawtin ReidAnyhow, Janet offered to do the logo in a conversation with David Lloyd, and things progressed from there. On the OSIA logo design, Janet wrote:

I’ve used a star as the current one does [an earlier doodle incorporated the Southern Cross]. The 7 points for 7 states [counting NT as a state]. The feet are half facing in for collaboration and half facing out for being expansive and progressive.

You may not have realised this as the feet are quite stylised, but you’ll definitely have noticed the pattern-of-7, and the logo as a whole works really well. It’s a good looking and distinctive logo that has lasted almost a decade and a half now.

Linux Australia logo - by Janet Hawtin ReidAs Linux Australia’s president Kathy Reid wrote, Janet also helped design the ‘penguin feet’ logo that you see on Linux.org.au.  Just reading the above (which I just retrieved from a 2004 email thread) there does seem to be a bit of a feet-pattern there… of course the explicit penguin feet belong with the Linux penguin.

So, Linux Australia and OSIA actually share aspects of their identity (feet with a purpose), through their respective logo designs by Janet!  Mind you, I only realised all this when looking through old stuff while writing this post, as the logos were done at different times and only a handful of people have ever read the rationale behind the OSIA logo until now.  I think it’s cool, and a fabulous visual legacy.

Fir tree in clay, by Janet Hawtin Reid
Fir tree in clay, by Janet Hawtin Reid. Done in “EcoClay”, brought back to Adelaide from OSDC 2010 (Melbourne) by Kim Hawtin, Janet’s partner.

Which brings me to a related issue that’s close to my heart, and I’ve written and spoken about this before.  We’re losing too many people in our community – where, in case you were wondering, too many is defined as >0.  Just like in a conversation on the road toll, any number greater than zero has to be regarded as unacceptable. Zero must be the target, as every individual life is important.

There are many possible analogies with trees as depicted in the above artwork, including the fact that we’re all best enabled to grow further.

Please connect with the people around you.  Remember that connecting does not necessarily mean talking per-se, as sometimes people just need to not talk, too.  Connecting, just like the phrase “I see you” from Avatar, is about being thoughtful and aware of other people.  It can just be a simple hello passing by (I say hi to “strangers” on my walks), a short email or phone call, a hug, or even just quietly being present in the same room.

We all know that you can just be in the same room as someone, without explicitly interacting, and yet feel either connected or disconnected.  That’s what I’m talking about.  Aim to be connected, in that real, non-electronic, meaning of the word.

If you or someone you know needs help or talk right now, please call 1300 659 467 (in Australia – they can call you back, and you can also use the service online).  There are many more resources and links on the BlueHackers.org website.  Take care.

The post Vale Janet Hawtin Reid first appeared on BlueHackers.org.

,

Julien GoodwinCustom output pods for the Standard Research CG635 Clock Generator

As part of my previously mentioned side project the ability to replace crystal oscillators in a circuit with a higher quality frequency reference is really handy, to let me eliminate a bunch of uncertainty from some test setups.

A simple function generator is the classic way to handle this, although if you need square wave output it quickly gets hard to find options, with arbitrary waveform generators (essentially just DACs) the common option. If you can get away with just sine wave output an RF synthesizer is the other main option.

While researching these I discovered the CG635 Clock Generator from Stanford Research, and some time later picked one of these up used.

As well as being a nice square wave generator at arbitrary voltages these also have another set of outputs on the rear of the unit on an 8p8c (RJ45) connector, in both RS422 (for lower frequencies) and LVDS (full range) formats, as well as some power rails to allow a variety of less common output formats.

All I needed was 1.8v LVCMOS output, and could get that from the front panel output, but I'd then need a coax tail on my boards, as well as potentially running into voltage rail issues so I wanted to use the pod output instead. Unfortunately none of the pods available from Stanford Research do LVCMOS output, so I'd have to make my own, which I did.

The key chip in my custom pod is the TI SN65LVDS4, a 1.8v capable single channel LVDS reciever that operates at the frequencies I need. The only downside is this chip is only available in a single form factor, a 1.5mm x 2mm 10 pin UQFN, which is far too small to hand solder with an iron. The rest of the circuit is just some LED indicators to signal status.


Here's a rendering of the board from KiCad.

Normally "not hand solderable" for me has meant getting the board assembled, however my normal assembly house doesn't offer custom PCB finishes, and I wanted these to have white solder mask with black silkscreen as a nice UX when I go to use them, so instead I decided to try my hand at skillet reflow as it's a nice option given the space I've got in my tiny apartment (the classic tutorial on this from SparkFun is a good read if you're interested). Instead of just a simple plate used for cooking you can now buy hot plates with what are essentially just soldering iron temperature controllers, sold as pre-heaters making it easier to come close to a normal soldering profile.

Sadly, actually acquiring the hot plate turned into a bit of a mess, the first one I ordered in May never turned up, and it wasn't until mid-July that one arrived from a different supplier.

Because of the aforementioned lack of space instead of using stencils I simply hand-applied (leaded) paste, without even an assist tool (which I probably will acquire for next time), then hand-mounted the components, and dropped them on the plate to reflow. I had one resistor turn 90 degrees, and a few bridges from excessive paste, but for a first attempt I was really happy.


Here's a photo of the first two just after being taken off the hot plate.

Once the reflow was complete it was time to start testing, and this was where I ran into my biggest problems.

The two big problems were with the power supply I was using, and with my oscilloscope.

The power supply (A Keithley 228 Voltage/Current source) is from the 80's (Keithley's "BROWN" era), and while it has nice specs, doesn't have the most obvious UI. Somehow I'd set it to limit at 0ma output current, and if you're not looking at the segment lights it's easy to miss. At work I have an EEZ H24005 which also resets the current limit to zero on clear, however it's much more obvious when limiting, and a power supply with that level of UX is now on my "to buy" list.

The issues with my scope were much simpler. Currently I only have an old Rigol DS1052E scope, and while it works fine it is a bit of a pain to use, but ultimately I made a very simple mistake while testing. I was feeding in a trigger signal direct from the CG635's front outputs, and couldn't figure out why the generator was putting out such a high voltage (implausibly so). To cut the story short, I'd simply forgotten that the scope was set for use with 10x probes, and once I realised that everything made rather more sense. An oscilloscope with auto-detection for 10x probes, as well as a bunch of other features I want in a scope (much bigger screen for one), has now been ordered, but won't arrive for a while yet.

Ultimately the boards work fine, but until the new scope arrives I can't determine signal quality of them, but at least they're ready for when I'll need them, which is great for flow.

,

Andrew Ruthvenlinux.conf.au 2019 - Call for Proposals

At the start of July, the LCA2019 team announced that the Call for Proposals for linux.conf.au 2019 were open! This Call for Proposals will close on July 30. If you want to submit a proposal, you don't have much time!

linux.conf.au is one of the best-known community driven Free and Open Source Software conferences in the world. In 2019 we welcome you to join us in Christchurch, New Zealand on Monday 21 January through to Friday 25 January.

For full details including those not covered by this announcement visit https://linux.conf.au/call-for-papers/, and the full announcement is here.

IMPORTANT DATES

  • Call for Proposals Opens: 2 July 2018
  • Call for Proposals Closes: 30 July 2018 (no extensions)
  • Notifications from the programme committee: early-September 2018
  • Conference Opens: 21st January 2019

,

Julien GoodwinCustom uBlox GPSDO board

For the next part of my ongoing project I needed to test the GPS reciever I'm using, a uBlox LEA-M8F (M8 series chip, LEA form factor, and with frequency outputs). Since the native 30.72MHz oscillator is useless for me I'm using an external TCVCXO (temperature compensated, voltage controlled oscillator) for now, with the DAC & reference needed to discipline the oscillator based on GPS. If uBlox would sell me the frequency version of the chip on its own that would be ideal, but they don't sell to small customers.

Here's a (rather modified) board sitting on top of an Efratom FRK rubidium standard that I'm going to mount to make a (temporary) home standard (that deserves a post of its own). To give a sense of scale the silver connector at the top of the board is a micro-USB socket.



Although a very simple board I had a mess of problems once again, both in construction and in component selection.

Unlike the PoE board from the previous post I didn't have this board manufactured. This was for two main reasons, first, the uBlox module isn't available from Digikey, so I'd still need to mount it by hand. The second, to fit all the components this board has a much greater area, and since the assembly house I use charges by board area (regardless of the number or density of components) this would have cost several hundred dollars. In the end, this might actually have been the sensible way to go.

By chance I'd picked up a new soldering iron at the same time these boards arrived, a Hakko FX-951 knock-off and gave it a try. Whilst probably an improvement over my old Hakko FX-888 it's not a great iron, especially with the knife tip it came with, and certainly nowhere near as nice to use as the JBC CD-B (I think that's the model) we have in the office lab. It is good enough that I'm probably going to buy a genuine Hakko FM-203 with an FM-2032 precision tool for the second port.

The big problem I had hand-soldering the boards was bridges on several of the components. Not just the tiny (0.65mm pitch, actually the *second largest* of eight packages for that chip) SC70 footprint of the PPS buffer, but also the much more generous 1.1mm pitch of the uBlox module. Luckily solder wick fixed most cases, plus one where I pulled the buffer and soldered a new one more carefully.

With components, once again I made several errors:
  • I ended up buying the wrong USB connectors for the footprint I chose (the same thing happened with the first run of USB-C modules I did in 2016), and while I could bodge them into use easily enough there wasn't enough mechanical retention so I ended up ripping one connector off the board. I ordered some correct ones, but because I wasn't able to wick all solder off the pads they don't attach as strongly as they should, and whilst less fragile, are hardly what I'd call solid.
  • The surface mount GPS antenna (Taoglas AP.10H.01 visible in this tweet) I used was 11dB higher gain than the antenna I'd tested with the devkit, I never managed to get it to lock while connected to the board, although once on a cable it did work ok. To allow easier testing, in the end I removed the antenna and bodged on an SMA connector for easy testing.
  • When selecting the buffer I accidentally chose one with an open-drain output, I'd meant to use one with a push-pull output. This took quite a silly long time for me to realise what mistake I'd made. Compounding this, the buffer is on the 1PPS line, which only strobes while locked to GPS, however my apartment is a concrete box, with what GPS signal I can get inside only available in my bedroom, and my oscilloscope is in my lab, so I couldn't demonstrate the issue live, and had to inject test signals. Luckily a push-pull is available in the same footprint, and a quick hot-air aided swap later (once parts arrived from Digikey) it was fixed.

Lessons learnt:
  • Yes I can solder down to ~0.5mm pitch, but not reliably.
  • More test points on dev boards, particularly all voltage rails, and notable signals not otherwise exposed.
  • Flux is magic, you probably aren't using enough.

Although I've confirmed all basic functions of the board work, including GPS locking, PPS (quick video of the PPS signal LED), and frequency output, I've still not yet tested the native serial ports and frequency stability from the oscillator. Living in an urban canyon makes such testing a pain.

Eventually I might also test moving the oscillator, DAC & reference into a mini oven to see if a custom OCXO would be any better, if small & well insulated enough the power cost of an oven shouldn't be a problem.

Also as you'll see if you look at the tweets, I really should have posted this almost a month ago, however I finished fixing the board just before heading off to California for a work trip, and whilst I meant to write this post during the trip, it's not until I've been back for more than a week that I've gotten to it. I find it extremely easy to let myself be distracted from side projects, particularly since I'm in a busy period at $ORK at the moment.

,

Jonathan AdamczewskiModern C++ Randomness

This thread happened…

So I did a little digging to satisfy my own curiosity about the “modern C++” version, and have learned a few things that I didn’t know previously…

(this is a manual unrolled twitter thread that starts here, with slight modifications)

Nearly all of this I gleaned from the invaluable and . Comments about implementation refer specifically to the gcc-8.1 C++ standard library, examined using Compiler Explorer and the -E command line option.

std::random_device is a platform-specific source of entropy.

std: mt19937 is a parameterized typedef of std::mersenne_twister_engine

specifically:
std::mersenne_twister_engine<uint_fast32_t, 32, 624, 397, 31, 0x9908b0df, 11, 0xffffffff, 7, 0x9d2c5680, 15, 0xefc60000, 18, 1812433253>
(What do those number mean? I don’t know.)

And std::uniform_int_distribution produces uniformly distributed random numbers over a specified range, from a provided generator.

The default constructor for std::random_device takes an implementation-defined argument, with a default value.

The meaning of the argument is implementation-defined – but the type is not: std::string. (I’m not sure why a dynamically modifiable string object was the right choice to be the configuration parameter for an entropy generator.)

There are out-of-line private functions for much of this implementation of std::random_device. The constructor that calls the out-of-line init function is itself inline – so the construction and destruction of the default std::string param is also generated inline.

Also, peeking inside std::random_generator, there is a union with two members:

void* _M_file, which I guess would be used to store a file handle for /dev/urandom or similar.

std::mt19937 _M_mt, which is a … parameterized std::mersenne_twister_engine object.

So it seems reasonable to me that if you can’t get entropy* from outside your program, generate your own approximation. It looks like it is possible that the entropy for the std::mersenne_twister_engine will be provided by a std::mersenne_twister_engine.

Unlike std::random_device, which has its implementation out of line, std::mersenne_twister_engine‘s implementation seems to be all inline. It is unclear what benefits this brings, but it results in a few hundred additional instructions generated.

And then there’s std::uniform_int_distribution, which seems mostly unsurprising. It is again fully inline, which (from a cursory eyeballing) may allow a sufficiently insightful compiler to avoid a couple of branches and function calls.

The code that got me started on this was presented in jest – but (std::random_device + std::mt19937 + std::uniform_int_distribution) is a commonly recommended pattern for generating random numbers using these modern C++ library features.

My takeaways:
std::random_device is potentially very expensive to use – and doesn’t provide strong cross-platform guarantees about the randomness it provides. It is configured with an std::string – the meaning of which is platform dependent. I am not compelled to use this type.

std::mt19937 adds a sizeable chunk of codegen via its inline implementation – and there are better options than Mersenne Twister.

Bottom line: I’m probably going to stick with rand(), and if I need something a little fancier,  or one of the other suggestions provided as replies to the twitter thread.

Addition: the code I was able to gather, representing some relevant parts

,

Clinton RoyActively looking for work

I am now actively looking for work, ideally something with Unix/C/Python in the research/open source/not-for-proft space. My long out of date resume has been updated.

,

Arjen LentzThe trouble with group labels. Because.

So Australia’s long-term accepted refugee detainees on Manus and Nauru will not be able to migrate to the US, if they come from a country that’s on Trump’s list.  So anyone from a particular set of countries is classified as bad.  Because Muslim.

paper boatFundamentally of course, these fellow humans should not be detained at all. They have been accepted as genuine refugees by the Australian immigration department.  Their only “crime”, which is not actually a crime by either Australian or International law, was to arrive by boat.  They are now, as a group, abused as deterrent marketing material.  Anyone coming by boat to Australia is classified as bad.  Because boat (although stats actually indicate that it might correlate better with skin colour).

My grandfather, a surgeon at a Berlin hospital, lost his job on the same day that Hitler was elected.  Since not even decrees move that quickly and the exclusion of jews from  particular professions happened gradually over the 1930s, we can only assume that the hospital management contained some overzealous individuals, who took initiatives that they thought would be looked on favourably by their new superiors. Because Jew.

The Berlin events are a neat example of how a hostile atmosphere enables certain behaviour – of course, if you’d asked the superiors, they didn’t order any such thing and it had nothing to do with them. Sound familiar?

Via jiggly paths, my family came to the UK.  Initially interned.  Because German.

My grandfather, whom as I mentioned was an accomplished surgeon, had to re-do all his medical education in Britain – during that period he was separated from his wife and two young daughters (Scotland vs the south of England – a long way in the 1930s).  Because [the qualifications and experience] not British.  Possibly also Because German.

Hassles with running a practice, and being allowed a car.  Because German.

Then allowed those things.  Because Useful.  He was still a German Jew though….

I mentioned this, because other refugees at the time would have had the same rules applied, but my grandfather had the advantage of his profession and thus being regarded as useful.  Others would not have had that benefit.  This means that people were being judged “worthy” merely based on their immediate usefulness to the local politics of the day, not for being a fellow human being in need, or any other such consideration.

Group Labels and Value Judgements

Any time we classify some group as more or less worthy than another group or set of groups, trouble will follow – both directly, and indirectly.  Every single time.  And the trouble will affect everybody, it’s not selective to some group(s).  Also every time.  Historically verifiable.

brown eyes - blue eyesPopulists simplify, for political gain.  Weak leaders pander to extreme elements, in the hope that it will keep them in power.

The method of defining groups doesn’t matter, nor does one need to add specific “instructions”. The “Blue Eye experiment” proved this. The nasties are initiated merely through a “simple” value judgement of one group vs another.  That’s all that’s required, and the bad consequences are all implied and in a way predetermined.  Trouble will follow, and it’s not going to end well.

Identity

It’s fine to identify as a member of a particular group, or more  likely multiple groups as many factors overlap.  That is part of our identity.  But that’s is not the same as passing a judgement on the relative value of one of those groups vs another group.

  • Brown vs blue eyes
  • Muslim vs Christian vs Jew
  • White vs black “race”
  • One football club vs another

human skullIt really doesn’t matter.  Many of these groupings are entirely arbitrary.  The concept of “race” has no scientific basis, we humans are verifiably a single species.

You can make up any arbitrary classification.  It won’t make a difference.  It doesn’t matter.  The outcomes will be the same.

Given what we know about these dynamics, anyone making such value judgements is culpable.  If they’re in a leadership position, I’d suggest that any utterances in that realm indicate either incompetence or criminal intent. Don’t do that.

Don’t accept it.  Don’t ignore it.  Don’t pander to it.  Don’t vote for it.

Speak up and out against it.  For everybody’s sake.  Because I assure you, every single example in history shows that it comes back to bite everyone.  So even if you don’t really feel a connection with people you don’t know, it’ll come and bite you and yours, too.

It’s rearing its ugly head, again, and if we ignore it it will bite us. Badly. Just like any previous time in history.  Guaranteed.

The post The trouble with group labels. Because. first appeared on Lentz family blog.

,

Julien GoodwinPoE termination board

For my next big project I'm planning on making it run using power over ethernet. Back in March I designed a quick circuit using the TI TPS2376-H PoE termination chip, and an LMR16020 switching regulator to drop the ~48v coming in down to 5v. There's also a second stage low-noise linear regulator (ST LDL1117S33R) to further drop it down to 3.3v, but as it turns out the main chip I'm using does its own 5->3.3v conversion already.

Because I was lazy, and the pricing was reasonable I got these boards manufactured by pcb.ng who I'd used for the USB-C termination boards I did a while back.

Here's the board running a Raspberry Pi 3B+, as it turns out I got lucky and my board is set up for the same input as the 3B+ supplies.



One really big warning, this is a non-isolated supply, which, in general, is a bad idea for PoE. For my specific use case there'll be no exposed connectors or metal, so this should be safe, but if you want to use PoE in general I'd suggest using some of the isolated convertors that are available with integrated PoE termination.

For this series I'm going to try and also make some notes on the mistakes I've made with these boards to help others, for this board:
  • I failed to add any test pins, given this was the first try I really should have, being able to inject power just before the switching convertor was helpful while debugging, but I had to solder wires to the input cap to do that.
  • Similarly, I should have had a 5v output pin, for now I've just been shorting the two diodes I had near the output which were intended to let me switch input power between two feeds.
  • The last, and the only actual problem with the circuit was that when selecting which exact parts to use I optimised by choosing the same diode for both input protection & switching, however this was a mistake, as the switcher needed a Schottky diode, and one with better ratings in other ways than the input diode. With the incorrect diode the board actually worked fine under low loads, but would quickly go into thermal shutdown if asked to supply more than about 1W. With the diode swapped to a correctly rated one it now supplies 10W just fine.
  • While debugging the previous I also noticed that the thermal pads on both main chips weren't well connected through. It seems the combination of via-in-thermal-pad (even tented), along with Kicad's normal reduction in paste in those large pads, plus my manufacturer's use of a fairly thin application of paste all contributed to this. Next time I'll probably avoid via-in-pad.


Coming soon will be a post about the GPS board, but I'm still testing bits of that board out, plus waiting for some missing parts (somehow not only did I fail to order 10k resistors, I didn't already have some in stock).

,

BlueHackersPost-work: the radical idea of a world without jobs | The Guardian

,

Tim SerongYou won’t find us on Facebook

I made these back in August 2016 (complete with lovingly hand-drawn thumb and middle finger icons), but it seems appropriate to share them again now. The images are CC-BY-SA, so go nuts, or you can grab them in sticker form from Redbubble.

facebook-thumb-blue

facebook-thumb-black

facebook-finger-blue

facebook-finger-black

,

Arjen LentzMaximising available dynamic memory on Arduino

I like programming in small spaces, it makes one think efficiently and not be sloppy.

Example: I was looking at an embedded device the other day, and found a complete Tomcat server and X desktop running on a Raspberry Pi type environment.  So that kind of stuff makes me shudder, it just looks wrong and to me it shows that whoever put that together (it was a company product, not just a hobby project) is not really “thinking” embedded.  Yes we have more powerful embedded CPUs available now, and more memory, but that all costs power.  So really: code efficiency = power efficiency!

Back to Arduino, a standard Arduino Uno type board (Atmel ATMEGA 328P) has 32K of program storage space, and 2K of dynamic memory.  The latter is used for runtime variables and the stack, so you want to be sure you always have enough spare there, particularly when using some libraries that need a bit of working space.

A bit of background… while in “big” CPUs the program and variable memory space is generally shared, it’s often separated in microcontrollers.  This makes sense considering the architecture: the program code is flashed and doesn’t change, while the variable space needs to be written at runtime.

The OneWire library, used to interface with for instance Maxim one-wire sensors, applies a little lookup table to do its CRC (checksum) calculation.  In this case it’s optimising for time, as using a table lookup CRC is much faster than just calculating for each byte.  But, it does optimise the table, by using the PROGMEM modifier when declaring the static array.  It can be optimised further still, see my post on reducing a CRC lookup table from 256 entries to 32 entries (I submitted a patch and pull request).  What PROGMEM does is tell the compiler to put the array in program storage rather than the dynamic variable space.

Arduino uses the standard GNU C++ compiler to cross-compile for the Atmel chips (that form the core of most Arduino boards), and this compiler “normally” just puts any variable (even if “const” – constant) with the rest of the variables.  This way of organising is perfectly sensible except with these microcontrollers.  So this is why we need to give the compiler a hint that certain variables can be placed in program memory instead!

Now consider this piece of code:

Serial.println("Hello, world!");

Where does the “Hello, world!” string get stored by default?  In the variable space!

I hadn’t really thought about that, until I ran short of dynamic memory in the controller of my hot water system.  The issue actually manifested itself by causing incorrect temperature readings, and sometimes crashing.  When looking at it closer, it became clear that the poor thing was running out of memory and overwriting other stuff or otherwise just getting stuck.  The controller used is an Freetronics EtherTen, which is basically an Arduino Uno with an Ethernet breakout integrated on the same board.  Using Ethernet with a Uno gets really tight, but it can be very beneficial: the controller is powered through 802.3f PoE, and communicates using UDP packets.

I knew that I wasn’t actually using that many actual variables, but was aware that the Ethernet library does require a fair bit of space (unfortunately the “how much” is not documented, so I was just running on a “as much as possible”).  Yet after compiling I was using in the range of 1K of the dynamic variable space, which just looked like too much.  So that’s when I started hunting, thought it over in terms of how I know compilers think, and then found the little note near the bottom of the PROGMEM page, explaining that you can use the F() macro on static strings to also place them in program storage. Eureka!

Serial.println(F("Hello, world!"));

When you compile an Arduino sketch, right at the end you get to see how much memory is used. Below is the output on a tiny sketch that only prints the hello world as normal:

Sketch uses 1374 bytes (4%) of program storage space. Maximum is 32256 bytes.
Global variables use 202 bytes (9%) of dynamic memory, leaving 1846 bytes for local variables. Maximum is 2048 bytes.

If you use the F() modifier for this example, you get:

Sketch uses 1398 bytes (4%) of program storage space. Maximum is 32256 bytes.
Global variables use 188 bytes (9%) of dynamic memory, leaving 1860 bytes for local variables. Maximum is 2048 bytes.

The string is 13 bytes long plus its ‘\0’ terminator, and indeed we see that the 14 bytes have shifted from dynamic memory to program storage. Great score, because we know that that string can’t be modified anyway so we have no reason to have it in dynamic variable space.  It’s really a constant.

Armed with that new wisdom I went through my code and changed the serial monitoring strings to use the F() modifier.  That way I “recovered” well over 200 bytes of dynamic variable space, which should provide the Ethernet library with plenty of room!

Now, you may wonder, “why doesn’t he just use an #ifdef to remove the serial debugging code?”.  I could do that, but then I really have no way to easily debug the system on this hardware platform, as it wouldn’t work with the debugging code enabled (unless the Ethernet library is not active).  So that wouldn’t be a winner.  This approach works, and now with the extra knowledge of F() for string constants I have enough space to have the controller do everything it needs to.

Finally, while I did an OSDC conference talk on this solar hot water controller (video on this the PV system in my home) some years ago already, I realise I hadn’t actually published the code.  Done now, arjenlentz/HomeResourceMonitor on GitHub.  It contains some hardcoded stuff (mostly #define macros) for our family situation, but the main story is that the automatic control of the boost saves us a lot of money.  And of course, the ability to see what goes on (feedback) is an important factor for gaining understanding, which in turn leads to adjusting behaviour.

The post Maximising available dynamic memory on Arduino first appeared on Lentz family blog.

,

Craig Sandersbrawndo-installer

Tired of being oppressed by the slack-arse distro package maintainers who waste time testing that new versions don’t break anything and then waste even more time integrating software into the system?

Well, so am I. So I’ve fixed it, and it was easy to do. Here’s the ultimate installation tool for any program:

brawndo() {
   curl $1 | sudo /usr/bin/env bash -
}

I’ve never written a shell script before in my entire life, I spend all my time writing javascript or ruby or python – but shell’s not a real language so it can’t be that hard to get right, can it? Of course not, and I just proved it with the amazing brawndo installer (It’s got what users crave – it’s got electrolyes!)

So next time some lame sysadmin recommends that you install the packaged version of something, just ask them if apt-get or yum or whatever loser packaging tool they’re suggesting has electrolytes. That’ll shut ’em up.

brawndo-installer is a post from: Errata

,

Tim SerongStrange Bedfellows

The Tasmanian state election is coming up in a week’s time, and I’ve managed to do a reasonable job of ignoring the whole horrible thing, modulo the promoted tweets, the signs on the highway, the junk the major (and semi-major) political parties pay to dump in my letterbox, and occasional discussions with friends and neighbours.

Promoted tweets can be blocked. The signs on the highway can (possibly) be re-purposed for a subsequent election, or can be pulled down and used for minor windbreak/shelter works for animal enclosures. Discussions with friends and neighbours are always interesting, even if one doesn’t necessarily agree. I think the most irritating thing is the letterbox junk; at best it’ll eventually be recycled, at worst it becomes landfill or firestarters (and some of those things do make very satisfying firestarters).

Anyway, as I live somewhere in the wilds division of Franklin, I thought I’d better check to see who’s up for election here. There’s no independents running this time, so I’ve essentially got the choice of four parties; Shooters, Fishers and Farmers Tasmania, Tasmanian Greens, Tasmanian Labor and Tasmanian Liberals (the order here is the same as on the TEC web site; please don’t infer any preference based on the order in which I list parties in this blog post).

I feel like I should be setting party affiliations aside and voting for individuals, but of the sixteen candidates listed, to the best of my knowledge I’ve only actually met and spoken with two of them. Another I noticed at random in a cafe, and I was ignored by a fourth who was milling around with some cronies at a promotional stand out the front of Woolworths in Huonville a few weeks ago. So, party affiliations it is, which leads to an interesting thought experiment.

When you read those four party names above, what things came most immediately to mind? For me, it was something like this:

  • Shooters, Fishers & Farmers: Don’t take our guns. Fuck those bastard Greenies.
  • Tasmanian Greens: Protect the natural environment. Renewable energy. Try not to kill anything. Might collaborate with Labor. Liberals are big money and bad news.
  • Tasmanian Labor: Mellifluous babble concerning health, education, housing, jobs, pokies and something about workers rights. Might collaborate with the Greens. Vehemently opposed to the Liberals.
  • Tasmanian Liberals: Mellifluous babble concerning jobs, health, infrastructure, safety and the Tasmanian way of life, peppered with something about small business and family values. Vehemently opposed to Labor and the Greens.

And because everyone usually automatically thinks in terms of binaries (e.g. good vs. evil, wrong vs. right, one vs. zero), we tend to end up imagining something like this:

  • Shooters, Fishers & Farmers vs. Greens
  • Labor vs. Liberal
  • …um. Maybe Labor and the Greens might work together…
  • …but really, it’s going to be Labor or Liberal in power (possibly with some sort of crossbench or coalition support from minor parties, despite claims from both that it’ll be majority government all the way).

It turns out that thinking in binaries is remarkably unhelpful, unless you’re programming a computer (it’s zeroes and ones all the way down), or are lost in the wilderness (is this plant food or poison? is this animal predator or prey?) The rest of the time, things tend to be rather more colourful (or grey, depending on your perspective), which leads back to my thought experiment: what do these “naturally opposed” parties have in common?

According to their respective web sites, the Shooters, Fishers & Farmers and the Greens have many interests in common, including agriculture, biosecurity, environmental protection, tourism, sustainable land management, health, education, telecommunications and addressing homelessness. There are differences in the policy details of course (some really are diametrically opposed), but in broad strokes these two groups seem to care strongly about – and even agree on – many of the same things.

Similarly, Labor and Liberal are both keen to tell a story about putting the people of Tasmania first, about health, education, housing, jobs and infrastructure. Honestly, for me, they just kind of blend into one another; sure there’s differences in various policy details, but really if someone renamed them Labal and Liberor I wouldn’t notice. These two are the status quo, and despite fighting it out with each other repeatedly, are, essentially, resting on their laurels.

Here’s what I’d like to see: a minority Tasmanian state government formed from a coalition of the Tasmanian Greens plus the Shooters, Fishers & Farmers party, with the Labor and Liberal parties together in opposition. It’ll still be stuck in that irritating Westminster binary mode, but at least the damn thing will have been mixed up sufficiently that people might actually talk to each other rather than just fighting.

,

Jonathan AdamczewskiWatch as the OS rewrites my buggy program.

I didn’t know that SetErrorMode(SEM_NOALIGNMENTFAULTEXCEPT) was a thing, until I wrote a bad test that wouldn’t crash.

Digging into it, I found that a movaps instruction was being rewritten as movups, which was a thoroughly confusing thing to see.

The one clue I had was that a fault due to an unaligned load had been observed in non-test code, but did not reproduce when written as a test using the google-test framework. A short hunt later (including a failed attempt at writing a small repro case), I found an explanation: google test suppresses this class of failure.

The code below will successfully demonstrate the behavior, printing out the SIMD load instruction before and after calling the function with an unaligned pointer.

[Gist]

View the code on Gist.

,

,

Jonathan AdamczewskiPriorities for my team

(unthreaded from here)

During the day, I’m a Lead of a group of programmers. We’re responsible for a range of tools and tech used by others at the company for making games.

I have a list of the my priorities (and some related questions) of things that I think are important for us to be able to do well as individuals, and as a team:

  1. Treat people with respect. Value their time, place high value on their well-being, and start with the assumption that they have good intentions
    (“People” includes yourself: respect yourself, value your own time and well-being, and have confidence in your good intentions.)
  2. When solving a problem, know the user and understand their needs.
    • Do you understand the problem(s) that need to be solved? (it’s easy to make assumptions)
    • Have you spoken to the user and listened to their perspective? (it’s easy to solve the wrong problem)
    • Have you explored the specific constraints of the problem by asking questions like:
      • Is this part needed? (it’s easy to over-reach)
      • Is there a satisfactory simpler alternative? (actively pursue simplicity)
      • What else will be needed? (it’s easy to overlook details)
    • Have your discussed your proposed solution with users, and do they understand what you intend to do? (verify, and pursue buy-in)
    • Do you continue to meet regularly with users? Do they know you? Do they believe that you’re working for their benefit? (don’t under-estimate the value of trust)
  3. Have a clear understanding of what you are doing.
    • Do you understand the system you’re working in? (it’s easy to make assumptions)
    • Have you read the documentation and/or code? (set yourself up to succeed with whatever is available)
    • For code:
      • Have you tried to modify the code? (pull a thread; see what breaks)
      • Can you explain how the code works to another programmer in a convincing way? (test your confidence)
      • Can you explain how the code works to a non-programmer?
  4. When trying to solve a problem, debug aggressively and efficiently.
    • Does the bug need to be fixed? (see 1)
    • Do you understand how the system works? (see 2)
    • Is there a faster way to debug the problem? Can you change code or data to cause the problem to occur more quickly and reliably? (iterate as quickly as you can, fix the bug, and move on)
    • Do you trust your own judgement? (debug boldly, have confidence in what you have observed, make hypotheses and test them)
  5. Pursue excellence in your work.
    • How are you working to be better understood? (good communication takes time and effort)
    • How are you working to better understand others? (don’t assume that others will pursue you with insights)
    • Are you responding to feedback with enthusiasm to improve your work? (pursue professionalism)
    • Are you writing high quality, easy to understand, easy to maintain code? How do you know? (continue to develop your technical skills)
    • How are you working to become an expert and industry leader with the technologies and techniques you use every day? (pursue excellence in your field)
    • Are you eager to improve (and fix) systems you have worked on previously? (take responsibility for your work)

The list was created for discussion with the group, and as an effort to articulate my own expectations in a way that will help my team understand me.

Composing this has been useful exercise for me as a lead, and definitely worthwhile for the group. If you’ve never tried writing down your own priorities, values, and/or assumptions, I encourage you to try it :)

,

Jonathan AdamczewskiA little bit of floating point in a memory allocator — Part 2: The floating point

[Previously]

This post contains the same material as this thread of tweets, with a few minor edits.

In IEEE754, floating point numbers are represented like this:

±2ⁿⁿⁿ×1.sss…

nnn is the exponent, which is floor(log2(size)) — which happens to be the fl value computed by TLSF.

sss… is the significand fraction: the part that follows the decimal point, which happens to be sl.

And so to calculate fl and sl, all we need to do is convert size to a floating point value (on recent x86 hardware, that’s a single instruction). Then we can extract the exponent, and the upper bits of the fractional part, and we’re all done :D

That can be implemented like this:

double sf = (int64_t)size;
uint64_t sfi;
memcpy(&sfi, &sf, 8);
fl = (sfi >> 52) - (1023 + 7);
sl = (sfi >> 47) & 31;

There’s some subtleties (there always is). I’ll break it down…

double sf = (int64_t)size;

Convert size to a double, with an explicit cast. size has type size_t, but using TLSF from github.com/mattconte/tlsf, the largest supported allocation on 64bit architecture is 2^32 bytes – comfortably less than the precision provided by the double type. If you need your TLSF allocator to allocate chunks bigger than 2^53, this isn’t the technique for you :)

I first tried using float (not double), which can provide correct results — but only if the rounding mode happens to be set correctly. double is easier.

The cast to (int64_t) results in better codegen on x86: without it, the compiler will generate a full 64bit unsigned conversion, and there is no single instruction for that.

The cast tells the compiler to (in effect) consider the bits of size as if they were a two’s complement signed value — and there is an SSE instruction to handle that case (cvtsi2sdq or similar). Again, with the implementation we’re using size can’t be that big, so this will do the Right Thing.

uint64_t sfi;
memcpy(&sfi, &sf, 8);

Copy the 8 bytes of the double into an unsigned integer variable. There are a lot of ways that C/C++ programmers copy bits from floating point to integer – some of them are well defined :) memcpy() does what we want, and any moderately respectable compiler knows how to select decent instructions to implement it.

Now we have floating point bits in an integer register, consisting of one sign bit (always zero for this, because size is always positive), eleven exponent bits (offset by 1023), and 52 bits of significant fraction. All we need to do is extract those, and we’re done :)

fl = (sfi >> 52) - (1023 + 7);

Extract the exponent: shift it down (ignoring the always-zero sign bit), subtract the offset (1023), and that 7 we saw earlier, at the same time.

sl = (sfi >> 47) & 31;

Extract the five most significant bits of the fraction – we do need to mask out the exponent.

And, just like that*, we have mapping_insert(), implemented in terms of integer -> floating point conversion.

* Actual code (rather than fragments) may be included in a later post…

Jonathan AdamczewskiA little bit of floating point in a memory allocator — Part 1: Background

This post contains the same material as this thread of tweets, with a few minor edits.

Over my holiday break at the end of 2017, I took a look into the TLSF (Two Level Segregated Fit) memory allocator to better understand how it works. I’ve made use of this allocator and have been impressed by its real world performance, but never really done a deep dive to properly understand it.

The mapping_insert() function is a key part of the allocator implementation, and caught my eye. Here’s how that function is described in the paper A constant-time dynamic storage allocator for real-time systems:

I’ll be honest: from that description, I never developed a clear picture in my mind of what that function does.

(Reading it now, it seems reasonably clear – but I can say that only after I spent quite a bit of time using other methods to develop my understanding)

Something that helped me a lot was by looking at the implementation of that function from github.com/mattconte/tlsf/.  There’s a bunch of long-named macro constants in there, and a few extra implementation details. If you collapse those it looks something like this:

void mapping_insert(size_t size, int* fli, int* sli)
{ 
  int fl, sl;
  if (size < 256)
  {
    fl = 0;
    sl = (int)size / 8;
  }
  else
  {
    fl = fls(size);
    sl = (int)(size >> (fl - 5)) ^ 0x20;
    fl -= 7;
  }
  *fli = fl;
  *sli = sl;
}

It’s a pretty simple function (it really is). But I still failed to *see* the pattern of results that would be produced in my mind’s eye.

I went so far as to make a giant spreadsheet of all the intermediate values for a range of inputs, to paint myself a picture of the effect of each step :) That helped immensely.

Breaking it down…

There are two cases handled in the function: one for when size is below a certain threshold, and on for when it is larger. The first is straightforward, and accounts for a small number of possible input values. The large size case is more interesting.

The function computes two values: fl and sl, the first and second level indices for a lookup table. For the large case, fl (where fl is “first level”) is computed via fls(size) (where fls is short for “find last set” – similar names, just to keep you on your toes).

fls() returns the index of the largest bit set, counting from the least significant slbit, which is the index of the largest power of two. In the words of the paper:

“the instruction fls can be used to compute the ⌊log2(x)⌋ function”

Which is, in C-like syntax: floor(log2(x))

And there’s that “fl -= 7” at the end. That will show up again later.

For the large case, the computation of sl has a few steps:

  sl = (size >> (fl – 5)) ^ 0x20;

Depending on shift down size by some amount (based on fl), and mask out the sixth bit?

(Aside: The CellBE programmer in me is flinching at that variable shift)

It took me a while (longer than I would have liked…) to realize that this
size >> (fl – 5) is shifting size to generate a number that has exactly six significant bits, at the least significant end of the register (bits 5 thru 0).

Because fl is the index of the most significant bit, after this shift, bit 5 will always be 1 – and that “^ 0x20” will unset it, leaving the result as a value between 0 and 31 (inclusive).

So here’s where floating point comes into it, and the cute thing I saw: another way to compute fl and sl is to convert size into an IEEE754 floating point number, and extract the exponent, and most significant bits of the mantissa. I’ll cover that in the next part, here.

,

Tim SerongHackweek0x10: Fun in the Sun

We recently had a 5.94KW solar PV system installed – twenty-two 270W panels (14 on the northish side of the house, 8 on the eastish side), with an ABB PVI-6000TL-OUTD inverter. Naturally I want to be able to monitor the system, but this model inverter doesn’t have an inbuilt web server (which, given the state of IoT devices, I’m actually kind of happy about); rather, it has an RS-485 serial interface. ABB sell addon data logger cards for several hundred dollars, but Rick from Affordable Solar Tasmania mentioned he had another client who was doing monitoring with a little Linux box and an RS-485 to USB adapter. As I had a Raspberry Pi 3 handy, I decided to do the same.

Step one: Obtain an RS-485 to USB adapter. I got one of these from Jaycar. Yeah, I know I could have got one off eBay for a tenth the price, but Jaycar was only a fifteen minute drive away, so I could start immediately (I later discovered various RS-485 shields and adapters exist specifically for the Raspberry Pi – in retrospect one of these may have been more elegant, but by then I already had the USB adapter working).

Step two: Make sure the adapter works. It can do RS-485 and RS-422, so it’s got five screw terminals: T/R-, T/R+, RXD-, RXD+ and GND. The RXD lines can be ignored (they’re for RS-422). The other three connect to matching terminals on the inverter, although what the adapter labels GND, the inverter labels RTN. I plugged the adapter into my laptop, compiled Curt Blank’s aurora program, then asked the inverter to tell me something about itself:

aurora -a 2 -Y 4 -e /dev/ttyUSB0Interestingly, the comms seem slightly glitchy. Just running aurora -a 2 -e /dev/ttyUSB0 always results in either “No response after 1 attempts” or “CRC receive error (1 attempts made)”. Adding “-Y 4” makes it retry four times, which is generally rather more successful. Ten retries is even more reliable, although still not perfect. Clearly there’s some tweaking/debugging to do here somewhere, but at least I’d confirmed that this was going to work.

So, on to the Raspberry Pi. I grabbed the openSUSE Leap 42.3 JeOS image and dd’d that onto a 16GB SD card. Booted the Pi, waited a couple of minutes with a blank screen while it did its firstboot filesystem expansion thing, logged in, fiddled with network and hostname configuration, rebooted, and then got stuck at GRUB saying “error: attempt to read or write outside of partition”:

error: attempt to read or write outside of partition.

Apparently that’s happened to at least one other person previously with a Tumbleweed JeOS image. I fixed it by manually editing the partition table.

Next I needed an RPM of the aurora CLI, so I built one on OBS, installed it on the Pi, plugged the Pi into the USB adapter, and politely asked the inverter to tell me a bit more about itself:

aurora -a @ -Y 4 -d 0 /dev/ttyUSB0

Everything looked good, except that the booster temperature was reported as being 4294967296°C, which seemed a little high. Given that translates to 0x100000000, and that the south wall of my house wasn’t on fire, I rather suspected another comms glitch. Running aurora -a 2 -Y 4 -d 0 /dev/ttyUSB0 a few more times showed that this was an intermittent problem, so it was time to make a case for the Pi that I could mount under the house on the other side of the wall from the inverter.

I picked up a wall mount snap fit black plastic box, some 15mm x 3mm screws, matching nuts, and 9mm spacers. The Pi I would mount inside the box part, rather than on the back, meaning I can just snap the box-and-Pi off the mount if I need to bring it back inside to fiddle with it.

Then I had to measure up and cut holes in the box for the ethernet and USB ports. The walls of the box are 2.5mm thick, plus 9mm for the spacers meant the bottom of the Pi had to be 11.5mm from the bottom of the box. I measured up then used a Dremel tool to make the holes then cleaned them up with a file. The hole for the power connector I did by eye later after the board was in about the right place.

20171115_164538 20171115_165407 20171115_165924 20171115_172026 20171115_173200 20171115_174705 20171115_174822 20171115_175002

I didn’t measure for the screw holes at all, I simply drilled through the holes in the board while it was balanced in there, hanging from the edge with the ports. I initially put the screws in from the bottom of the box, dropped the spacers on top, slid the Pi in place, then discovered a problem: if the nuts were on top of the board, they’d rub up against a couple of components:

20171115_180310

So I had to put the screws through the board, stick them there with Blu Tack, turn the Pi upside down, drop the spacers on top, and slide it upwards into the box, getting the screws as close as possible to the screw holes, flip the box the right way up, remove the Blu Tack and jiggle the screws into place before securing the nuts. More fiddly than I’d have liked, but it worked fine.

One other kink with this design is that it’s probably impossible to remove the SD card from the Pi without removing the Pi from the box, unless your fingers are incredibly thin and dexterous. I could have made another hole to provide access, but decided against it as I’m quite happy with the sleek look, this thing is going to be living under my house indefinitely, and I have no plans to replace the SD card any time soon.

20171115_18265520171115_192923

All that remained was to mount it under the house. Here’s the finished install:

20171116_115413

After that, I set up a cron job to scrape data from the inverter every five minutes and dump it to a log file. So far I’ve discovered that there’s enough sunlight by about 05:30 to wake the inverter up. This morning we’d generated 1KW by 08:35, 2KW by 09:10, 8KW by midday, and as I’m writing this at 18:25, a total of 27.134KW so far today.

Next steps:

  1. Figure out WTF is up with the comms glitches
  2. Graph everything and/or feed the raw data to pvoutput.org

,

Clinton RoyAccess and Memory: Open GLAM and Open Source

Over the years of my involvement with library projects, like Coder Dojo, programming workshops and such, I’ve struggled to nail down the intersection between libraries and open source. At this years linux.conf.au in Sydney (my seventeenth!) I’m helping to put together a miniconf to answer this question: Open GLAM. If you do work in the intersection of galleries, libraries, archives, musuems and open source, we’d love to hear from you.

,

Arjen LentzUsing expired credit cards

Using expired credit/debit cards… surely you can’t do that?  Actually, yes you can. This is how it goes.

First, what can be observed (verified):

On a vendor site that allows your to save your card (hopefully via a token with the payment gateway provider, so it doesn’t actually store your card), you enter the card number and expiry date for your then still valid card.  This is necessary because otherwise the site is likely to reject your input.  Makes sense.

Some time later your card expires, but the vendor is still quite happy to keep using the card-on-file for recurring payments.  The payment gateway apparently doesn’t mind, and our banks apparently don’t mind.  I have observed this effect with Suncorp and Bank of Queensland, please let me know if you’ve observed this with other banks.

From this point, let’s play devil’s advocate.

  • What if someone got hold of your card number+expiry date?
    • Well, sites tend to reject dates-in-the-past on input. Excellent.
  • What if that someone just does +4 on the year and then enters it – renewed cards tend to have the same number, just with an updated expiry 4 years in to the future (the exact number of years may differ between banks) ?
    • Payment gateway should reject the card, because even though the card+expiry is “ok”, the CVV (Card Verification Value, the magic number on the back of the card) would be different!  Nice theory, but…
      • I’ve noted that some sites don’t ask for the CVV, and thus we must conclude at at least some payment gateways don’t require it.  Eek!
        I noticed that the payment gateway for one of these was Westpac.

So what are the underlying issues:

  • Banks let through payments on expired cards.
    • Probably done for client convenience (otherwise you’d be required to update lots of places).
  • Banks issue new cards with the same card number but just an updated year (even the month tends to be the same).
    • Possibly convenience again, but if you need to update your details anyway with some vendor, you might as well update a few more numbers.  I don’t see a valid reason to do this (please comment if you think of something).
  • Some payment gateways don’t require CVV to let through a payment.
    • This is inexcusable and means that the above two habits result in a serious fraud vector.  Payment gateways, credit card companies and banks should not allow this at all, yet somehow it goes through the gateway -> credit card company path without getting rejected.

Security tends to involve multiple layers.  This makes sense, as any one layer can be compromised.  When a security aspect is procedurally compromised, such as not regarding an expired card as expired, or not requiring the o-so-important CVV number for online payments, it’s the vendor itself undoing their security.  If that happens with a few layers, as in the above scenario, security is fatally impacted.  A serious failing.

I have little doubt that people have been using this fraud vector some time as it’s unlikely that I’m the first one spotting this.  In many scenarios, credit card companies tend to essentially weigh security risks against convenience, and refund those affected.  This is what happens with abuse of the PayWave system, and while I don’t really like it, I understand why they do this.  But I also think we have to draw the line somewhere.  Not requiring CVV numbers for online transactions is definitely beyond. Possibly renewing cards with the same number also.  And as it’s the combination of these factors that causes the problem, addressing any one of them could plug the hole – addressing more than one would be great.

The post Using expired credit cards first appeared on Lentz family blog.

,

Arjen LentzCalculating CRC with a tiny (32 entry) lookup-table

I happened to notice that the Arduino OneWire library uses a 256 entry lookup table for its CRC calculations.  I did some research on this topic in 1992-1993, while working on Bulletin Board Systems, FidoNet code and file transfer protocols.  These days memory is not at a premium on most computers, however on Arduino and microcontroller environments it definitely is, and I happen to know that table-lookup CRC can be done using two 16-entry tables!  So I’ve dug up my documentation and code from the time, and applied it to the CRC-8 calculation for the Maxim (Dallas Semiconductor) OneWire bus protocol.  I think this provides a neat trade-off between code size and speed.

License

For any of the below code, apply the following license (2-clause “simplified” BSD license), which should suffice for any use.  If you do require another license, just ask.

CRC lookup tables, generator and use implementation by Arjen Lentz <arjen (at) lentz (dot) com (dot) au>

Copyright (C) 1992-2017 Arjen Lentz

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The Code

The below is a drop-in replacement for the chunk of code in OneWire.cpp, the two 16-entry tables I mentioned above are combined in to a single 32 entry array.

Copyright (c) 2007, Jim Studt  (original old version - many contributors since)

The latest version of this library may be found at:
  http://www.pjrc.com/teensy/td_libs_OneWire.html

OneWire has been maintained by Paul Stoffregen (paul@pjrc.com) since
January 2010.
[...]

// The 1-Wire CRC scheme is described in Maxim Application Note 27:
// "Understanding and Using Cyclic Redundancy Checks with Maxim iButton Products"
//

#if ONEWIRE_CRC8_TABLE
// Dow-CRC using polynomial X^8 + X^5 + X^4 + X^0
static const uint8_t PROGMEM dscrc_table[] = {
    0x00, 0x5E, 0xBC, 0xE2, 0x61, 0x3F, 0xDD, 0x83,
    0xC2, 0x9C, 0x7E, 0x20, 0xA3, 0xFD, 0x1F, 0x41,
    0x00, 0x9D, 0x23, 0xBE, 0x46, 0xDB, 0x65, 0xF8,
    0x8C, 0x11, 0xAF, 0x32, 0xCA, 0x57, 0xE9, 0x74
};

//
// Compute a Dallas Semiconductor 8 bit CRC. These show up in the ROM
// and the registers.
uint8_t OneWire::crc8(const uint8_t *addr, uint8_t len)
{
    uint8_t crc = 0;

    while (len--) {
        crc = *addr++ ^ crc;  // just re-using crc as intermediate
        crc = pgm_read_byte(dscrc_table + (crc & 0x0f)) ^
              pgm_read_byte(dscrc_table + 16 + ((crc >> 4) & 0x0f));
    }
    return crc;
}
#else
[...]

Generating CRC tables

CRC tables are always only presented as magic output, but the feedback terms simply represent the results of eight shifts/xor operations for all combinations of data and CRC register values. Actually, the table generator routine uses the old-style method shifting each bit.

Below is the rough code. For the Dow-CRC, the specified polynomial is X^8 + X^5 + X^4 + X^0. The first term tells us that it’s an 8 bit CRC. We work the rest in to the designate bits but from right to left, so we end up with binary 1000 1100 which is 0x8C in hex. This is the POLY value in the code.

for (i = 0; i <= 15; i++) {
    crc = i;

    for (j = 8; j > 0; j--) {
        if (crc & 1)
            crc = (crc >> 1) ^ POLY;
        else
            crc >>= 1;
    }
    crctab[i] = crc;
}

for (i = 0; i <= 15; i++) {
    crc = i << 4;

    for (j = 8; j > 0; j--) {
        if (crc & 1)
            crc = (crc >> 1) ^ poly;
        else
            crc >>= 1;
    }
    crctab[16 + i] = crc;
}

If you want to generate a 256-entry table for the same:

for (i = 0; i <= 255; i++) {
    crc = i;
    for (j = 8; j > 0; j--) {
        if (crc & 1)
            crc = (crc >> 1) ^ POLY;
        else
            crc >>= 1;
    }
    crctab[i] = crc;
}

Yes, this trickery basically works for any CRC (any number of bits), you just need to put in the correct polynomial.  However, some CRCs are “up side down”, and thus require a bit of adjustment along the way – the CRC-16 as used by Xmodem and Zmodem worked that way, but I won’t bother you with that old stuff here as it’s not really used any more. Some CRCs also use different initialisation values, post-processing and check approaches. If you happen to need info on that, drop me a line as the info contained in this blog post is just a part of the docu I have here.

CRC – Polynomial Division

To calculate a CRC (Cyclic Redundancy Check) of a block of data, the data bits are considered to be the coefficients of a polynomial. This message (data) polynomial is first multiplied by the highest term in the polynomial (X^8, X^16 or X^32) then divided by the generator polynomial using modulo two arithemetic. The remainder left after the division is the desired CRC.

CRCs are usually expressed as an polynomial expression such as:

X^16 + X^12 + X^5 + 1

The generator polynomial number is determined by setting bits corresponding to the power terms in the polynomial equation in an (unsigned) integer. We do this backwards and put the highest-order term in the the lowest-order bit. The highest term is implied (X^16 here, just means its’s a 16-bit CRC), the LSB is the X^15 term (0 here), the X^0 term (shown as + 1) results in the MSB being 1.

Note that the usual hardware shift register implementation shifts bits into the lowest-order term. In our implementation, that means shifting towards the right. Why do we do it this way? Because the calculated CRC must be transmitted across a serial link from highest- to lowest-order term. UARTs transmit characters in order from LSB to MSB. By storing the CRC this way, we hand it to the UART in the order low-byte to high-byte; the UART sends each low-bit to high-bit; and the result is transmission bit by bit from highest- to lowest-order term without requiring any bit shuffling.

Credits

Stuff found on BBS, numerous files with information and implementations.
Comments in sources on CRC-32 calculation by Gary S. Brown.
All other people hereby credited anonymously, because unfortunately the documents and sources encountered did not contain the name of a specific person or organisation. But, each source provided a piece of the puzzle!

Sources / Docs

My original source code and full docu is also available: crc_agl.zip, as part of my archive of Lentz Software-Development‘s work in the 1990s.

The post Calculating CRC with a tiny (32 entry) lookup-table first appeared on Lentz family blog.

,

Andrew RuthvenMissing opkg status file on LEDE...

I tried to install LEDE on my home router which is running LEDE, only to be told that libc wasn't installed. Huh? What's going on?! It looked to all intents as purposes as though libc wasn't installed. And it looked like nothing was installed.

What to do if opkg list-installed is returning nothing?

I finally tracked down the status file it uses as being /usr/lib/opkg/status. And it was empty. Oh dear.

Fortunately the info directory had content. This means we can rebuild the status file. How? This is what I did:

cd /usr/lib/opkg/info
for x in *.list; do
pkg=$(basename $x .list)
echo $pkg
opkg info $pkg | sed 's/Status: .*$/Status: install ok installed/' >> ../status
done

And then for the special or virtual packages (such as libc and the kernel):

for x in *.control; do
pkg=$(basename $x .control)
if ! grep -q "Package: $pkg" ../status
then
echo $pkg is missing; cat $x >> ../status
fi
done

I then had to edit the file tidy up some newlines for the kernel and libc, and set the status lines correctly. I used "install hold installed".

Now I that I've shaved that yak, I can install tcpdump to try and work out why a VoIP phone isn't working. Joy.

,

Andrew RuthvenNetwork boot a Raspberry Pi 3

I found to make all this work I had to piece together a bunch of information from different locations. This fills in some of the blanks from the official Raspberry Pi documentation. See here, here, and here.

Image

Download the latest raspbian image from https://www.raspberrypi.org/downloads/raspbian/ and unzip it. I used the lite version as I'll install only what I need later.

To extract the files from the image we need to jump through some hoops. Inside the image are two partitions, we need data from each one.

 # Make it easier to re-use these instructions by using a variable
 IMG=2017-04-10-raspbian-jessie-lite.img
 fdisk -l $IMG

You should see some output like:

 Disk 2017-04-10-raspbian-jessie-lite.img: 1.2 GiB, 1297862656 bytes, 2534888 sectors
 Units: sectors of 1 * 512 = 512 bytes
 Sector size (logical/physical): 512 bytes / 512 bytes
 I/O size (minimum/optimal): 512 bytes / 512 bytes
 Disklabel type: dos
 Disk identifier: 0x84fa8189
 
 Device                               Boot Start     End Sectors  Size Id Type
 2017-04-10-raspbian-jessie-lite.img1       8192   92159   83968   41M  c W95 FAT32 (LBA)
 2017-04-10-raspbian-jessie-lite.img2      92160 2534887 2442728  1.2G 83 Linux

You need to be able to mount both the boot and the root partitions. Do this by tracking the offset of each one and multiplying it by the sector size, which is given on the line saying "Sector size" (typically 512 bytes), for example with the 2017-04-01 image, boot has an offset of 8192, so I mount it like this (it is VFAT):

 mount -v -o offset=$((8192 * 512)) -t vfat $IMG /mnt
 # I then copy the data off:
 mkdir -p /data/diskless/raspbian-lite-base-boot/
 rsync -xa /mnt/ /data/diskless/raspbian-lite-base-boot/
 # unmount the partition now:
 umount /mnt

Then we do the same for the root partition:

 mount -v -o offset=$((92160 * 512)) -t ext4 $IMG /mnt
 # copy the data off:
 mkdir -p /data/diskless/raspbian-lite-base-root/
 rsync -xa /mnt/ /data/diskless/raspbian-lite-base-root/
 # umount the partition now:
 umount /mnt

DHCP

When I first set this up, I used OpenWRT on my router, and I had to patch /etc/init/dnsmasq to support setting DHCP option 43. As of the writting of this article, a similar patch has been merged, but isn't in a release yet, and, well, there may never be another release of OpenWRT. I'm now running LEDE, and the the good news is it already has the patch merged (hurrah!). If you're still on OpenWRT, then here's the patch you'll need:

https://git.lede-project.org/?p=source.git;a=commit;h=9412fc294995ae2543fabf84d2ce39a80bfb3bd6

This lets you put the following in /etc/config/dnsmasq, this says that any device that uses DHCP and has a MAC issued by the Raspberry PI Foundation, should have option 66 (boot server) and option 43 set as specified. Set the IP address on option 66 to the device that should be used for tftp on your network, if it's the same device that provides DHCP then it isn't required. I had to set the boot server, as my other network boot devices are using a different server (with an older tftpd-hpa, I explain the problem further down).

 config mac 'rasperrypi'
         option mac 'b8:27:eb:*:*:*'
         option networkid 'rasperrypi'
         list dhcp_option '66,10.1.0.253'
         list dhcp_option '43,Raspberry Pi Boot'

tftp

Initially I used a version of tftpd that was too old and didn't support how the RPi tried to discover if it should use the serial number based naming scheme. The version of tftpd-hpa Debian Jessie works just fine. To find out the serial number you'll probably need to increase the logging of tftpd-hpa, do so by editing /etc/default/tftpd-hpa and adding "-v" to the TFTP_OPTIONS option. It can also be useful to watch tcpdump to see the requests and responses, for example (10.1.0.203 is the IP of the RPi I'm working with):

  tcpdump -n -i eth0 host 10.1.0.203 and dst port 69

This was able to tell me the serial number of my RPi, so I made a directory in my tftpboot directory with the same serial number and copied all the boot files into there. I then found that I had to remove the init= portion from the cmdline.txt file I'm using. To ease debugging I also removed quiet. So, my current cmdline.txt contains (newlines entered for clarity, but the file has it all on one line):

idwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=/dev/nfs
nfsroot=10.1.0.253:/data/diskless/raspbian-lite-base-root,vers=3,rsize=1462,wsize=1462
ip=dhcp elevator=deadline rootwait hostname=rpi.etc.gen.nz

NFS root

You'll need to export the directories you created via NFS. My exports file has these lines:

/data/diskless/raspbian-lite-base-root	10.1.0.0/24(rw,no_root_squash,sync,no_subtree_check)
/data/diskless/raspbian-lite-base-boot	10.1.0.0/24(rw,no_root_squash,sync,no_subtree_check)

And you'll also want to make sure you're mounting those correctly during boot, so I have in /data/diskless/raspbian-lite-base-root/etc/fstab the following lines:

10.1.0.253:/data/diskless/raspbian-lite-base-root   /       nfs   rw,vers=3       0   0
10.1.0.253:/data/diskless/raspbian-lite-base-boot   /boot   nfs   vers=3,nolock   0   2

Network Booting

Now you can hopefully boot. Unless you into this bug, as I did. Where the RPi will sometimes fail to boot. Turns out the fix, which is mentioned on the bug report, is to put bootcode.bin (and only bootcode.bin) onto an SD card. That'll then load the fixed bootcode, and which will then boot reliably.

,

BlueHackersMental Health Resources for New Dads

Right now, one in seven new fathers experiences high levels of psychological distress and as many as one in ten experience depression or anxiety. Often distressed fathers remain unidentified and unsupported due to both a reluctance to seek help for themselves and low levels of community understanding that the transition to parenthood is a difficult period for fathers, as well as mothers.

The project is hoping to both increase understanding of stress and distress in new fathers and encourage new fathers to take action to manage their mental health.

This work is being informed by research commissioned by beyondblue into men’s experiences of psychological distress in the perinatal period.

Informed by the findings of the Healthy Dads research, three projects are underway to provide men with the knowledge, tools and support to stay resilient during the transition to fatherhood.

https://www.medicalert.org.au/news-and-resources/becoming-a-healthy-dad

The post Mental Health Resources for New Dads first appeared on BlueHackers.org.

,

BlueHackersCreative kid’s piano + vocal composition

An inspirational song from an Australian youngster.  He explains his background at the start.

The post Creative kid’s piano + vocal composition first appeared on BlueHackers.org.

BlueHackersThe Attention Economy

In May 2017, James Williams, a former Google employee and doctoral candidate researching design ethics at Oxford University, won the inaugural Nine Dots Prize.

James argues that digital technologies privilege our impulses over our intentions, and are gradually diminishing our ability to engage with the issues we most care about.

Possibly a neat followup on our earlier post on “busy-ness“.

The post The Attention Economy first appeared on BlueHackers.org.

,

Rusty RussellBroadband Speeds, 2 Years Later

Two years ago, considering the blocksize debate, I made two attempts to measure average bandwidth growth, first using Akamai serving numbers (which gave an answer of 17% per year), and then using fixed-line broadband data from OFCOM UK, which gave an answer of 30% per annum.

We have two years more of data since then, so let’s take another look.

OFCOM (UK) Fixed Broadband Data

First, the OFCOM data:

  • Average download speed in November 2008 was 3.6Mbit
  • Average download speed in November 2014 was 22.8Mbit
  • Average download speed in November 2016 was 36.2Mbit
  • Average upload speed in November 2008 to April 2009 was 0.43Mbit/s
  • Average upload speed in November 2014 was 2.9Mbit
  • Average upload speed in November 2016 was 4.3Mbit

So in the last two years, we’ve seen 26% increase in download speed, and 22% increase in upload, bringing us down from 36/37% to 33% over the 8 years. The divergence of download and upload improvements is concerning (I previously assumed they were the same, but we have to design for the lesser of the two for a peer-to-peer system).

The idea that upload speed may be topping out is reflected in the Nov-2016 report, which notes only an 8% upload increase in services advertised as “30Mbit” or above.

Akamai’s State Of The Internet Reports

Now let’s look at Akamai’s Q1 2016 report and Q1-2017 report.

  • Annual global average speed in Q1 2015 – Q1 2016: 23%
  • Annual global average speed in Q1 2016 – Q1 2017: 15%

This gives an estimate of 19% per annum in the last two years. Reassuringly, the US and UK (both fairly high-bandwidth countries, considered in my previous post to be a good estimate for the future of other countries) have increased by 26% and 19% in the last two years, indicating there’s no immediate ceiling to bandwidth.

You can play with the numbers for different geographies on the Akamai site.

Conclusion: 19% Is A Conservative Estimate

17% growth now seems a little pessimistic: in the last 9 years the US Akamai numbers suggest the US has increased by 19% per annum, the UK by almost 21%.  The gloss seems to be coming off the UK fixed-broadband numbers, but they’re still 22% upload increase for the last two years.  Even Australia and the Philippines have managed almost 21%.

,

Rusty RussellQuick Stats on zstandard (zstd) Performance

Was looking at using zstd for backup, and wanted to see the effect of different compression levels. I backed up my (built) bitcoin source, which is a decent representation of my home directory, but only weighs in 2.3GB. zstd -1 compressed it 71.3%, zstd -22 compressed it 78.6%, and here’s a graph showing runtime (on my laptop) and the resulting size:

zstandard compression (bitcoin source code, object files and binaries) times and sizes

For this corpus, sweet spots are 3 (the default), 6 (2.5x slower, 7% smaller), 14 (10x slower, 13% smaller) and 20 (46x slower, 22% smaller). Spreadsheet with results here.

,

Julien GoodwinMaking a USB powered soldering iron that doesn't suck

Today's evil project was inspired by a suggestion after my talk on USB-C & USB-PD at this years's linux.conf.au Open Hardware miniconf.

Using a knock-off Hakko driver and handpiece I've created what may be the first USB powered soldering iron that doesn't suck (ok, it's not a great iron, but at least it has sufficient power to be usable).

Building this was actually trivial, I just wired the 20v output of one of my USB-C ThinkPad boards to a generic Hakko driver board, the loss of power from using 20v not 24v is noticeable, but for small work this would be fine (I solder in either the work lab or my home lab, where both have very nice soldering stations so I don't actually expect to ever use this).

If you were to turn this into a real product you could in fact do much better, by doing both power negotiation and temperature control in a single micro, the driver could instead be switched to a boost converter instead of just a FET, and by controlling the output voltage control the power draw, and simply disable the regulator to turn off the heater. By chance, the heater resistance of the Hakko 907 clone handpieces is such that combined with USB-PD power rules you'd always be boost converting, never needing to reduce voltage.

With such a driver you could run this from anything starting with a 5v USB-C phone charger or battery (15W for the nicer ones), 9v at up to 3A off some laptops (for ~25W), or all the way to 20V@5A for those who need an extremely high-power iron. 60W, which happens to be the standard power level of many good irons (such as the Hakko FX-888D) is also at 20v@3A a common limit for chargers (and also many cables, only fixed cables, or those specially marked with an ID chip can go all the way to 5A). As higher power USB-C batteries start becoming available for laptops this becomes a real option for on-the-go use.

Here's a photo of it running from a Chromebook Pixel charger:

,

Craig SandersNew D&D Cantrip

Name: Alternative Fact
Level: 0
School: EN
Time: 1 action
Range: global, contagious
Components: V, S, M (one racial, cultural or religious minority to blame)
Duration: Permanent (irrevocable)
Classes: Cleric, (Grand) Wizard, Con-man Politician

The caster can tell any lie, no matter how absurd or outrageous (in fact, the more outrageous the better), and anyone hearing it (or hearing about it later) with an INT of 10 or less will believe it instantly, with no saving throw. They will defend their new belief to the death – theirs or yours.

This belief can not be disbelieved, nor can it be defeated by any form of education, logic, evidence, or reason. It is completely incurable. Dispel Magic does not work against it, and Remove Curse is also ineffectual.

New D&D Cantrip is a post from: Errata

Chris NeugebauerTwo Weeks’ Notice

Last week, a rather heavy document envelope showed up in the mail.

Inside I found a heavy buff-coloured envelope, along with my passport — now containing a sticker featuring an impressive collection of words, numbers, and imagery of landmarks from the United States of America. I’m reliably informed that sticker is the valid US visa that I’ve spent the last few months applying for.

Having that visa issued has unblocked a fairly important step in my path to moving in with Josh (as well as eventually getting married, but that’s another story). I’m very very excited about making the move, though very sad to be leaving the city I’ve grown up in and come to love, for the time being.

Unrelatedly, I happened to have a trip planned to Montréal to attend ConFoo in March. Since I’ll already be in the area, I’m using that trip as my opportunity to move.

My last day in Hobart will be Thursday 2 March. Following that, I’ll be spending a day in Abu Dhabi (yes, there is a good reason for this), followed by a week in Montréal for ConFoo.

After that, I’ll be moving in with Josh in Petaluma, California on Saturday 11 March.

But until then, I definitely want to enjoy what remaining time I have in Hobart, and catch up with many many people.

Over the next two weeks I’ll be:

  • Attending, and presenting a talk at WD42 — my talk will be one of my pieces for ConFoo, and is entirely new material. Get along!
  • Having a farewell do, *probably* on Tuesday 28 February (but that’s not confirmed yet). I’ll post details about where and when that’ll be in the near future (once I’ve made plans)
  • Madly packing and making sure that that I use up as close to 100% of my luggage allowance as possible

If you want to find some time to catch up over the next couple of weeks, before I disappear for quite some time, do let me know.

,

Peter LieverdinkAstrophotography with Mac OS X

It's been a good three years now since I swapped my HP laptop for a Macbook Pro. In the mean time, I've started doing a bit more astrophotography and of course the change of operating system has affected the tools I use to obtain and process photos.

Amateur astronomers have traditionally mostly used Windows, so there are a lot of Windows tools, both freeware and payware, to help. I used to run the freeware ones in Wine on Ubuntu with varying levels of success.

When I first got the Mac, I had a lot of trouble getting Wine to run reliably and eventually ended up doing my alignment and processing manually in The Gimp. However, that's time consuming and rather fiddly and limited to stacking static exposures.

However, I've recently started finding quite a bit of Mac OS based astrophotography software. I don't know if that means it's all fairly new or whether my Google skills failed me over the past years :-)

Software

I thought I'd document what I use, in the hope that I can save others who want to use their Macs some searching.

Some are Windows software, but run OK on Mac OS X. You can turn them into normal double click applications using a utility called WineSkin Winery.

Obtaining data from video camera:

Format-converting video data:

Processing video data:

  • AutoStakkert! (Windows + Wine, free for non-commercial use, donationware)

Obtaining data from DSLR:

Processing and stacking DSLR files and post-processing video stacks:

Post-processing:

Telescope guiding:

  • AstroGuider (Mac OS X, payware, free trial)
  • PHD2 (Mac OS X, free, open source)

Hardware

M42 - Orion NebulaA few weeks ago I bought a ZWO ASI120MC-S astro camera, as that was on sale and listed by Nebulosity as supported by OSX. Until then I'd messed around with a hacked up Logitech webcam, which seemed to only be supported by the Photo Booth app.

I've not done any guiding yet (I need a way to mount the guide scope on the main scope - d'oh) but the camera works well with Nebulosity 4 and oaCapture. I'm looking forward to being able to grab Jupiter with it in a month or so and Saturn and Mars later this year.

The image to the right is a stack of 24x5 second unguided exposures of the trapezium in M42. Not too bad for a quick test on a half-moon night.

Settings

I've been fiddling with Nebulosity  abit, to try and get it to stack the RAW images from my Nikon D750 as colour. I found a conversion matrix that was supposed to be decent, but as it turns out that made all images far too blue.

The current matrix I use is listed below. If you find a better one, please let me know.

  R G B
R 0.50 0.00 1.00
G 0.00 1.00 0.00
B 1.00 0.00 0.50

,

Clinton RoySouth Coast Track Report

Please note this is a work in progress

I had previously stated my intention to walk the South Coast Track. I have now completed this walk and now want a space where I can collect all my thoughts.

Photos: Google Photos album

The sections I’m referring to here come straight from the guide book. Due to the walking weather and tides all being in our favour, we managed to do the walk in six days. We flew in late on the first day and did not finish section one of the walk, on the second day we finished section one and then completed section two and three. On day three it was just the Ironbound range. On day four it was just section five. Day five we completed section six and the tiny section seven. Day six was section eight and day seven was cockle creak (TODO something’s not adding up here)

The hardest day, not surprisingly, was day three where we tackled the Ironbound range, 900m up, then down. The surprising bit was how easy the ascent was and how god damn hard the descent was. The guide book says there are three rest camps on the descent, with one just below the peak, a perfect spot for lunch. Either this camp is hidden (e.g. you have to look behind you) or it’s overgrown, as we all missed it. This meant we ended up skipping lunch and were slipping down the wed, muddy awful descent side for hours. When we came across the mid rest camp stop, because we’d been walking for so long, everyone assumed we were at the lower camp stop and that we were therefore only an hour or so away from camp. Another three hours later or so we actually came across the lower camp site, and the by that time all sense of proportion was lost and I was starting to get worried that somehow we’d gotten lost and were not on the right trail and that we’d run out of light. In the end I got into camp about an hour before sundown (approx eight) and B&R got in about half an hour before sundown. I was utterly exhausted, got some water, pitched the tent, collapsed in it and fell asleep. Woke up close to midnight, realised I hadn’t had any lunch or dinner, still wasn’t actually feeling hungry. I forced myself to eat a hot meal, then collapsed in bed again.

TODO: very easy to follow trail.
TODO: just about everything worked.
TODO: spork
TODO: solar panel
TODO: not eating properly
TODO: needing more warmth

I could not have asked for better walking companions, Richard and Bec.

,

Julien GoodwinCharging my ThinkPad X1 Gen4 using USB-C

As with many massive time-sucking rabbit holes in my life, this one starts with one of my silly ideas getting egged on by some of my colleagues in London (who know full well who they are), but for a nice change, this is something I can talk about.

I have a rather excessive number of laptops, at the moment my three main ones are a rather ancient Lenovo T430 (personal), a Lenovo X1 Gen4, and a Chromebook Pixel 2 (both work).

At the start of last year I had a T430s in place of the X1, and was planning on replacing both it and my personal ThinkPad mid-year. However both of those older laptops used Lenovo's long-held (back to the IBM days) barrel charger, which lead to me having a heap of them in various locations at home and work, but all the newer machines switched to their newer rectangular "slim" style power connector and while adapters exist, I decided to go in a different direction.

One of the less-touted features of USB-C is USB-PD[1], which allows devices to be fed up to 100W of power, and can do so while using the port for data (or the other great feature of USB-C, alternate modes, such as DisplayPort, great for docks), which is starting to be used as a way to charge laptops, such as the Chromebook Pixel 2, various models of the Apple MacBook line, and more.

Instead of buying a heap of slim-style Lenovo chargers, or a load of adapters (which would inevitably disappear over time) I decided to bridge towards the future by making an adapter to allow me to charge slim-type ThinkPads (at least the smaller ones, not the portable workstations which demand 120W or more).

After doing some research on what USB-PD platforms were available at the time I settled on the TI TPS65986 chip, which, with only an external flash chip, would do all that I needed.

Devkits were ordered to experiment with, and prove the concept, which they did very quickly, so I started on building the circuit, since just reusing the devkit boards would lead to an adapter larger than would be sensible. As the TI chip is a many-pin BGA, and breaking it out on 2-layers would probably be too hard for my meager PCB design skills, I needed a 4-layer board, so I decided to use KiCad for the project.

It took me about a week of evenings to get the schematic fully sorted, with much of the time spent reading the chip datasheet, or digging through the devkit schematic to see what they did there for some cases that weren't clear, then almost a month for the actual PCB layout, with much of the time being sucked up learning a tool that was brand new to me, and also fairly obtuse.

By mid-June I had a PCB which should (but, spoiler, wouldn't) work, however as mentioned the TI chip is a 96-ball 3x3mm BGA, something I had no hope of manually placing for reflow, and of course, no hope of hand soldering, so I would need to get these manufactured commercially. Luckily there are several options for small scale assembly at very reasonable prices, and I decided to try a new company still (at the time of ordering) in closed trials, PCB.NG, they have a nice simple procedure to upload board files, and a slightly custom pick & place file that includes references to the exact component I want by Digikey[link] part number. Best of all the pricing was completely reasonable, with a first test run of six boards only costing my US$30 each.

Late in June I recieved a mail from PCB.NG telling me that they'd built my boards, but that I had made a mistake with the footprint I'd used for the USB-C connector and they were posting my boards along with the connectors. As I'd had them ship the order to California (at the time they didn't seem to offer international shipping) it took a while for them to arrive in Sydney, courtesy a coworker.

I tried to modify a connector by removing all through hole board locks, keeping just the surface mount pins, however I was unsuccessful, and that's where the project stalled until mid-October when I was in California myself, and was able to get help from a coworker who can perform miracles of surface mount soldering (while they were working on my board they were also dead-bug mounting a BGA). Sadly while I now had a board I could test it simply dropped off my priority list for months.

At the start of January another of my colleagues (a US-based teammate of the London rabble-rousers) asked for a status update, which prompted me to get off my butt and perform the testing. The next day I added some reinforcement to the connector which was only really held on by the surface mount pins, and was highly likely to rip off the board, so I covered it in epoxy. Then I knocked up some USB A plug/socket to bare wires test adapters using some stuff from the junk bin we have at the office maker space for just this sort of occasion (the socket was actually a front panel USB port from an old IBM x-series server). With some trepidation I plugged the board into my newly built & tested adapter, and powered the board from a lab supply set to limit current in case I'd any shorts in the board. It all came up straight away, and even lit the LEDs I'd added for some user feedback.

Next was to load a firmware for the chip. I'd previously used TI's tool to create a firmware image, and after some messing around with the SPI flash programmer I'd purchased managed to get the board programmed. However the behaviour of the board didn't change with (what I thought was) real firmware, I used an oscilloscope to verify the flash was being read, and a twinkie to sniff the PD negotiation, which confirmed that no request for 20v was being sent. This was where I finished that day.

Over the weekend that followed I dug into what I'd seen and determined that either I'd killed the SPI MISO port (the programmer I used was 5v, not 3.3v), or I just had bad firmware and the chip had some good defaults. I created a new firmware image from scratch, and loaded that.

Sure enough it worked first try. Once I confirmed 20v was coming from the output ports I attached it to my recently acquired HP 6051A DC load where it happily sank 45W for a while, then I attached the cable part of a Lenovo barrel to slim adapter and plugged it into my X1 where it started charging right away.

At linux.conf.au last week I gave (part of) a hardware miniconf talk about USB-C & USB-PD, which open source hardware folk might be interested in. Over the last few days while visiting my dad down in Gippsland I made the edits to fix the footprint and sent a new rev to the manufacturer for some new experiments.

Of course at CES Lenovo announced that this years ThinkPads would feature USB-C ports and allow charging through them, and due to laziness I never got around to replacing my T430, so I'm planning to order a T470 as soon as they're available, making my adapter obsolete.





Rough timeline:
  • April 21st 2016, decide to start working on the project
  • April 28th, devkits arrive
  • May 8th, schematic largely complete, work starts on PCB layout
  • June 14th, order sent to CM
  • ~July 6th, CM ships order to me (to California, then hand carried to me by a coworker)
  • early August, boards arrive from California
  • Somewhere here I try, and fail, to reflow a modified connector onto a board
  • October 13th, California cowoker helps to (successfully) reflow a USB-C connector onto a board for testing
  • January 6th 2017, finally got around to reinforce the connector with epoxy and started testing, try loading firmware but no dice
  • January 10th, redo firmware, it works, test on DC load, then modify a ThinkPad-slim adapater and test on a real ThinkPad
  • January 25/26th, fixed USB-C connector footprint, made one more minor tweak, sent order for rev2 to CM, then some back & forth over some tolerance issues they're now stricter on.


1: There's a previous variant of USB-PD that works on the older A/B connector, but, as far as I'm aware, was never implemented in any notable products.

,

Ben LeslieInteresting corners of SFTP

January 11, 2017 is an auspicious day in tech history. No, I'm not talking about the 10 year anniversary of the iPhone announcement, I'm talking about with much more impact: today marks a decade since the most recent version (13) of the SFTP internet draft expired. So, I thought it would be a good opportunity to rant a bit about some of the more interesting corners of the protocol.

The first interesting thing is that even though the there are 14 versions of the internet draft (the first being version zero) the protocol version used by most (all?) major implementations (OpenSSH in particular) is version 3 (which of course is specified in specification version 2, because while the specification number is zero-based the protocol version numbering is 1-based). Specification version 2 is expired on April's fool day 2002.

The second interesting thing is that this never reached the status of RFC. I can't claim to undersatnd why, but for a commonly used tool it is relatively surprising that the best we have in terms of written specification is an expired internet draft. Certainly a step up from now documentation at all, but still surprising.

The third interesting thing is that it isn't really a file transfer protocol. It certainly lets you do that, but its really much closer to a remote file system protocol. The protocol doesn't expose an upload / download style interface, but rather an open / close / read / write style interface. In fact most of the SFTP interfaces are really small wrappers around the POSIX APIs (with some subtle changes just to keep everyone on their toes!). One consequence of this style API is that there isn't any kind of atomic upload operation; there is no way for the srver to really know when a client has finished uploading a file. (Which is unfortunate is you want to say trigger some processing after a file is uploaded).

The fourth interesting thing is that some parts of the protocol are underspecified, which makes solid implementation a little tricky. Specifically there is no definition of a maximum packet size, but in practise popular STP client implement a maximum packet size. So, if you are a server implementator all you can do is guess! If you really want to gory details check out this bug report.

The fifth interesting thing is the way in which the readdir implementation work. Now, on a UNIX you have the POSIX readdir API, however that returns only a single file at a time. You don't want a network round trip for each file in a directory. So the SFTP readdir is able to return multiple results in a single API (although exactly how many is underspecified; see the previous interesting thing!).

Now, this implementation lets you type ls into the SFTP client and you only need a few round-trips to display things. However, people aren't happy with just a file listing, and generally use ls -l to get a long listing with additional details on each file. On a POSIX system you end up calling stat on each filename to get the details, but if you have this design over the network then you are going to end up back with a lot of round-trips. So, to help optimize for this, rather than readdir returning a list of filenames, it returns a list of (filename, attribute) tuples. This let you do the ls -l without excessive round-trips.

Now, so far that isn't interesting. The really interesting thing here is that they didn't stop here with this optimisation. To implement ls -l you need to take the filename and attributes and then format them into a string that looks something like: drwxr-xr-x 5 markus markus 1024 Jan 13 18:39 .ssh. Now, there is not really anything that stops an SFTP client doing the transformation on the client. Doing so certainly doesn't requier any additional network round trips. But, for some reason, the SFTP server actually sends this formatted string in the result of readdir! So, readdir returns a list of (filename, longname, attribute) tuples. This is kind of strange.

The only real reason for doing this transformation on the server side is that the client can't perform a transformation from gid / uid into strings (since you can't expect the client and server to share the same mapping). So, there is some justificaiton here. However, readdir isn't the only call you need to implement ls. If the user simply performs an ls on a specific file, then you need to use stat to determine if the file exists, and if so get the attribute information. The SFTP protocol provides a stat call that basically works as you'd expect, returning the attributes for the file. But, it does not return the longname! So, now you know why sometimes an ls in the SFTP client will provide you with user and group names ,while other times it just show numeric identifiers.

The sixth interesting thing is that despite there being a cd command in the OpenSSH SFTP client, there isn't any chdir equivalent in the protocol. So, cd is a pure client-side illusion. A related interesting thing is that the protocol support filenames starting with a leading slash and without slash. With a slash means absolute (as you'd expect), but without a slash doesn't mean relative, or at least not relative to any current working directory. Paths without a leading slash are resolved relative to a default directory. Normally this would be the user's home directory, but really it is up to the SSH server.

The final interesting thing worth commenting on is that even though most of the SFTP interfaces seem to be copies of POSIX interfaces for rename, they chose to specify different semantics to the POSIX rename! Specifically, in SFTP it is an error to rename a file to an existing file, while POSIX rename supports this without an issue. Now of course, people wanted their normal POSIX semantics in rename, so OpenSSH obliged with an extension: posix-rename. So, OpenSSH has two renames, one with SFTP semantics, and one with POSIX semantics. This is not unreasonable, the really fun part is that the OpenSSH SFTP client automatically tries ot use the posix-rename semantics, but will happily, silently fall back to the SFTP semantics.

Hopefully if you ever need to deal with SFTP at a low level this has given you some pointers of things to look out for!

,

Sam WatkinsLinux, low power / low heat for summer

Sometimes I play browser games including diep.io.  This loads the CPU and GPU, and in this summer weather my laptop gets too hot and heats up the room.

I tried using Chrome with the GPU disabled, but the browser games would still cause the GPU to ramp up to full clock rate. I guess the X server was using the GPU.

google-chrome --disable-gpu   # does not always prevent GPU clocking up

So here’s what I did:

For the NVIDIA GPU, we can force the lowest power mode by adding the following to the “Device” section in /etc/X11/xorg.conf:

# Option "RegistryDwords" "PowerMizerEnable=0x0;"
Option "RegistryDwords" "PowerMizerEnable=0x1; PerfLevelSrc=0x3333; PowerMizerLevel=0x3; PowerMizerDefault=0x3; PowerMizerDefaultAC=0x3"

Unfortunately the “nvidia-settings” tool does not allow this.  It is necessary to restart the X server in order to change this setting.  Just swap which line is commented out.

Given that we are keeping the GPU cool like this, Chrome works better with the GPU enabled not disabled.

For the CPU, setting “scaling_governor=powersave” does not force the lowest power mode, and the CPU still clocks up and gets hot.  But we can set “scaling_max_freq” to stop Linux from raising the clock speed.  I’m using this shell script “cpu_speed“:

#!/bin/bash
cmd=${1-info}
cd /sys/devices/system/cpu
for cpu in cpu[0-9]*; do
 (
 cd $cpu/cpufreq
 case "$cmd" in
 info) echo $cpu `<scaling_cur_freq` `<scaling_min_freq` `<scaling_max_freq`
 ;;
 slow) cat cpuinfo_min_freq >scaling_min_freq
 cat cpuinfo_min_freq >scaling_max_freq
 ;;
 fast) cat cpuinfo_min_freq >scaling_min_freq
 cat cpuinfo_max_freq >scaling_max_freq
 ;;
 esac
 )
done

I can run it with “cpu_speed” to see the current speed, “cpu_speed slow” to fix the clock at the lowest speed, and “cpu_speed fast” to allow the clock to go up to the maximum speed.

This “temperature” script shows the NVIDIA GPUCurrentPerfLevel, GPUCoreTemp, and CPU temperature info:

#!/bin/sh
(
set -a
: ${DISPLAY:=:0.0}
nvidia-settings -q GPUCurrentPerfLevel -q GPUCoreTemp
acpi -t
) 2>/dev/null |
perl -ne 'print "$1 " if /[:,] (\d+)\./'
echo

Finally, I can reduce the screen resolution to decrease the load on the GPU and CPU.  “xrandr” with the NVIDIA driver does not allow me to change the resolution directly, but there is an option to scale the display.  This gives much smoother performance in the browser games, and the lower resolution doesn’t hurt.

xscale-half:

xrandr --output DP-2 --scale 0.5x0.5

xscale-normal:

xrandr --output DP-2 --scale 1x1

Anyway, now I have my laptop set up to run cool by default.  This doesn’t hurt for most things I am doing with it, and I feel it’s less likely to explode and burn down our house.

,

Peter LieverdinkSouthern Exposure

From time to time you see photos pop up on the internet that show off bits of the northern sky. A good example is a montage of the Moon and Andromeda that show what size Andromeda would be in the sky, if only it were actuallty visible to the naked eye.

Bad Astronomy did a blog post on that one and explained that though the image is fake, the relative sizes are pretty much correct.

However, that's not a lot of use to us poor people in the southern hemisphere that can't even see Andromeda at the best of times. What even are these northerners talking about?

During public viewings at Mount Burnett Observatory, people often want to see a galaxy and ask to see Andromeda. However, we always need to disappoint them, as at our latitude of 37.5 degrees south Andromeda barely rises high enough to clear the trees. And even if it does clear the trees, it's so low in the sky that you're looking at it through light pollution and dusty atmosphere.

So I thought I'd make a montage of the Moon and our visible galaxies (the Magallanic Clouds) to show of their relative sizes. Hopefully that will make people eventually ask to see these, as they are easily bright enough to see from a dark spot with the naked eye when the moon isn't up!

I took the Moon from the original photo by Stephen Rahn and pasted it onto (approximately) the south celestial pole on a long exposure photo I took of the southern sky over the 2016/2017 New Years weekend.

Montage of the moon in the southern sky.

The visible part Large Magellanic Cloud in this photo is about 2.5 degrees on the short axis, so that makes it about 5 times wider than the full moon, which is about half a degree. If anything, I estimated the moon to be a little bit too big in this montage.

There are also fainter parts that I couldn't capture in this photo. On the long axis the full LMC is about 10.5 degrees across - 21 times the width of the Moon!

So, the next time you see the original montage with Andromeda do the rounds and wish you could see a large galaxy, all you need to do is go outside on a dark night and look up!

You can find my original southern sky image on Flickr.

Tags: 

,

Chris NeugebauerMy 2016 Highlights

2016 was, undeniably, a length of time containing 366 days and a leap second.

For me, there were a bunch of highlights that it would be amiss to let pass without recording on this blog, so here goes:

  • At linux.conf.au 2016 in Geelong in February, I announced linux.conf.au 2017 in Hobart. Over the last year, the conference team and I ran a wildly successful CFP, found 4 amazing keynotes, and lined up what looks like it should be an excellent conference. The only* thing left to do is actually run the thing.
  • At PyCon in Montréal in 2014, I ran a BoF session for regional PyCon organisers. Two people from the Dominican Republic showed up and asked for our help in starting a PyCon in the Caribbean. In February 2016, I got to go to that conference, and it was incredible!
  • On that note, I got to continue building on a deeply wonderful relationship with the amazing Josh Simmons that we started in 2015. Over the course of 2016, we got to spend time with each other on no fewer than 6 occasions, both in North America, and here in Australia. We met (and got along quite well with) each others’ friends and families. We spent time living together, and have made big steps towards living together permanently this year. Frankly, I could do a whole post on this and I’m not sure why I haven’t.
  • On a slightly related note, I spent 92,000-odd miles in the air this year. Much of that was spent ducking over to the US to spend time with Josh; some of the rest was with Josh, and some of it was alone. I got to see some wonderful places I’ve never seen before, like the Grand Canyon and Hoover Dam, an actual northern hemisphere winter with snow and everything, and driving up the Californian coast from Los Angeles.
  • … and one night in May, on the Steel Bridge in Portland, Josh and I decided that we should get married.Chris and Josh on the Steel Bridge

So those are some of the highlights of my year. It’s been entirely not bad, in the grand scheme of things. Hooray!

,

Clinton RoyIn Memory of Gary Curtis

This week we learnt of the sad passing of a long term regular attendee of Humbug, Gary Curtis. Gary was often early, and nearly always the last to leave.

One  of Gary’s prized possessions was his car, more specifically his LINUX number plate. Gary was very happy to be our official airport-conference shuttle for linux.conf.au keynote speakers in 2011 with this number plate.

Gary always had very strong opinions about how Humbug and our Humbug organised conferences should be run, but rarely took to running the events himself. It became a perennial joke at Humbug AGMs that we would always nominate Gary for positions, and he would always decline. Eventually we worked out that Humbug was one of the few times Gary wasn’t in charge of a group, and that was relaxing for him.

A topic that Gary always came back to was genealogy, especially the phone app he was working on.

A peculiar quirk of Humbug meetings is that they run on Saturday nights, and thus we often have meetings at the same time as Australian elections. Gary was always keen to keep up with the election on the night, often with interesting insights.

My most personal memory of Gary was our road trip after OSDC New Zealand, we did something like three days of driving around in a rental car, staying at hotels along the way. Gary’s driving did little to impress me, but he was certainly enjoying himself.

Gary will be missed.