against recognition

hello! I'm ~~on the train to Miami~~ in Miami!

—

Figure 1: https://twitter.com/robinsloan/status/1401918583861506048?ref_src=twsrc%5Etfw

at Dynamicland, I had this idea for an 'audio recorder' or 'audio note' device – it would be a thing that has a push button, a microphone, and a receipt printer:

you hold the button down to record¹
say something / make whatever sound you want
let go of the button
the device immediately (or even in real time, while you're speaking?) prints out a little receipt that 'contains' (that is, represents) the audio that it just recorded²
then you can put that sound (receipt) down to play it

like I feel like there is no audio equivalent of jotting something down on a post-it note (or even writing on a whiteboard), and that's kinda what I wanted: a cheap physical object, made of sound instead of writing, where both consuming it and creating it feel immediately at hand.

It'd be a mix of the nice things about audio (oral communication style instead of written, can bring in other noise-making objects, background sounds can make it in without your explicitly intending to include them, music, kids/nonliterate people can participate, etc.) and the nice things about writing a note (cheap to make, disposable, objectness, can point to and reuse later).

(ideas: you narrate a project you're working on, then literally stick your narration onto the project / you hum a bunch of little sound samples and play with them to make some musical composition / you build a game where the game pieces are spoken words… / i don't know. maybe you could also make non-visual systems, where you only use your hands and ears and mouth, that would be accessible even to someone who can't read or can't see)

—

but – and I think this is important – the thing would not try to recognize your speech and turn it into text. the receipt represents only the sound itself. (From the computer's point of view, it's just an opaque blob of audio.)³

The computer is what stores the audio data for the receipt, and when you put down the receipt, the computer is what plays the audio through the speakers in the ceiling. So the computer's role here is to store (audio) information and to orchestrate hardware, but its role is not really to digest and index that information.
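Here's a minimal sketch, in Python, of how small the computer's job could be under this design. It assumes each receipt carries some printed ID that the cameras can read back. `AudioNoteStore`, `record`, `on_receipt_placed`, and the `play` callback are hypothetical names for illustration, not Dynamicland's actual code. Nothing in it ever decodes or inspects the audio: bytes go in, and the same bytes come back out.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class AudioNoteStore:
    """Hypothetical store mapping a printed receipt's ID to raw audio bytes.

    The bytes are never decoded, transcribed, or indexed. The store only
    keeps them and hands them back when the receipt shows up again.
    """

    blobs: dict[str, bytes] = field(default_factory=dict)

    def record(self, audio: bytes) -> str:
        # Mint an ID to print on the receipt; the audio itself stays opaque.
        receipt_id = uuid.uuid4().hex
        self.blobs[receipt_id] = audio
        return receipt_id

    def on_receipt_placed(self, receipt_id: str, play) -> None:
        # `play` stands in for whatever drives the speakers in the ceiling.
        audio = self.blobs.get(receipt_id)
        if audio is not None:
            play(audio)


if __name__ == "__main__":
    store = AudioNoteStore()
    rid = store.record(b"...raw audio bytes...")
    store.on_receipt_placed(rid, play=print)  # print as a stand-in speaker
```

This is the 'cheating' version from footnote 3 (the audio lives in the computer's memory rather than being recoverable from the receipt itself), which keeps the sketch short.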

I don't know… if you have the computer recognize the speech inside these audio slips and turn it into text, you're now privileging that one kind of audio (speech), which feels weird to me. you're going to push people to make sound in the ways that are legible to the computer. you may start to think that the text is the 'canonical' format, and the audio is only a temporary stop on the way to figuring out the text, and it's a Problem if that text comes out wrong, and you'll treat everything in the world only in terms of how you can convert it to text…

(and to what end? none of the ideas or aspirations I've described so far require the computer to understand what's in the audio receipts.)

(sure, maybe it'd be nice to have, like, search, but should that nice-to-have dominate the design? You can't search physical books or post-it notes, and that's fine. I want to play with this fresh thing of audio notes, which doesn't even exist yet, before trying to add an extra layer of recognition and textuality and legibility, a layer which may overwhelm some of the things about audio that were interesting to me in the first place.)

(I think there are good ways to use recognition, but it shouldn't be the first thing to implement, and it probably shouldn't be implemented in the obvious way.)

A couple of years ago, we wanted to have a better idea of what projects people were doing (and had done already) at Dynamicland, so we started making this 'research gallery' application.⁴

It was built around this dynamic 'scrapbook'. The idea was that when you make something, you'd add a new page to the scrapbook about it, with photos, videos, text description, maybe a little embedded demo of the thing, and so on.⁵

You can see two such pages below – on the left, a scrapbook page about the "Animation" project, and on the right, a scrapbook page about the "DNA Kit" project:

Figure 2: /Users/osnr/Code/newsletters/2021-06-09/scrapbook.jpg

Each project at Dynamicland would be represented by a big page (or two or three) in the scrapbook.

Note that you can really put anything you want on the scrapbook page (it is, after all, a physical page). Look at everything that isn't framed by colored dots:

@@html:<a href="/Users/osnr/Code/newsletters/2021-06-09/scrapbook-unrecognized.jpg"><img width="250" src="/Users/osnr/Code/newsletters/2021-06-09/scrapbook-unrecognized.jpg"></a>

These are things – post-it notes, bits of text floating around that were written by different people, handwritten headings – where the computer doesn't even know they're there. But they are still useful for the human! And, unlike 'text', they can vary in human ways; they can be written or typeset differently, set at different sizes, with different colors, and so on – without the software needing to implement any of those features. It's like how I can yell or cry or laugh in an audio recording, and that won't come across if it gets 'recognized' into text.

To be honest, I think we could have gone further here. I would have liked to not have any dots on the scrapbook page at all, just photos and writing and stuff.⁶ (and if that means the computer has to give up some legibility, that would have been OK by me.)

take seriously the idea that a lot of things in the computer are not really about the computer, but are about people and the relationships between people, and they are consumed by people. that means it should be fine to put something inside the computer that the computer cannot digest, but that it can pass on to other people.

it's like a code comment.

it's like a note on something in the software https://twitter.com/rsnous/status/1361081403912249345

I have this reMarkable tablet. I really like it. I use it a lot. Its UI looks like this (source):

Figure 3: https://support.remarkable.com/hc/en-us/articles/360002671958-Navigating-on-your-reMarkable

and that frustrates me a little. I feel like you could do so much more; I feel like that interface doesn't really take the tablet and pen seriously.⁷

There's so much typed text on that screen – there are so many straight lines and buttons – it feels like it's a UI built around tapping and clicking and maybe keyboard shortcuts, not around the pen. Shouldn't as much of the interface as possible be handwritten text, and hand-drawn lines, and weird-shaped regions carved with a pen?

A small example: titles. like, if you have a tablet, with a pen, you should be able to just… draw a title. even if the computer doesn't know what it means, the point of the title is for you, so you can identify your files on sight!

TWEET WITH THE OVERLAPPING TEACHER CORNERS

why is there any typed text on the tablet UI? why am I typing into a software keyboard? could it all be handwritten? then you could draw little smiley faces or stars or whatever you want.

it's weird that tablets like the rM or iPad don't use the pen more in their interfaces. like Dynamicland, they have a new form of input, one that goes way beyond the traditional mouse and keyboard or capacitive touch and is far more open-ended, and I think they should use it pervasively, without treating handwriting as just an input to a recognition system:

Figure 4: /Users/osnr/Code/newsletters/2021-06-09/ipad-scribble.png

(although I understand why that form of text recognition is useful – existing systems are big, and they do a lot of stuff you can't replicate easily, and some form of pen compatibility with them is important – it doesn't excite me, and I feel like it limits our imagination)

It's not that I think that recognition is bad, but I think it is more interesting to err against it when we're designing new systems.⁸

And I think recognition creates a SOCIAL PROBLEM.

Against handwriting recognition / voice recognition / character recognition:

Figure 5: https://twitter.com/rsnous/status/1352720374182502400?ref_src=twsrc%5Etfw

a blob floating inside the computer, where the computer doesn't know what's inside it, but that's fine. the blob is for people, not for the computer

TODO: look at old DL writeup

We have all these computer systems that love lowest-common-denominator formats like plain text, and they push programmers to normalize everything into those formats, so the computer can 'understand' them.

But I think as much as possible, the computer should be leaving things the way they are!

Figure 6: /Users/osnr/Code/newsletters/2021-06-09/data-as-ashes.png

Recognition as this lowest-common-denominator format that programmers want to normalize everything down into

Screenshots, DL not plain text

Leave things the way they are!

If you have recognition, it should be just one of many overlays you can put on a thing, not a transformation. The image should have an overlay attached saying that it might have this text in it; the image shouldn't be transformed into text. (the overlay should admit its provenance, admit the other readings it considered, and admit that it could be wrong, i.e., carry a probability.)
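One way to picture this as a data structure: the original stays untouched, and recognition output is attached next to it, carrying its provenance, every candidate reading, and a probability for each. The sketch below is hypothetical; the class names and the recognizer label are made up for illustration, and no real OCR library is involved.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reading:
    text: str
    probability: float  # the recognizer's own confidence, not ground truth


@dataclass(frozen=True)
class RecognitionOverlay:
    source: str  # provenance: which recognizer (or person) produced this
    readings: tuple[Reading, ...]  # every candidate it considered

    def best(self) -> Reading:
        # Raises ValueError if the recognizer offered no readings at all.
        return max(self.readings, key=lambda r: r.probability)


@dataclass
class Image:
    pixels: bytes  # the original; never rewritten
    overlays: list[RecognitionOverlay] = field(default_factory=list)

    def annotate(self, overlay: RecognitionOverlay) -> None:
        # Attach, don't transform: the pixels remain the source of truth.
        self.overlays.append(overlay)


if __name__ == "__main__":
    img = Image(pixels=b"...image bytes...")
    img.annotate(RecognitionOverlay(
        source="hypothetical-ocr",
        readings=(Reading("hello", 0.8), Reading("hallo", 0.15)),
    ))
    print(img.overlays[0].best().text)
```

Because the pixels are never replaced, a second recognizer (or a person) can attach a different overlay later, and nothing downstream is forced to treat any one reading as canonical.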

Destructive

Social problem. Moving into a more comfortable format for programmers. Modularization of the system into a recognition component and a UI component, controlled by different people. The temptation is to just keep on working on the recognition. But if you had an end-to-end perspective…

more conservative example

it's ironic, because

Screenotate leaves the original screenshot intact. the screenshot is the source of truth. (link)

Figure 7: /Users/osnr/Code/newsletters/2021-06-09/original-screenshot.png

i feel like the computer should give you more space to play. like you should be able to play and doodle by default, wherever you happen to be on the computer, without your first concern being whether the computer will recognize it…

Figure 8: https://twitter.com/rsnous/status/1403053716924731397?ref_src=twsrc%5Etfw

it's not the computer's job to recognize things, it's the computer's job to hold onto what I put in it

this is sort of the idea behind literate programming, too. the idea that the stuff for humans should be the default context, and the highly constrained stuff parsed by the computer should be the exception.
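Literate-programming tools make this split mechanical. As a rough illustration (a simplified take on noweb's chunk syntax, not any real tool's implementation), a 'tangle' step only has to pick out the delimited code chunks; it passes over all the surrounding prose without trying to understand it. The function name `tangle` is borrowed from literate-programming vocabulary, and this version deliberately skips expanding chunk references.

```python
from __future__ import annotations

import re

# A code chunk starts with a line like `<<name>>=` (noweb-style header).
CHUNK_START = re.compile(r"^<<(.+)>>=\s*$")


def tangle(document: str) -> dict[str, str]:
    """Collect code chunks from a simplified noweb-style document.

    Prose is the default context; only lines between a `<<name>>=` header
    and a line consisting of `@` are treated as code. Chunks with the same
    name are concatenated in order. References like `<<other>>` inside
    code are NOT expanded in this sketch.
    """
    chunks: dict[str, list[str]] = {}
    current = None  # name of the chunk we're inside, or None for prose
    for line in document.splitlines():
        m = CHUNK_START.match(line)
        if m:
            current = m.group(1)
            chunks.setdefault(current, [])
        elif line.strip() == "@":
            current = None  # back to prose, which the computer ignores
        elif current is not None:
            chunks[current].append(line)
    return {name: "\n".join(lines) for name, lines in chunks.items()}


if __name__ == "__main__":
    doc = "Prose for people.\n<<main>>=\nprint('hi')\n@\nMore prose.\n"
    print(tangle(doc))
```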

Figure 9: https://twitter.com/rsnous/status/1402635482840932358?ref_src=twsrc%5Etfw

like a sheet of paper

tabfs menu stuff

(I still want someone to test the Windows port of TabFS! I won't have a Windows machine until I get home on the 23rd.)

Footnotes:

¹

that is, it's push-to-talk / spring-loaded

²

It's strange – the detail that the printer is a receipt printer feels very important. The fact that 1. the receipt comes out right at hand, like a Polaroid (rather than out of a printer somewhere else in the room), and 2. the receipt comes out fast (not a laser printer churning for 30 seconds before spitting out a page) is part of what creates this sense of lightness and 'post-it-ness'.

³

maybe the receipt would have the audio waveform printed on it or something, just to give it a unique appearance. Plus, there's this Dynamicland aspiration that we'd want to maintain – that a computer could in theory look at the receipt and regenerate the audio completely from what it sees (even if in practice it usually 'cheats' by keeping the audio file on disk).

(The Dynamicland aspiration is that you should be able to look at the situation in the real world & completely derive the computer from that; there shouldn't be invisible 'virtual state' that lives in RAM or on a hard disk.)

(Ideally, if the power cut out and the computers in the ceiling at Dynamicland all restarted, nothing would be lost, because the physical arrangement of stuff in the real world completely determines the behavior of the computer anyway, and that physical arrangement remains intact.)

⁴

and we wanted to think about how you would make a 'database' or querying interface that takes advantage of the unique properties of Dynamicland, and we wanted more applications in DL that actually got regularly used in a real context.

⁵

reminds me a bit of this, btw: https://twitter.com/bschne/status/1393821742523731969

⁶

(part of the problem is that the dots take so much space!)

⁷

FRUSTRATION BC WE NEED IT FOR PROGRAMMING, IT'S A WASTE!!!!

⁸

I think a lot of design ideas in Dynamicland can be viewed this way: yeah, there are iPads, and there is VR, and there are mice, and there is memory, and there is technology to track users of the system, but what if you bound yourself and gave all those up? Would you get a simpler, more coherent system?