Solution: Add more piss.

I read somewhere recently:

Getting your data off the internet is like trying to get piss out of a swimming pool.

I really liked that. (I wish I knew who said it; if you know, please tell me.) UPDATE: Headcrash found the origin (or as far back as this appears to go). Apparently this was a line from the massively underrated television show Newsradio.

I was part of a conversation recently where the topic of discussion was how to keep your data out of the big system. Stay off Facebook, avoid Twitter, keep everything behind a VPN, don’t take your mobile phone anywhere you don’t want someone knowing you were… you know, the kind of stuff even I used to file under “tinfoil hat nonsense” until some years ago.

Anyway, it struck me that it’s pretty much impossible to operate like a human being in the contemporary western world and keep your data out of the hands of people who will use it in ways you don’t agree with, or sell it/give it away to people you don’t want to see it. The entire corporate internet is set up to take your data, suck it up like a relentless black hole that absorbs everything it can find.

Living like a normal Western 21st-century human being means that your data will leak onto the internet at some point, in some form. You will sign up for a Gmail address, a Facebook account, a Twitter account, a newsletter, or you’ll download an app or you’ll buy something online, an innocent act that allows a spigot to be shoved into your personal flow of data, and some invisible entity to siphon off all it can. These procedures are so painless, so buried in terms and conditions implicitly or lazily agreed to (and we all click Agree for the sake of convenience, all the time), that moving through the digital realm without a trace has become, if not impossible, then incredibly fucking hard. The piss leaks into the pool; good luck finding all of yours and extracting it.

There is a weakness in this data-siphoning system, however: it’s indiscriminate. It assumes everything it knows about you is true. It assumes you don’t lie. Facebook didn’t bat a robot eyelash when I changed my gender to see if it would change the advertising I got (big surprise: it did). It accepted what I gave it without prejudice and moved on.

Could the solution to this invasion of privacy, then, be not to extract one’s own piss, but rather add more piss? 

If we can’t move through the digital realm without a trace, then surely we can cover our tracks with sufficient digital garbage that it’s impossible to tell what’s a real footprint and what isn’t, to give the algorithms all the data they can eat – because if there’s one thing we all know about algorithms, it’s garbage in, garbage out. Hide in plain view by covering yourself in garbage. Like everything. Fill out all optional fields. Choose a new age range every day. Move between genders. Shopping websites and consumer entities may know that a woman is pregnant before she has told a living soul, but how can these algorithms infer pregnancy if they have no idea what gender they’re dealing with?

This has already been played with, to some extent, with the Chrome plugin Valley Girl, which clicks “Like” at every opportunity presented. No matter where you are on the internet, if there’s a Like button, Valley Girl will click it. After a matter of weeks, what you really like becomes immaterial; your taste, your humour, your political leanings are obscured by the sheer volume of noise inserted into what Facebook knows about you.
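To put a rough number on that dilution, here is a minimal sketch. It is not how Valley Girl works under the hood (I haven't read its source); the function names and the 100-page catalogue are made up for illustration.

```python
def recorded_likes(real_interests, pages_seen, like_everything):
    """Return the set of pages a tracker would record as 'liked'."""
    if like_everything:
        # The "click Like on everything" policy: every page seen gets a like.
        return set(pages_seen)
    # Honest behaviour: only pages you genuinely care about get a like.
    return set(pages_seen) & set(real_interests)


def signal_share(real_interests, likes):
    """Fraction of recorded likes that reflect a genuine interest."""
    if not likes:
        return 0.0
    return len(set(real_interests) & set(likes)) / len(likes)


# Hypothetical browsing history: 100 pages, only 2 of which you really like.
pages = [f"page{i}" for i in range(100)]
real = {"page3", "page7"}

honest = recorded_likes(real, pages, like_everything=False)
noisy = recorded_likes(real, pages, like_everything=True)
```

With the honest policy every recorded like is a true signal; with the like-everything policy only 2 in 100 are, and nothing in the data says which 2.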

High five, Valley Girl. I hope you piss into the gutter of my Facebook data profile forever and ever.

 

Obfuscation through Non-Standard Character Selection

There’s been a lot of discussion about obfuscation, cryptography, steganography and the like over the last week or so here in the FU:Comp bunker. Various methods of encryption, obfuscation and diverse other ways of futzing with Big Bother came up, but this one stood out as having some fairly interesting points.

What we’re trying to achieve here is the most basic way possible of preventing automated systems from reading your text. What we are definitely not trying to do here is show anyone how to make it impossible for outside agents to access your sensitive data.

I took the following phrase from the Hacker’s Manifesto as archived in Phrack Magazine #07 from ’86.

I am a hacker, and this is my manifesto. You may stop this individual, but you can’t stop us all… after all, we’re all alike.

I then ran it through the open-sourced online text mangler at Lunicode using the “bent” method which then spat out:

į ąʍ ą հąçҟҽɾ, ąղժ էհìʂ ìʂ ʍվ ʍąղìƒҽʂէօ. Ӌօմ ʍąվ ʂէօք էհìʂ ìղժìѵìժմąӀ, ҍմէ վօմ çąղ’է ʂէօք մʂ ąӀӀ… ąƒէҽɾ ąӀӀ, աҽ’ɾҽ ąӀӀ ąӀìҟҽ.

Note that you’ll need a Unicode font installed to see this.
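For the curious, the substitution itself is nothing more than a lookup table. The sketch below is not Lunicode's code or its actual table: the mapping is simply read off the sample output above, so letters that never appear in that sample (g, j, q, x, z and most capitals) are left alone, and the names `bend` and `unbend` are my own.

```python
# Substitution table inferred from the sample output above (an assumption,
# not Lunicode's real table). Unmapped characters pass through unchanged.
BENT = {
    "a": "ą", "b": "ҍ", "c": "ç", "d": "ժ", "e": "ҽ", "f": "ƒ",
    "h": "հ", "i": "ì", "k": "ҟ", "l": "Ӏ", "m": "ʍ", "n": "ղ",
    "o": "օ", "p": "ք", "r": "ɾ", "s": "ʂ", "t": "է", "u": "մ",
    "v": "ѵ", "w": "ա", "y": "վ",
    "I": "į", "Y": "Ӌ",
}

# Reverse table: every lookalike maps back to exactly one plain letter.
UNBENT = {bent: plain for plain, bent in BENT.items()}


def bend(text: str) -> str:
    """Swap each known letter for its lookalike; leave everything else alone."""
    return "".join(BENT.get(ch, ch) for ch in text)


def unbend(text: str) -> str:
    """Undo bend() by running the table in reverse."""
    return "".join(UNBENT.get(ch, ch) for ch in text)
```

Calling `bend("I am a hacker")` gives the same kind of output as the sample, and `unbend` turns it straight back. That second function is the uncomfortable part: a one-to-one table is trivially reversible by anyone who has it.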

Then I took a screenshot of the textbox and sent it through an OCR converter. I’m only displaying the best result I got because Google Docs just could not even and others just returned blank files.

Obfuscated text before OCR
Deobfuscated text via OCR

As you can see it’s not perfect but it’s broken up enough of the meaning that I think a cursory scrape wouldn’t pick up anything of note.

Key:

  1. NFI – Computer didn’t even get anywhere close so this is essentially just noise.
  2. Correct – Computer guessed this word correctly.
  3. Incorrect – Computer guessed this word incorrectly.
  4. Incorrect – Computer also guessed this word incorrectly, but was close.

1 am a mafia, aqd tcis is M4 Maqifesto. bled M214
Stop this individaal, 13;“;qu caq’l; stop as allafter
all, we’re all alike.

This method is intended for person-to-person use only. It’s incredibly easy for someone to glance at the text and read it, but, going by the OCR result above, currently very hard for a computer to grasp the context, and therefore the meaning, of a conversation it may have intercepted.

This could be made stronger by adding nonsense words into your written vocabulary, using some sort of shorthand like 13375934k or using multiple levels of the same technique.
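As a rough sketch of the shorthand idea, here is a tiny leetspeak pass. The substitutions are only the ones you can read off “13375934k” itself (l, e, t, s, p, a); the function name `leet` is made up, and a real scheme would want a bigger table.

```python
# Substitutions read off the "13375934k" example; every other character
# passes through unchanged.
LEET = str.maketrans({"l": "1", "e": "3", "t": "7", "s": "5", "p": "9", "a": "4"})


def leet(text: str) -> str:
    """Lower-case the text and swap the mapped letters for digits."""
    return text.lower().translate(LEET)


print(leet("leetspeak"))  # 13375934k
```

On its own this does nothing against OCR, since digits read perfectly well, but it does stop a plain keyword match. Feeding its output into a character mangler is one way to get the "multiple levels" mentioned above.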

Using a dictionary spellchecker and matching against the most likely word probably would have netted the computer the correct matches for “Maqifesto” and “individaal” so that’s a thing to be wary of.
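To show how little effort that takes, here is a minimal sketch using `difflib.get_close_matches` from Python's standard library. The word list is an assumption: I've just used words from the original phrase, where a real attacker would use a full dictionary, and `best_guess` is a name I made up.

```python
from difflib import get_close_matches

# Stand-in dictionary (an assumption): words from the original phrase only.
VOCAB = [
    "i", "am", "a", "hacker", "and", "this", "is", "my", "manifesto",
    "you", "may", "stop", "individual", "but", "us", "all", "after", "alike",
]


def best_guess(ocr_word, vocabulary=VOCAB, cutoff=0.6):
    """Return the closest dictionary word to an OCR'd token, or None."""
    matches = get_close_matches(ocr_word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(best_guess("Maqifesto"))   # manifesto
print(best_guess("individaal"))  # individual
print(best_guess("mafia"))       # None
```

The near misses fall straight back into place, while the outright noise (“mafia” for “hacker”) stays lost. So the protection really comes from the words the OCR got badly wrong, not the ones it nearly got right.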

Pros:

  1. Mostly resistant to OCR.
  2. Definitely resistant to automated content searches as long as point 1 remains true.
  3. Easily read by humans.

Cons:

  1. Resistant to searches. You’re going to want to remember where you parked your documents, kids. And already know what’s in them.
  2. Easily read by humans, even those who you don’t know are looking. Over your shoulder, say.
  3. Lunicode is bidirectional, so anyone who recognises the style can paste your text back in and decode it.

Useful links