Obfuscation through Non-Standard Character Selection

There’s been a lot of discussion about obfuscation, cryptography, steganography and the like over the last week or so here in the FU:Comp bunker. Various methods of encryption, obfuscation and other ways of futzing with Big Bother came up, but this one stood out as having some fairly interesting points.

What we’re trying to achieve here is the most basic way possible of preventing automated systems from reading your text. What we are definitely not trying to do here is show anyone how to make it impossible for outside agents to access your sensitive data.

I took the following phrase from the Hacker’s Manifesto as archived in Phrack Magazine #07 from ’86.

I am a hacker, and this is my manifesto. You may stop this individual, but you can’t stop us all… after all, we’re all alike.

I then ran it through the open-sourced online text mangler at Lunicode using the “bent” method which then spat out:

į ąʍ ą հąçҟҽɾ, ąղժ էհìʂ ìʂ ʍվ ʍąղìƒҽʂէօ. Ӌօմ ʍąվ ʂէօք էհìʂ ìղժìѵìժմąӀ, ҍմէ վօմ çąղ’է ʂէօք մʂ ąӀӀ… ąƒէҽɾ ąӀӀ, աҽ’ɾҽ ąӀӀ ąӀìҟҽ.

Note that you’ll need a Unicode font installed to see this.

Then I took a screenshot of the textbox and ran it through an OCR converter. I’m only showing the best result I got, because Google Docs just could not even and the others returned blank files.

Obfuscated text before OCR
Deobfuscated text via OCR

As you can see, it’s not perfect, but it has broken up enough of the meaning that I think a cursory scrape wouldn’t pick up anything of note.

Key:

  1. NFI – Computer didn’t even get anywhere close so this is essentially just noise.
  2. Correct – Computer guessed this word.
  3. Incorrect – Computer guessed this word incorrectly.
  4. Incorrect – Computer also guessed this word incorrectly, but came close.

1 am a mafia, aqd tcis is M4 Maqifesto. bled M214
Stop this individaal, 13;“;qu caq’l; stop as allafter
all, we’re all alike
.

This method is intended for person-to-person use only: it’s incredibly easy for a human to glance at the text and read it, but currently very hard for a computer to grasp the context, and therefore the meaning, of a conversation it may have intercepted.

This could be made stronger by adding nonsense words to your written vocabulary, using some sort of shorthand like 13375934k, or applying multiple layers of the same technique.

Running the OCR output through a dictionary spellchecker and matching each token against the most likely word would probably have netted the computer the correct matches for “Maqifesto” and “individaal”, so that’s something to be wary of.
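Here’s a rough sketch of that spellchecker attack in Python. It uses the standard library’s difflib.get_close_matches as a stand-in for a proper spellchecker. The vocabulary is an assumption: it’s just the words of the quote, whereas a real scraper would use a full dictionary. The 0.6 similarity cutoff is also an assumption, and is simply difflib’s default.

```python
import difflib

# Hypothetical vocabulary: just the words of the original quote.
# A real scraper would load a full dictionary instead.
VOCABULARY = [
    "i", "am", "a", "hacker", "and", "this", "is", "my", "manifesto",
    "you", "may", "stop", "individual", "but", "can't", "us", "all",
    "after", "we're", "alike",
]

def correct(token: str, vocabulary=VOCABULARY, cutoff: float = 0.6) -> str:
    """Return the closest vocabulary word to an OCR token, or the token
    unchanged if nothing reaches `cutoff` (difflib's similarity ratio)."""
    matches = difflib.get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token

ocr_tokens = ["Maqifesto", "individaal", "aqd", "tcis", "M214"]
print([correct(t) for t in ocr_tokens])
# → ['manifesto', 'individual', 'and', 'this', 'M214']
```

Near misses like “Maqifesto” score around 0.89 against “manifesto”, so they snap straight back. Pure noise like “M214” doesn’t come close to anything and is left alone. That’s why the “close but incorrect” words are the weak spot.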

Pros:

  1. Mostly resistant to OCR.
  2. Definitely resistant to automated content searches as long as point 1 remains true.
  3. Easily read by humans.

Cons:

  1. Resistant to searches. You’re going to want to remember where you parked your documents, kids. And already know what’s in them.
  2. Easily read by humans, even those who you don’t know are looking. Over your shoulder, say.
  3. Lunicode is bidirectional: anyone with the same tool can convert the text straight back.
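To illustrate con 3, here’s a minimal Python sketch of a look-alike substitution in the spirit of the “bent” method. The table is an assumption. It was reconstructed by lining up the quote against the sample output earlier in this post, it covers only lowercase letters, and it is not Lunicode’s actual table. Each plain letter maps to exactly one look-alike and vice versa, so inverting the table gives a perfect decoder. That’s really all “bidirectional” needs to mean.

```python
# Illustrative look-alike table reconstructed from the "bent" sample output
# in this post. Lowercase only; NOT Lunicode's own table.
BENT = {
    "a": "ą", "b": "ҍ", "c": "ç", "d": "ժ", "e": "ҽ", "f": "ƒ", "h": "հ",
    "i": "ì", "k": "ҟ", "l": "Ӏ", "m": "ʍ", "n": "ղ", "o": "օ", "p": "ք",
    "r": "ɾ", "s": "ʂ", "t": "է", "u": "մ", "v": "ѵ", "w": "ա", "y": "վ",
}

# The mapping is one-to-one, so inverting the dict yields the decoder.
ENCODE = str.maketrans(BENT)
DECODE = str.maketrans({bent: plain for plain, bent in BENT.items()})

def bend(text: str) -> str:
    """Lowercase the text, then swap each covered letter for its look-alike."""
    return text.lower().translate(ENCODE)

def unbend(text: str) -> str:
    """Map each look-alike back to its plain letter; everything else passes through."""
    return text.translate(DECODE)

secret = bend("this is my manifesto")
print(secret)          # էհìʂ ìʂ ʍվ ʍąղìƒҽʂէօ
print(unbend(secret))  # this is my manifesto
```

Letters the table doesn’t cover (g, j, q, x, z), digits and punctuation pass through untouched in both directions.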

Demonology & Data Reduction

Imagine that your job is to describe all the ways things go wrong with the human psyche. And that you’re alive during the 16th century, or even the 11th. You have to describe how people are dangerous to themselves and others for the good of a society that is wracked by injury, disease, and rampant structural unfairness. Your system must be comprehensive, easy to understand and remember, and effective despite the fact that you have essentially no experimental data, very little in the way of diagnostic survey work, and basically you’re running off of shaky collective memory and folklore that is, itself, frequently destroyed or distorted by large scale civil trauma.

There is a system of mental disorder and civil unrest in the world, you have a minimal amount of information about it, and your goal is to represent it in a way that is useful to an ignorant populace.

Welcome to the world of demonology, and Data Reduction 101.

The Lanterne of Light is a 15th-century text that establishes, for the first time, the systematic hierarchy of Christian demons organised by sin. It was part of a social movement to translate the Bible into the language of the uneducated (i.e., English). It has been so effective that, to this day, most of the secular West understands Lucifer as emblematic of Pride. By telling stories about the fall from heaven through this lens, we have gained a surprisingly sophisticated view of how an emotional state functions and how it can destroy a life. Almost every westerner knows that there are seven deadly sins in total, even if they don’t realise that the movie “Seven” is essentially an interpretation of this anonymously sourced, early-1400s English Lollard tract. What the Lanterne does is systematise social threat, and then represent that system in a way so compelling that it is still remembered: 2010 marked its 600th anniversary.

This is, at its core, exactly the same process that drives modern psychology. Understanding the need for, and the mechanism behind, the Lanterne directly helps us understand modern statistical practice. The only thing that has changed is a matter of degree: the amount and reliability of the data available, the minimum level of complexity that retains utility, and the size of the community that can use the output. The Diagnostic and Statistical Manual of Mental Disorders (DSM), now in its fifth edition, is THE main tool for psychiatric categorisation. And it is, essentially, of a piece with several other works: the Lanterne of Light (1410); Binsfeld’s (1589), Michaelis’ (1613) and Barrett’s (1801) classifications of demons; and “Hierarchical Cluster Analysis: Comparison of Three Linkage Measures and Application to Psychological Data” (2015). These texts all illustrate the problems that plague (the demon “Merihem”) data reduction. They can also be used to show the misconceptions (“Pythius”) and temptations (“Mammon”) of trying to describe complex systems in simple terms:

  • Systematic scope (where does the body end? Pestilence and the hereditary condition)
  • Data resolution, sampling error and measurement bias (what is left unseen)
  • Reliance on surface condition (active psychosis and possession)
  • Validity and mutability of categories (seven sins, three lies)
  • An assumption of normality (psychological utility vs. anomaly)
  • Under-dispersion, overrepresentation, and other distributional limitations (how wide is hell? How dull are its legions?)
  • Interstitial domain and interpolation (everywhere there are faces, pareidolia)
  • Data reduction for communication (lies, damned lies, and statistics)
  • Cultural bias (the redefinition of gender and sexuality)
  • Motive (witch hunts of the 15th and 19th centuries)

Statistics, particularly when deployed systematically, are often misunderstood, but the ways in which they both succeed and fail are easy to describe in a huge number of contexts. Demonology is just one of the most fun.

The satnav is my shepherd

So this happened recently.

For the TL;DR crowd: an elderly couple in Rio got into a car, thinking they were going to the beach, and put in the address. The satnav they were using led them instead to a neighbourhood that had a street with the same name as the one they were looking for. This neighbourhood was controlled by a violent drug gang, who shot the woman and rifle-butted the man. The woman later died.

Some discussion has emerged about the satnav company being “responsible” for leading these people to the wrong place. If so, what does that mean? If companies are now responsible for the information they give people and the way that information is used, what’s the logical end? Will there come a time of digital ghettoisation, when Google Maps and satnavs send an alert when you stray over a certain digital border, saying “you’re in a bad neighbourhood!” Will there be entire areas that are considered by the corporate entities that run the internet to be “the wrong side of the tracks”? Will real time data sets of crime statistics push and nudge these borders around dynamically throughout every day? Will the borders of the “bad areas” shift at night? What the fuck is a “bad area” anyway?

This seems ridiculous and misguided, but the history of shitty decisions by internet companies makes me regularly hold the bridge of my nose and close my eyes for a moment.

Perhaps a better question to ask is not one of responsibility, but one of geography.

Satnav means that you know where you’re going. You don’t have to ask anyone for directions. You just send your wish out into the ether by typing it in a little box, and *ping*, down comes a magical treasure map that leads you where you need to go. The satnav does its job. Sometimes you’re not specific enough, or you don’t know you’ve made a mistake, or there’s some kind of clash because two places have the same name, or or or. Then you’re on your own and in the hands of human error (which is the reason satnav was invented in the first place).

However, if you’re forced to stop at a petrol station and find someone to ask, they’ll give you nuanced, localised information such as “There’s a place with the same name that’s a bad fucking neighbourhood, avoid that place.” This is because, of course, Google Maps and satnav do not contain all knowledge.

We assume they do. We put our faith in these devices and services, whisper an incantation and hope our prayer is heard and our wish granted in the way we need and expect. The answer is always delivered with such “trust me, I’m a TomTom” certainty that we just accept it at face value. It’s your own personal oracle on the dashboard temple. You will be guided by a loving force with your best interests in mind.

But sometimes, as we have seen, the oracle leads you to your death. At those times the grime of human frailty comes peeking through the chromed techno surface. This faith (and it is faith!) that runs Uber, guides us around cities and sits in every taxi these days then seems silly, misplaced, childish. These services don’t have all the answers, even though they are so good at looking like they do.

My alter (Gmail) ego

For a number of years I have had a Gmail address for spam-catching purposes. This address included the word “grey”, spelled the British way.

Someone else on the internet has a gmail address with the same name, except hers has the word “gray” spelled the American way.

What this means: I get a lot of email that is meant for her. This has been going on since 2006. At first I used to write back and tell the sender that I wasn’t who they were looking for, but I stopped when it didn’t seem to dissuade them. Every couple of months I get an email or two meant for her. These give me little glimpses into her life and goings-on, and over time they have added up to a hazy picture. Here’s what I know:

  • She grew up about 30 minutes from where I grew up (coincidentally)
  • She is a Jehovah’s Witness
  • She lives at Bethel, the Jehovah’s Witness headquarters in Brooklyn
  • Jehovah’s Witnesses give out Watchtower magazine, and they don’t manage to hand out as many as you might think (like, 4 or 5 for a day’s work, on average)

I get party invitations for her. I started out ignoring them, but now I click on the RSVP link if it sounds like something she’ll enjoy. (These invitations always say “Keep in mind Scriptures when it comes to dress and grooming”, and I am curious to know what exactly this means – are we talking modest dress, or are we talking no mixed fibres?)

Recently I got an email meant for her that mentioned her fiancé. Aw, good for her.

I just stole something and it wasn’t (entirely) my fault

When I was checking my daily-use-nothing-sensitive Gmail inbox, I noticed a Dropbox invitation for 48GB of free space and a message from Samsung asking me to verify an account. At first I thought this was a phishing technique I hadn’t seen before, a “build trust by sending related emails” sort of thing. Then I looked at the recipient address.

Something many people don’t know about Gmail is that it handles usernames in a particular way: a period-delimited name is parsed as if it had no periods at all. For example, forename.surname@gmail.com is exactly the same as forenamesurname@gmail.com.
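Here’s a minimal Python sketch of that period rule, for anyone who wants to check whether two addresses land in the same inbox. The function names are made up for illustration. The sketch only handles @gmail.com addresses and also lowercases the username, because Gmail usernames are case-insensitive. Any other quirks of Gmail’s address handling are deliberately ignored.

```python
def canonical_gmail(address: str) -> str:
    """Collapse a Gmail address to a single canonical form (hypothetical helper).

    Only the period rule (plus case-folding) is applied. Non-Gmail addresses
    are returned unchanged, since other providers may treat periods differently.
    """
    local, sep, domain = address.strip().rpartition("@")
    if not sep or domain.lower() != "gmail.com":
        return address
    # Gmail ignores periods in the username and treats it case-insensitively.
    return local.replace(".", "").lower() + "@gmail.com"

def same_gmail_inbox(a: str, b: str) -> bool:
    """True if both addresses deliver to the same Gmail inbox under this rule."""
    return canonical_gmail(a) == canonical_gmail(b)

print(same_gmail_inbox("forename.surname@gmail.com", "forenamesurname@gmail.com"))  # True
print(same_gmail_inbox("f.o.r.e.name.surname@gmail.com", "forenamesurname@gmail.com"))  # True
```

A sign-up form that ran addresses through something like this before sending a verification email would at least notice that the dotted and undotted versions are one mailbox.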

To make things worse, when you’re signing up for a new address it won’t tell you this, and will let you go through the whole process with no warnings. The upshot is that, as the owner of a shiny new email address, you don’t actually get your emails; the owner of the address without the periods does. That’s what happened to me today.

So I clicked the Dropbox link and gained 48GB of free space, courtesy of the automated service at Dropbox’s end.

I’ll admit that I probably shouldn’t have also clicked the account activation link from Samsung, but I was curious about how much information I could get from these two sources. Samsung did send a password reset link, but I decided not to follow it, as I thought that might be a step too far, even in the name of research.

I’ve decided to be a good netizen and report the issue in the hope that it can be resolved, and I’ll report back if anything happens.

#update of sorts#

Still no response from any of the players in this sorry tale. I really hope that Muhyiddin Abdul Rahim isn’t too annoyed at his lack of Dropbox space.

I had a read of an article on the Gmail support forums, and the official word is that any Gmail address with my username and any number of periods in it is exactly the same as the one without. So I guess Google is off the hook. Your move, Dropbox/Samsung.

#update of sorts 2#

Dropbox Support got back to me via Twitter. I’ve pointed them to this post and given them more information, so hopefully they’ll be able to get this sorted.

#update 3#

Dropbox messaged me directly via Twitter to tell me that they’ve managed to attribute the lost space to the correct user. They have also let me keep the same amount of space for myself. I don’t think I could have asked for a better resolution than that.