
The End of Online Anonymity

June 10, 2019

Since 2015, I’ve been writing about the impact that machine learning will have on our society. One of the most concerning possibilities, in my mind, was always the potential abuse of these technologies by malicious actors to manipulate or scam people, either through subtle means or by impersonating those they trust.

Today, this concern is very much mainstream: “fake news” has become a buzzword and a kind of modern-day boogeyman. Still, I don’t think most people are overly worried. We know that there are already malicious actors creating sketchy content and putting it out there, but most of it seems obviously fake on closer inspection. We all assume that we will always be smart enough to tell real from fake, and carry on.

Media manipulation is nothing new. Attempts to control public discourse and influence the masses predate the internet, TV, newspapers and the printing press. What’s about to change is that, with machine learning, it will become possible to turn electricity into millions of voices relentlessly spreading your gospel to every corner of the internet. At this point in time, it seems most of the fake content out there is not generated using machine learning; it’s created by human beings using puppet accounts. For the most part, someone still has to turn the crank, and that limits how much content can be created and how many sources it can come from.

Personally, I’m not just worried about manipulative articles being passed off as news. I’m also worried about the impact that networks of malicious bots will have on online communities. We’re still fairly far from being able to automatically generate news articles that remain convincing under close inspection, but what about online comment threads? How difficult is it to build a bot that can write convincing one- or two-sentence comments?

Yesterday, I stumbled upon a link to a subreddit populated by bots based on OpenAI’s GPT-2 text generation model. The result is certainly funny, but it also leaves me feeling uncomfortable. Yes, much of the content is obviously fake, but many of the comments are actually believable. If you feel unimpressed, keep in mind that this is an individual’s side project that repurposed an existing neural network. As it is, the GPT-2 model simply completes text: given the start of a sentence, it generates a plausible continuation. It’s an impressive and amusing tech demo, but not something you can easily control. In order to weaponize GPT-2, a malicious actor would need to add some kind of guidance system: a way to condition the model’s output so that it spreads a specific message.
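To give a concrete sense of how low the barrier to entry is, here is a minimal sketch of that completion step, using the publicly released GPT-2 weights through the Hugging Face transformers library. This is my own illustration of the technique, not the subreddit author’s actual code, and the prompt is invented:

```python
# Minimal sketch: open-ended text completion with GPT-2 via the
# Hugging Face "transformers" library. Illustrative only; the
# subreddit's bots may well use a different setup.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# An invented prompt in the style of a comment-thread opener.
prompt = "I think the real problem with this argument is"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Top-k sampling keeps the output varied but roughly coherent,
# which is exactly what makes short generated comments believable.
output = model.generate(
    input_ids,
    max_length=60,
    do_sample=True,
    top_k=50,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

A dozen lines of glue code is all it takes to go from downloaded weights to plausible-looking comment text.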

The solution to the fake content problem may seem obvious: we can fight fire with fire and build machine learning tools to detect machine-generated content. Tools like this are already in the works; Grover boasts 92% accuracy in detecting fake content. The sad reality, however, is that this is an arms race, and it’s not at all clear that it is one we can win. Facebook already employs thousands of human moderators to detect and flag malicious actors, and while they do flag a large volume of content, they are struggling to keep up. As the technology improves, fake content will become harder and harder to tell apart from real content. Manual verification won’t be able to keep up with the volume, and automated filtering systems will fail.
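To give a flavor of what one simple family of detectors looks like: machine-generated text tends to be unusually predictable under a language model, so you can flag text whose perplexity is suspiciously low. The sketch below uses GPT-2 itself as the scorer; it is a toy heuristic of my own, not Grover’s actual method (Grover is a trained classifier):

```python
# Toy sketch of a statistical detector: score how predictable a text
# is under GPT-2. Suspiciously low perplexity hints at machine
# generation. A crude heuristic for illustration, not Grover's method.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    input_ids = tokenizer.encode(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the average
        # next-token cross-entropy loss over the sequence.
        loss = model(input_ids, labels=input_ids).loss
    return torch.exp(loss).item()

# Lower score = more predictable = more suspicious. The threshold
# would have to be tuned, and a generator can simply sample more
# aggressively to evade it; hence the arms race.
print(perplexity("Some comment pulled from a thread."))
```

The last comment in the sketch is really the whole problem: any fixed statistical test gives the generator a target to optimize against.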

In my opinion, there is only one effective way to stop fake content, and that is to verify that everyone who posts content is in fact human. You could ask people to upload pictures of themselves, but we’re already at the point where GANs can produce realistic images of imaginary people, and any counter-measure of this form will inevitably be defeated. Ultimately, an unfortunate possibility is that online platforms will begin requiring a verified government ID in order to register. We could even end up living in a dystopian world where a kind of “e-passport”, a cryptographically signed government ID, is attached to your every internet connection and tracked everywhere online, which is very sad to think about.
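For concreteness, here is a minimal sketch of what the signing and verification behind such an e-passport scheme would involve, using Ed25519 signatures from Python’s cryptography library. The issuer, record format, and field names are all invented for illustration:

```python
# Sketch of a crypto-signed ID record using Ed25519 signatures from
# the "cryptography" library. The issuer, record format, and fields
# are invented; this only illustrates the mechanism.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The issuing authority holds the private key...
issuer_key = Ed25519PrivateKey.generate()
issuer_public_key = issuer_key.public_key()

# ...and signs a statement binding an ID number to a person.
id_record = b"id=1234567890;holder=Jane Doe;expires=2030-01-01"
signature = issuer_key.sign(id_record)

# Any platform holding the issuer's public key can verify the record
# offline; verify() raises InvalidSignature if it has been tampered with.
issuer_public_key.verify(signature, id_record)
print("ID record verified")
```

The cryptography is the easy part; the dystopian part is that presenting the same signed record everywhere links all of your online activity together.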

The rise of bots could render many online communities simply uninhabitable. Large websites such as Facebook and reddit may have some hope of policing content, but smaller independent players likely won’t have the resources. We are moving towards a model where the internet is dominated by a few centralized content providers and their walled gardens, and generated content may unfortunately make it even harder for grassroots online communities to survive and grow.

I don’t want to see online anonymity taken away. I hope there is a way to build a new web, a new kind of social media, using a hash graph to implement a decentralized web of trust: something that can allow content verification without forcing everyone to sacrifice their right to remain anonymous online. It’s certainly a problem worth thinking about, and I hope to see more research in that direction, because unless we can come up with a technological solution, a regulatory solution may be imposed on us, and it will inevitably favor the big players at the expense of the small.
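As a starting point, here is a minimal sketch of the web-of-trust idea: identities vouch for other identities, and a community accepts content from anyone within a few vouching hops of identities it already trusts. The graph and hop threshold below are toy assumptions; a real design would sign each vouch cryptographically and record the graph’s history in the kind of hash graph mentioned above:

```python
# Toy sketch of a web of trust: identities vouch for one another, and
# trust extends a bounded number of hops from a trusted root. In a
# real system the names would be pseudonymous public keys and each
# vouch would be a signed, hash-chained statement.
from collections import deque

# who -> the set of identities they vouch for (invented example data)
vouches = {
    "alice": {"bob", "carol"},
    "bob": {"dave"},
    "carol": {"dave", "erin"},
    "erin": {"mallory_bot"},
}

def trusted(root: str, target: str, max_hops: int = 2) -> bool:
    """Breadth-first search: is target within max_hops vouches of root?"""
    frontier, seen = deque([(root, 0)]), {root}
    while frontier:
        node, depth = frontier.popleft()
        if node == target:
            return True
        if depth < max_hops:
            for nxt in vouches.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
    return False

print(trusted("alice", "dave"))         # True: reachable in two hops
print(trusted("alice", "mallory_bot"))  # False: three hops away
```

Note that nothing in this scheme requires knowing who anyone is: trust flows between pseudonymous keys, which is exactly what would let verification coexist with anonymity.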

EDIT 2022-02-06: edited this post to make it crystal clear that I am advocating against centralized online IDs, not in favor of them. This post is meant to read as a warning, not an endorsement.

10 Comments
  1. Mike S.

    I think requiring a government-issued ID to participate opens its own can of worms. It erodes individual privacy even further. It also still permits the government itself to create propaganda AI bots by abusing the ID system – so a random business or media outlet might not be able to spam the network, but the government still can.

    Last month I set up my own Mastodon instance. You’re probably aware of it, it’s a free software / open source federated Twitter alternative. Many Mastodon servers have open registration, just like Facebook, Twitter, and so forth. But many have switched to invitation-only and that’s how I set my instance up. I think a shift to that model might be inevitable. I don’t think it’s a complete solution, but I’m at a loss to conceptualize anything better.

    • > I think requiring a government-issued ID to participate opens its own can of worms.

      It’s not a solution that I like or want, it’s the kind of solution that regulators might come up with in a “think of the children!” reactionary move.

      I hadn’t heard of Mastodon. I like the decentralized aspect, but it also sounds like way more of a free-for-all. I guess you can police your own small community yourself, but when it comes to the intersection of these small communities, I’m somewhat skeptical?

      • Mike S.

        Oh, I see what you mean about the use of real identities being imposed by authorities. I agree it’s a real possibility. I had misunderstood your post to mean you thought it was a good idea.

        With Mastodon the convention is that when you host an instance, you post a Code of Conduct and enforce it. When a site comes up without one, or with one that allows certain forms of discrimination, a lot of the other sites do a whole-server block on that domain immediately. That won’t scale forever; if Mastodon gets big enough, I’m sure the spam and bigot groups will create thousands of instances programmatically to spew advertisements and hatred, and it will be impractical to manually block them all. At that point it would make sense to switch to domain whitelisting. But so far my signal-to-noise ratio is really good.

  2. Brian

    Probably a little on the dark side; I think reality will be a little better!

    Firstly, there will always be sites that actually do journalism, albeit with biased views: BBC, Al Jazeera, even Fox News and RT often have some level of truth. If you follow enough of them, you will get a reasonable feel for the truth.

    More importantly, truth and ‘reality’ are often a matter of the writer’s and reader’s perspective, bot or no bot.

    Secondly, future generations will (hopefully) be better schooled in evaluating information and its reliability. Generations X, Y and Z seem to get more gullible as technology becomes more and more a part of their upbringing; they need to be taught to be less trusting!

    Although all of this will probably be irrelevant after the machines take over…

    Of course we still have the option of the ‘OFF’ switch, till Apple/Google/Microsoft decide to remove it!

  3. Go to the source of information instead of stopping with just the article / forum post / etc. Once you get to the source, you can evaluate it and remember that the info came from THERE.

    If you’re so gullible as to believe some random snippet of info online (and we all are at some point because trust is a necessity for human communities), then you’re likely to pick up incorrect info anyways, bot or no bot.

    Here’s a thought: if the number of bots increases for one point of view, don’t you think there would be roughly the same number arguing for the opposite point of view?

    I disagree with Brian: truth and reality are NOT often a matter of perspective. But I think his opinion stems from the fact that most “news” has turned into opinion: X is good, Y is bad. Says who? Depends on who you read.

    • brian

      “truth and reality are NOT often a matter of perspective.”

      You don’t have to spend too long looking at how statistics are used for political ends to know that truth and reality are a matter of perspective :)

  4. Anonymous

    The only real solution that I know of is WoTs (webs of trust), because they provide enumeration of goodness, as opposed to the enumeration of badness that filters do. But the main thing is that the vast majority of human-generated content is junk, and large communities typically suck one way or another. So you don’t even need a solution that scales. Quality over quantity.

  5. Jenia Jitsev

    An e-pass was always a possible solution, even before the rise of non-human systems that can generate fake content. There were plenty of humans and human organizations producing fake content with malicious intent long before (e.g. the infamous content factories in St. Petersburg), and it would have been a great countermeasure to require a unique ID for spreading content en masse on internet platforms. The usual objection is of course the privacy issue, especially in states with an eroding rule of law. The solution to that could be, for instance, the creation of an international body, say within the UN, that issues this kind of ID but does not transfer personal data to state organs known to be corrupt. For me, one thing is clear: without a form of ID attached to an existing person, the internet and its content will indeed soon become the incomprehensible place you have described, since generated content and the systems behind it will become indistinguishable from what humans produce within 10-15 years at most.
