Privacy and Microdata

Microdata, like other schema.org-compatible formats such as JSON-LD, is a great standard for marking up data so that machines (e.g. search engine crawlers) can better understand the meaning behind it.

This makes it a potentially amazing tool for improving a page's rankings in the search engines. It also helps to give users additional context in the search results, which increases the chances of them clicking through to your site.

But before we go into the privacy issues, we first need to look at how microdata marks up data.

Microdata is essentially a set of attributes which you can set on HTML elements.

itemscope itemtype="http://schema.org/CreativeWork"

For instance, this specifies that the HTML element it is set on is a "CreativeWork", which we will go into later.

\<a rel="author">Admin</a>

And this is one of the HTML elements nested somewhere inside the CreativeWork. It doesn't matter how many nodes deep it is.
Now, you'll probably find that this isn't like other bits of microdata: rel="author" is a link type rather than a microdata attribute, but it's part of the HTML5 spec, so it's supported anyway.

The normal microdata approach is:
\<a itemprop="author">Admin</a>
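
To get a feel for how little work a machine needs to do to read this kind of markup, here is a minimal sketch in Python using only the standard library. It is not the full microdata algorithm from the spec (it ignores nested items, itemref and URL-valued properties, and it assumes well-formed HTML), and the names `ItemCollector`, `extract_item` and `SAMPLE` are made up for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are kept off the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Hypothetical markup: the author link sits several nodes below the itemscope.
SAMPLE = """
<article itemscope itemtype="http://schema.org/CreativeWork">
  <div><div><p>Posted by <a rel="author">Admin</a></p></div></div>
  <div itemprop="text">Hello world</div>
</article>
"""


class ItemCollector(HTMLParser):
    """Collects the text of properties found anywhere inside an itemscope."""

    def __init__(self):
        super().__init__()
        self.stack = []       # one (is_scope, property name or None) per open element
        self.itemtype = None  # itemtype of the first itemscope seen
        self.props = {}       # property name -> collected text

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        # Depth does not matter: any open ancestor with itemscope counts.
        inside_item = any(is_scope for is_scope, _ in self.stack)
        prop = None
        if inside_item:
            prop = attrs.get("itemprop")
            # Treat rel="author" as the author property, as described above.
            if prop is None and "author" in (attrs.get("rel") or "").split():
                prop = "author"
            if prop is not None:
                self.props.setdefault(prop, "")
        is_scope = "itemscope" in attrs
        if is_scope and self.itemtype is None:
            self.itemtype = attrs.get("itemtype")
        self.stack.append((is_scope, prop))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text belongs to the innermost open element that names a property.
        for _, prop in reversed(self.stack):
            if prop is not None:
                self.props[prop] += data
                break


def extract_item(html):
    """Return (itemtype, properties) for the markup in `html`."""
    parser = ItemCollector()
    parser.feed(html)
    parser.close()
    return parser.itemtype, parser.props


if __name__ == "__main__":
    print(extract_item(SAMPLE))
```

The author link in `SAMPLE` is three elements below the itemscope, and the sketch still attributes it to the CreativeWork, which is the behaviour described above.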

One place where this is used is in Gosora's posts, so that search engines can recognise individual posts as distinct units rather than as random bits of text with no discernible structure.

Now, to really get into the privacy implications of this, we have to explore another bit of microdata. This one is called Person.

A Person can be marked up on the spot, or on a page linked from that item, to give more information about that user.
Possible fields include someone's physical address, date of birth, email address, gender, relationships with other Persons (such as their children) and more.

While many sites haven't quite gone as far as allowing people to provide their full address, it is not unusual for users to offer up their city or country, date of birth, gender and profession / interests, and for sites to display these publicly for any crawler to see.

Normally, this would not be a concern, as crawlers are dumb and it can be somewhat difficult (although not impossible) to piece information about someone together. With microdata, however, the fields are already labelled, so it instantly becomes easy for someone to aggregate information on a user, and on people likely to be that user, based on these traits.
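
To make the risk concrete, here is a rough sketch of the kind of matching that becomes trivial once the fields are labelled. The records are invented for this example, the names `PROFILES`, `TRAITS` and `likely_same_person` are hypothetical, and a real aggregator would presumably use fuzzier matching than the exact comparison assumed here.

```python
from collections import defaultdict

# Invented example records, standing in for Person data that a crawler
# might have extracted from public profiles on three unrelated sites.
PROFILES = [
    {"site": "forum.example", "name": "Admin",
     "birthDate": "1990-04-01", "addressLocality": "Springfield", "gender": "Female"},
    {"site": "shop.example", "name": "A. Smith",
     "birthDate": "1990-04-01", "addressLocality": "springfield", "gender": "female"},
    {"site": "blog.example", "name": "Someone Else",
     "birthDate": "1985-12-24", "addressLocality": "Shelbyville", "gender": "Male"},
]

# The traits used for matching; all three are schema.org property names.
TRAITS = ("birthDate", "addressLocality", "gender")


def likely_same_person(profiles, traits=TRAITS):
    """Group profiles whose values for every trait match (ignoring case)."""
    groups = defaultdict(list)
    for profile in profiles:
        # Skip profiles that do not publish every trait we match on.
        if all(trait in profile for trait in traits):
            key = tuple(profile[trait].strip().lower() for trait in traits)
            groups[key].append(profile)
    # Only keys shared by more than one profile suggest a possible match.
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    for group in likely_same_person(PROFILES):
        print([f"{p['name']} @ {p['site']}" for p in group])
```

None of the three traits identifies anyone on its own, but taken together they are enough to link the first two records even though the display names differ.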

This is a terrifying violation of privacy, although not the worst we've seen in modern times, and it shows how something that was supposed to help the world can easily be twisted, even if developers and administrators have the best intentions at heart.