What makes it possible to build a profile of our web activity, and how can we guard against it?

Published by Alisha McKerron

Most of us will be aware of web profiling, thanks to the advent of the General Data Protection Regulation (GDPR) and some shocking data breaches, the most infamous being Cambridge Analytica. We have all heard how companies like Facebook and Google can use cookies to follow us around the internet and keep track of what we are interested in. They do this to serve targeted advertising or, in some cases, even to share that data with others without our permission. We will also be aware of cookie banners and privacy notices, which disclose, amongst other things, how our personal data is collected and with whom it is shared. But how many of us actually read them? I suspect not many, given how few of us read websites’ terms of service. (If you are one of them, it’s worth looking at Terms of Service; Didn’t Read.) Perhaps we would feel and behave differently if we had a better understanding of one of the many tools that enable tracking: cookies. What are they, and why do they exist?

First party cookies

The cookie (a small text file, whose contents are often encoded or encrypted) was invented in 1994 by Lou Montulli, an employee of Netscape Communications, the company behind the Netscape Navigator browser. At the time, Netscape was trying to help websites become viable commercial enterprises. Montulli was building an online shop, and he didn’t want to store the contents of each shopping cart on the website’s server. Instead, he stored the cart in the user’s browser until they made their purchase. This proved a useful solution, because the server no longer needed to spend time and money keeping track of everyone’s shopping cart. It also proved useful in other situations, for example in identifying users.

Simply Explained describes how cookies work in their YouTube video:

“Let’s imagine we have a website that requires people to log in to see the content of the site. When you log in, your browser sends your username and password to the server, which verifies them and, if everything checks out, sends you the requested content. However, there is a small caveat. The HTTP protocol, which is used to browse the internet, is stateless. That means if you make another request to the same server, it has forgotten who you are and will ask you to log in again. Can you imagine how time-consuming it would be to browse around a site like Facebook and have to log in again every time you click on something? So cookies to the rescue! You still log into the website, and the server still validates your credentials. If everything checks out, however, the server not only responds with the content but also sends a cookie to your browser. The cookie is then stored on your computer and submitted to the server with every request you make to that website. The cookie contains a unique identifier that allows the server to ‘remember’ who you are and keep you logged in.”

As you can see, this type of cookie (known as a first-party cookie, because it is set by the website we are actually visiting) is helpful and makes our lives easier.
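To make the quoted login flow concrete, here is a minimal Python sketch of the server side, assuming a toy in-memory user table and session store. The names `USERS`, `SESSIONS`, `login` and `view_page`, and the cookie name `session_id`, are hypothetical and chosen for illustration; real sites differ in the details and never keep passwords in plain text.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional

# Hypothetical in-memory stores, for illustration only. A real site would keep
# salted password hashes (never plain text) and a persistent session store.
USERS = {"alice": "correct horse battery staple"}
SESSIONS: dict[str, str] = {}  # session ID -> username


def login(username: str, password: str) -> tuple[int, dict[str, str]]:
    """Verify credentials; on success, send a random session ID as a cookie."""
    expected = USERS.get(username)
    if expected is None or not secrets.compare_digest(expected, password):
        return 401, {}
    session_id = secrets.token_hex(16)  # unguessable unique identifier
    SESSIONS[session_id] = username
    return 200, {"Set-Cookie": f"session_id={session_id}; HttpOnly; Path=/"}


def view_page(cookie_header: Optional[str]) -> tuple[int, str]:
    """Serve a members-only page, recognising the user from the Cookie header."""
    if cookie_header:
        morsel = SimpleCookie(cookie_header).get("session_id")
        if morsel is not None and morsel.value in SESSIONS:
            return 200, f"Welcome back, {SESSIONS[morsel.value]}"
    return 401, "Please log in"  # no valid cookie: stateless HTTP has "forgotten" us


if __name__ == "__main__":
    status, headers = login("alice", "correct horse battery staple")
    cookie = headers["Set-Cookie"].split(";")[0]  # what the browser sends back
    print(view_page(cookie))  # (200, 'Welcome back, alice')
    print(view_page(None))    # (401, 'Please log in')
```

The point to notice is that the cookie itself carries only a random identifier; the server looks that identifier up to ‘remember’ who we are.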

Browsers and HTTP

If we want to look under the hood of our web browser, cookies can be explained as follows. When we type in the web address (URL) of an online shop we wish to visit, the page is often not stored on a server ready and waiting to be delivered; many pages are generated individually in response to each request. Our web browser sends a request message to the server hosting the website in order to retrieve the page. The Hypertext Transfer Protocol (HTTP) dictates that this request message follow a set format. First comes the request line: a method (e.g. GET) indicating the action to be performed on the identified web resource, the path of that resource (/…), and the protocol version. Next come the request header fields, followed by a blank line. Likewise, the protocol dictates that the server’s response follow a set format: a status line containing the HTTP status code (e.g. 200 OK); the response header fields; a blank line; and an optional message body, which carries the requested resource itself (for example, the HTML of the page). The relevance of all this is that cookies travel inside these header fields, which explains how and at what stage they are passed from browser to server and back.
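The request and response layouts described above can be sketched in a few lines of Python. This is an illustrative sketch only: `build_get_request` and `parse_response` are hypothetical helpers that handle plain HTTP/1.1 text with no error checking, and real programs should rely on a proper HTTP library (such as Python’s own `http.client`).

```python
from typing import Optional

CRLF = "\r\n"  # HTTP/1.1 ends each line with carriage return + line feed


def build_get_request(host: str, path: str,
                      extra_headers: Optional[dict[str, str]] = None) -> str:
    """Request line (method, path, version), then header fields, then a blank line."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    return CRLF.join(lines) + CRLF + CRLF


def parse_response(raw: str) -> tuple[int, list[tuple[str, str]], str]:
    """Split a raw response into status code, header fields and body."""
    head, _, body = raw.partition(CRLF + CRLF)  # blank line separates head from body
    status_line, *header_lines = head.split(CRLF)
    _version, status_code, _reason = status_line.split(" ", 2)  # e.g. "HTTP/1.1 200 OK"
    # A list, not a dict: headers such as Set-Cookie may legitimately repeat.
    headers = []
    for line in header_lines:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return int(status_code), headers, body


if __name__ == "__main__":
    print(repr(build_get_request("shop.example", "/basket")))
    raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nSet-Cookie: basket=42\r\n\r\n<p>hi</p>"
    print(parse_response(raw))
```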

Whenever the server wants our browser to store a cookie (for example, on our first visit, when we have no cookies from the site yet), it includes the cookie in an HTTP response header called Set-Cookie, one header per cookie. On later requests to that website, our browser checks whether it holds any unexpired cookies for the site and, if it does, sends them together in a request header called Cookie. We can inspect these headers ourselves with the developer tools built into most web browsers (for example, in the Network panel) or with browser add-ons.
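The exchange of Set-Cookie and Cookie headers can be sketched with Python’s standard `http.cookies` module on the server side and a deliberately tiny, hypothetical `TinyCookieJar` class on the browser side. The sketch assumes a single website and tracks only each cookie’s name, value and Max-Age expiry; real browsers also honour Expires, Domain, Path, Secure and SameSite. Times are passed in as plain numbers of seconds so that the expiry logic is easy to follow.

```python
from http.cookies import SimpleCookie
from typing import Optional


def make_set_cookie(name: str, value: str, max_age: int) -> str:
    """Server side: build the value of one Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["max-age"] = max_age
    cookie[name]["path"] = "/"
    cookie[name]["httponly"] = True
    return cookie[name].OutputString()  # e.g. "basket=42; HttpOnly; Max-Age=3600; Path=/"


class TinyCookieJar:
    """Browser side: a toy cookie store for ONE website (name, value, expiry only)."""

    def __init__(self) -> None:
        # name -> (value, absolute expiry time in seconds, or None for a session cookie)
        self._cookies: dict[str, tuple[str, Optional[float]]] = {}

    def store(self, set_cookie_value: str, now: float) -> None:
        """Handle a Set-Cookie header received at time `now`."""
        parsed = SimpleCookie()
        parsed.load(set_cookie_value)
        for name, morsel in parsed.items():
            max_age = morsel["max-age"]  # "" when the attribute is absent
            expires_at = now + int(max_age) if max_age else None
            self._cookies[name] = (morsel.value, expires_at)

    def cookie_header(self, now: float) -> Optional[str]:
        """Value for the Cookie request header, or None if no unexpired cookies remain."""
        live = [
            f"{name}={value}"
            for name, (value, expires_at) in self._cookies.items()
            if expires_at is None or expires_at > now
        ]
        return "; ".join(live) if live else None
```

For example, after `jar.store(make_set_cookie("basket", "42", 3600), now=0)`, calling `jar.cookie_header(now=10)` gives `"basket=42"`, while `jar.cookie_header(now=4000)` gives `None` because the cookie has expired.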

Third party cookies

Cookies become a cause for concern when they are set by external servers that a website relies on to deliver content. Think about what we typically find on websites: images, embedded media such as YouTube videos, Twitter and Facebook widgets, advertisements, Facebook Like buttons and so on. To display this content, our browser sends a request to a third-party server. When this happens, the external server might place a cookie (called a third-party cookie) on our browser (or, to be more precise, ask the browser to store the cookie). Our browser then sends that cookie back with every later request to the same external server, helping it remember who we are. With the help of the HTTP Referer header (the misspelling is part of the standard), a site loaded as a third-party resource also learns which first-party website we were visiting. This is bad news, because the third-party cookie allows our browsing to be tracked across every site that embeds content from that server.
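The Python sketch below simulates how a single third-party server can link visits to unrelated websites. Everything here is hypothetical: `Tracker`, `Browser`, the cookie name `uid` and the `.example` sites are made up, and the browser is reduced to one stored cookie. The Referer values are bare site origins because most modern browsers, by default, send only the origin (not the full page address) on cross-site requests.

```python
import secrets
from collections import defaultdict
from typing import Optional


class Tracker:
    """Hypothetical third-party server (say, tracker.example) embedded on many sites."""

    def __init__(self) -> None:
        self.profiles: dict[str, list[str]] = defaultdict(list)  # uid -> sites seen

    def handle(self, cookie_header: Optional[str], referer: str) -> dict[str, str]:
        """Respond to a request for the embedded resource."""
        response_headers: dict[str, str] = {}
        if cookie_header and cookie_header.startswith("uid="):
            visitor_id = cookie_header[len("uid="):]  # returning visitor
        else:
            # First sighting: hand out a long-lived identifier (one year).
            visitor_id = secrets.token_hex(8)
            response_headers["Set-Cookie"] = f"uid={visitor_id}; Max-Age=31536000; Path=/"
        # The Referer header tells the tracker which first-party site embedded it.
        self.profiles[visitor_id].append(referer)
        return response_headers


class Browser:
    """Toy browser that stores just one cookie: the tracker's."""

    def __init__(self, tracker: Tracker) -> None:
        self.tracker = tracker
        self.tracker_cookie: Optional[str] = None

    def visit(self, site: str) -> None:
        # Rendering the page triggers a sub-request to the embedded third-party resource,
        # carrying the tracker's cookie (if any) and the embedding site as Referer.
        headers = self.tracker.handle(self.tracker_cookie, referer=site)
        if "Set-Cookie" in headers:
            self.tracker_cookie = headers["Set-Cookie"].split(";")[0]


if __name__ == "__main__":
    tracker = Tracker()
    browser = Browser(tracker)
    for site in ["https://news.example/", "https://recipes.example/", "https://health.example/"]:
        browser.visit(site)
    print(dict(tracker.profiles))
```

Visiting three unrelated sites yields a single `uid` in `tracker.profiles`, mapped to all three origins, which is exactly the cross-site profile described above.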

Simply Explained goes on to explain how this works using Facebook as an example:

“Well, the whole process starts when you log in to Facebook. To remember that you’re logged in, Facebook stores a cookie on your computer; nothing unusual about that, many other sites do the same thing. This cookie is scoped, or bound, to Facebook’s domain name, meaning that no one else besides facebook.com can read what’s in the cookie. Let’s now imagine that you browse away and you land on someone’s blog. The blog cannot read your Facebook cookie, and the scope prevents that. Facebook also can’t see that you’re on this blog. All is well. But let’s now assume that the owner of the blog places a Facebook Like button on his website. To show this Like button, your browser has to download some code from the Facebook servers, and when it’s talking to facebook.com, it sends along the cookie that Facebook set earlier. Facebook now knows who you are and that you visited this blog. I’m using Facebook as the example here, but this technique is used by many other companies to track you around the internet. The trick is simple: convince as many websites as possible to place some of your code on their sites. Facebook has it easy because a lot of people want a Like or Share button on their website. Google also has an easy job because many websites rely on its advertisement network or on Google Analytics. At this stage, cookies are getting out of hand.”
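The ‘scoping’ mentioned in the quote can be sketched as a domain-matching check that decides which stored cookies accompany each request. This Python sketch is a simplified version of the domain-match rule in RFC 6265; the `StoredCookie` class, the cookie name `session` and the host names are hypothetical, and it ignores the Path, Secure and SameSite checks that real browsers also apply.

```python
from dataclasses import dataclass


@dataclass
class StoredCookie:
    domain: str  # the domain the cookie is scoped to, e.g. "facebook.com"
    name: str
    value: str


def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 domain match: same host, or a subdomain of it."""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def cookies_to_send(jar: list[StoredCookie], request_host: str) -> list[StoredCookie]:
    """Only cookies scoped to the host being contacted are attached to the request."""
    return [c for c in jar if domain_matches(request_host, c.domain)]


if __name__ == "__main__":
    jar = [StoredCookie("facebook.com", "session", "abc123")]
    print(cookies_to_send(jar, "someones-blog.example"))  # the blog's own request
    print(cookies_to_send(jar, "www.facebook.com"))       # the Like button's request
```

With a cookie scoped to `facebook.com`, requests to `someones-blog.example` carry nothing, but the browser’s request for the Like button to a facebook.com host such as `www.facebook.com` carries the cookie, even though we are reading the blog.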

Unfortunately, the information that sites can gather by tracking us around the web in this way has proved lucrative. As a consequence, some companies have capitalised on third-party cookies by having small image files, called tracking pixels, embedded in web pages. The image could be as small as a single pixel, and could be the same colour as the background, or completely transparent. Although we cannot see the image, our web browser still requests it from the external server, sending along any cookies it holds for that server, and so the process described above is triggered.
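A tracking pixel is just an ordinary image request, so the Python sketch below shows both halves under stated assumptions: the well-known 43-byte transparent 1×1 GIF as the payload, an `<img>` tag a site might embed, and a hypothetical tracker handler that logs the query string and the Cookie and Referer headers before returning the image. The URL `https://tracker.example/p.gif`, the query parameter `c` and the function names are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# The classic 43-byte, 1x1 transparent GIF often served as a tracking pixel.
TRANSPARENT_GIF = (
    b"GIF89a\x01\x00\x01\x00\x80\x00\x00"     # header; 1x1 canvas; 2-colour table follows
    b"\x00\x00\x00\xff\xff\xff"               # colour table: black, white
    b"!\xf9\x04\x01\x00\x00\x00\x00"          # graphic control extension: colour 0 is transparent
    b",\x00\x00\x00\x00\x01\x00\x01\x00\x00"  # image descriptor: 1x1 image at (0, 0)
    b"\x02\x02D\x01\x00"                      # LZW-compressed data for the single pixel
    b";"                                      # trailer
)


def pixel_img_tag(tracker_url: str, campaign: str) -> str:
    """HTML a site owner might paste into a page; invisible to the reader."""
    src = f"{tracker_url}?{urlencode({'c': campaign})}"
    return f'<img src="{src}" width="1" height="1" alt="">'


def handle_pixel_request(url: str, headers: dict[str, str],
                         log: list[dict]) -> tuple[int, dict[str, str], bytes]:
    """Tracker side: record who asked for the pixel, then return the image."""
    query = parse_qs(urlparse(url).query)
    log.append({
        "campaign": query.get("c", [None])[0],
        "cookie": headers.get("Cookie"),    # the tracker's own third-party cookie, if any
        "referer": headers.get("Referer"),  # the page that embedded the pixel
    })
    return 200, {"Content-Type": "image/gif"}, TRANSPARENT_GIF
```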

Guarding against third party cookies

How can we best protect ourselves? The first thing we can do is run the Electronic Frontier Foundation’s Panopticlick test (since renamed Cover Your Tracks) to see how well our web browser protects us from tracking. If the results are worse than we expected, we should consider installing a browser extension that blocks third-party trackers, such as Privacy Badger or Ghostery. We could also switch to a browser with built-in tracking protection, such as Firefox or Safari, or, if we wish to keep our current browser, make sure that third-party cookies are blocked in its settings.
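Blocking third-party cookies comes down to a per-request decision: does the server being contacted belong to the same site as the page we are viewing? The Python sketch below makes that decision with a deliberately crude notion of ‘site’ (the last two labels of the host name). Real browsers use the Public Suffix List and apply further rules, so treat the function names and logic as a hypothetical illustration rather than how any particular browser works.

```python
def site_of(host: str) -> str:
    """Crude 'site' = last two host labels. ASSUMPTION: real browsers consult the
    Public Suffix List, so e.g. 'news.bbc.co.uk' maps to 'bbc.co.uk', not 'co.uk'."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def is_third_party(page_host: str, request_host: str) -> bool:
    """True when the request goes to a different site from the page being viewed."""
    return site_of(page_host) != site_of(request_host)


def may_use_cookies(page_host: str, request_host: str,
                    block_third_party: bool = True) -> bool:
    """What a 'block third-party cookies' setting decides for each sub-request."""
    return not (block_third_party and is_third_party(page_host, request_host))
```

On a blog at `someones-blog.example`, a request to `www.facebook.com` is classed as third party and gets no cookies, while requests between `www.facebook.com` and `static.facebook.com` are treated as first party.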

Even if we don’t take any of these steps, the law is on our side. In Europe, the GDPR (together with the ePrivacy Directive, which deals with cookies specifically) requires websites to be transparent about their use of cookies, to ask for our consent before setting cookies that are not strictly necessary, and to make it as easy to withdraw consent as to give it. We’ve probably all seen the annoying cookie banners asking for our permission. Next time we see one, we shouldn’t just click “Accept”, but look at which cookies the website wants to place on our computer and for what purpose. It is more important than ever that we get involved and, if necessary, enforce our rights, particularly since a new study by researchers at MIT, UCL and Aarhus University has found that most cookie consent pop-ups served to internet users in the EU are likely to be non-compliant. We must do this, if not for ourselves, then for the sake of other web users.