Be privacy conscious when using ChatGPT (and other AI chatbots)

This is blog post #18 in our series on the GDPR.

“Don’t tell anything to a chatbot you want to keep private.” [source]

Writing about AI in general and about chatbots specifically is like shooting at a moving target because of the speed of development. However, at Runbox we are always concerned about privacy and must examine the chatbots case in that respect.

Due to its popularity, we have mainly used ChatGPT from OpenAI as the target of our examination. NOTE: ChatGPT and the text-to-image service DALL-E are both consumer services from OpenAI.

This blog post is a summary of our findings, leading to advice on how to avoid putting your privacy at risk when using the Natural Language Processing (NLP)-based ChatGPT.

Our examination is based on OpenAI’s Privacy Policy, Terms of Use, and FAQ, as well as a number of documents resulting from hours of Internet browsing.

The blog post consists of two parts: PART I is a summary of our understanding of the technology behind language models, in order to grasp the concepts and better understand their implications for privacy. In PART II we discuss the relevant privacy issues. PART II is written as a stand-alone piece and can be read without necessarily having read PART I.

PART I: Generative AI technology

The basics

GPT stands for Generative Pre-trained Transformer, and GPT-3 is a 175 billion parameter language model that can compose fluent, original writing in response to a short text prompt from a user. The current version of ChatGPT is built upon GPT-3.5 and GPT-4 from OpenAI.

ChatGPT was launched publicly on November 30, 2022. ChatGPT was released as a freely available research preview, but due to its popularity, OpenAI now operates the service on a freemium model [source].

The GPTs are the result of three main steps: 1) Development and use of the underpinning technology Large Language Models (LLMs), 2) Collection of a very large amount of data/information, and 3) Training of the model.

Let us also keep in mind that all this is possible only because of today’s advancements in computational power.

Language models

A language model is, in essence, mathematics “converted” into computer programs that predict the next word or words in a sentence, or a complete sentence, based on probabilities. The model is a mathematical representation of the principle that words in a sentence depend on the words that precede them.

Since computers basically can only process numbers (in fact only additions and comparisons), text input to the model (prompts) must be converted to numbers, and likewise the output numbers have to be converted to text (response). Text in this context consists of phrases, single words, or parts of words called tokens.
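
As a toy illustration of these two ideas – text converted to numbers, and the next word chosen by probability – here is a minimal, self-contained Python sketch. It is nothing like the scale or sophistication of a real GPT, but the basic principle is the same:

```python
from collections import Counter, defaultdict

# A tiny "corpus" standing in for the billions of words a real model is trained on.
corpus = "the cat sat on the mat and the cat slept on the mat".split()

# Step 1: convert text to numbers ("tokens") -- here simply one ID per unique word.
vocab = {word: idx for idx, word in enumerate(sorted(set(corpus)))}
ids = [vocab[word] for word in corpus]
print(ids)  # [6, 1, 4, 3, 6, 2, 0, 6, 1, 5, 3, 6, 2]

# Step 2: count how often each token follows another (the "probability pattern").
bigrams = defaultdict(Counter)
for current, nxt in zip(ids, ids[1:]):
    bigrams[current][nxt] += 1

# Step 3: "generate" by choosing the most probable next token after a given word.
id_to_word = {idx: word for word, idx in vocab.items()}

def predict_next(word: str) -> str:
    next_id, _ = bigrams[vocab[word]].most_common(1)[0]
    return id_to_word[next_id]

print(predict_next("the"))  # "cat" -- the most frequent continuation in this tiny corpus
```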

When you prompt a GPT, your query is converted to tokens (represented by numbers) and passed to the transformer, where the attention mechanism generates a score matrix that determines how much weight should be put on each word in the input (prompt). This is used to produce the answer to the prompt, using the model’s generative capability – that is, to predict the next word in a sentence by selecting relevant information from the pre-processed text, with a high probability of the result being fluent and similar to human-written text [source].
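
The score matrix mentioned above can be sketched in a few lines of NumPy. This is a bare-bones illustration of scaled dot-product attention, with random numbers standing in for learned representations:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Pretend we have 4 tokens from a prompt, each represented by an 8-dimensional vector.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # queries
K = rng.normal(size=(4, 8))  # keys
V = rng.normal(size=(4, 8))  # values

# Score matrix: how much weight each token puts on every other token in the input.
scores = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
print(scores.shape)        # (4, 4): one row of weights per input token
print(scores.sum(axis=1))  # each row sums to 1.0

# Weighted combination of the value vectors -- the representation the model builds on.
output = scores @ V
print(output.shape)        # (4, 8)
```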

The learning part of the model is handled by a huge number of parameters representing the weights and also statistical biases for preventing unwanted associations between words. For instance, GPT-3 has 175 billion parameters, and GPT-4 is approximated to have around 1 trillion.

(The label “large” in LLM refers to the number of values (parameters) the model can change autonomously as it learns.)

Collecting the data

The texts the GPT model generates stem from OpenAI’s systematic scraping of some 500 billion words (in the case of GPT-3, the predecessor of the models behind the current version of ChatGPT) from the Internet: books, articles, websites, blogs – all open and available information, from libraries to social media – without any restriction regarding content, copyright, or privacy.

The scraping includes pictures and program code as well, and the result is filtered into a subset where “bad” websites are excluded.

The pre-training process

All that data is fundamental for pre-training the model. This process analyses the huge volume of data (the corpus) for linguistic patterns, vocabulary, grammatical properties, etc. in order to assign probabilities to combinations of tokens and combinations of words. The aforementioned transformer architecture is used in the training process, where the attention mechanism makes it possible to capture the dependencies between words independently of their position in a sentence.

The result of the pre-training process is an intermediate stage that has to be fine-tuned to the specific task the model is intended for, for instance providing text, program code, or speech translation as the response to a prompt. The fine-tuning process uses appropriate task-specific datasets containing examples typical of the task in question, and the weights and parameters are adjusted accordingly.
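
For intuition, a task-specific dataset used in fine-tuning is essentially a list of input/output examples. A minimal sketch (hypothetical translation examples, written to the commonly used JSON Lines format):

```python
import json

# Each line pairs an input with the output we want the fine-tuned model to prefer.
examples = [
    {"prompt": "Translate to French: Good morning", "completion": "Bonjour"},
    {"prompt": "Translate to French: Thank you", "completion": "Merci"},
]

with open("finetune_data.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example, ensure_ascii=False) + "\n")

# During fine-tuning, the model's weights are adjusted so that, given each prompt,
# the corresponding completion becomes a highly probable continuation.
```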

Of course, a ChatGPT response to a prompt is not “burdened” with the ethical, contextual, or other considerations a human would apply. To prevent undesired responses (toxicity, bias, or incorrect information), the fine-tuning process is supervised by humans who correct inappropriate or erroneous responses, using prompt-based learning. Here the responses are given a “toxicity” score that incorporates human feedback [source].

ChatGPT usage training

The learning process continues when the service is used: your prompts and the responses they generate are saved and subject to the training process, for at least 30 days, and “forever” if chat history isn’t turned off. In any event it is not possible to delete specific prompts from your user history [source], only entire conversations.

In the world of AI and LLMs, “hallucination” is the term used when a response appears to be pulled from thin air.

OpenAI offers an API that makes it possible for “anyone” to train GPT-n models for domain-specific tasks [source], that is, to build a customized chatbot. In addition, they have launched a feature that allows GPT-n to “remember” information that otherwise would have to be repeated [source, source].
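
As a rough sketch of what such a customized chatbot can look like in code (this assumes the pre-1.0 openai Python package that was current when ChatGPT launched; the exact interface has since changed), a domain-specific instruction is simply sent along with each user prompt:

```python
import openai  # pip install "openai<1.0" -- the interface assumed in this sketch

openai.api_key = "YOUR_API_KEY"  # placeholder; keep real keys out of source code

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        # The "system" message customizes the chatbot for a specific domain.
        {"role": "system", "content": "You are a support assistant for an email provider."},
        # The user's prompt -- remember that anything written here leaves your machine.
        {"role": "user", "content": "How do I set up IMAP on my phone?"},
    ],
)
print(response["choices"][0]["message"]["content"])
```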

Takeaways

  • The huge volume of data scraped is obviously a cacophony of content of varying quality, which affects the corpus and thereby also the probability patterns and the responses produced [source].
  • ChatGPT has limited knowledge of events that have occurred after September 2021, the cutoff date for the data it was trained on [source].
  • The response you get from ChatGPT to your prompt is based on probabilities, and as such you have no guarantee of the validity [source].
  • A prompt starts a conversation, unlike search engines such as DuckDuckGo and Google, which give you a list of websites matching your search query [source].
  • ChatGPT uses information scraped from all over the Internet, without any restrictions regarding content, copyright, or privacy. However, manual training of a model was introduced to detect harmful content [source]. Copyright violations have resulted in lawsuits [source], and more than 10,000 authors have signed an open letter to the CEOs of prominent AI companies [source].
  • Your conversation is normally used to train the models that power ChatGPT, unless you specifically opt-out [source].

PART II: Chatbot privacy considerations

“The privacy considerations with something like ChatGPT cannot be overstated” [source]

The following introduction is mainly intended for readers who have skipped PART I of this blog post.

Generative AI systems, such as ChatGPT, use information scraped from all over the Internet, without permission or restrictions regarding content, copyright, or privacy (more on this in PART I). This means that what you have written on social media, blogs, comments on an online article, etc. may have been stored and used by AI companies to train their chatbots.

Another source for training generative AI systems is prompts, that is, the information users provide when asking the chatbot something. What you ask ChatGPT, the sentences you write, and the generated text as well, are “taken care of” by the system and could become available to other users through the answers to their prompts.

However, according to OpenAI’s help center article, you can opt out of having your conversations used to train the model – but opting in is obviously the default.

So, both the Internet scraping and any personal information included in your prompts can result in personal information turning up in a generated answer to someone else’s arbitrary prompt.

This is very problematic for several reasons.

Is OpenAI breaching the GDPR?

First, OpenAI (and others scraping the Internet) never asked for permission to use the collected data, which could contain information that may be used to identify individuals, their location, and all kinds of sensitive information from hundreds of millions of Internet users.

Even if Internet scraping is not prohibited by law, it is ethically problematic because data can be used outside the context in which it was produced, breaching contextual integrity. This principle is de facto manifested in the EU’s General Data Protection Regulation (GDPR) Article 6(1)(a) as a prerequisite for lawful processing of personal data:

…the data subject has given consent to the processing of his or her personal data for one or more specific purposes

Here language models like OpenAI’s ChatGPT are in trouble: personal data can be used for any purpose – a clear violation of Article 6.

Second, OpenAI provides no procedures for individuals to check whether their personal data is stored and thereby could potentially be revealed by an arbitrary prompt, much less have that data deleted on request. This “right to erasure” is set forth in the GDPR Article 17, 1:

“The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay …” on the grounds that “(d) the personal data have been unlawfully processed”.

It is inherent in language models that data can be processed in ways that are not predictable and presented/stored anywhere, and therefore the “right to be forgotten” is unobtainable.

Third, and without going into details, the GDPR gives data subjects (individuals) the right to be informed, the right of access, the right to rectification, the right to object, and the right to data portability regarding their personal data. It is questionable whether generative AI systems can ever accommodate such requirements, since an individual’s personal data could be replicated arbitrarily in the system’s huge dataset.

Fourth, OpenAI stores all their data, including the personal data they collect, on servers located in the US. That means they are subject to the EU-US Data Privacy Framework (see our blog post Privacy, GDPR, and Google Analytics – Revisited) and the requirements set there.

To answer the question posed in the heading of this section, “Is OpenAI breaching the GDPR?”: It is very difficult to understand how ChatGPT, and other generative language models (Generative AI systems) as well, can ever comply with the GDPR.

What about the privacy regulations in the US?

Contrary to the situation in Europe, there is no general federal privacy law in the United States – each state has its own jurisdiction in this area. At the federal level there are only sector-specific laws, such as HIPAA (the Health Insurance Portability and Accountability Act) and COPPA (the Children’s Online Privacy Protection Act), which regulate the collection and use of personal data categorized as sensitive. However, there are movements towards regulation of personal information in several states, as tracked by the IAPP (the International Association of Privacy Professionals).

How does OpenAI use the data they collect?

When signing up for ChatGPT, you have to agree to OpenAI’s Privacy Policy (PP) and allow them to gather and store a lot of information about you and your browsing habits. Of course, you have to submit all the usual account information, and allow them to collect your IP address, browser type, and browser settings.

But you also allow them to automatically collect information about, for instance,

“… the types of content that you view or engage with, the features you use and the actions you take, as well as your time zone, country, the dates and times of access, user agent and version, type of computer or mobile device, and your computer connection”.

All this data makes it possible to build a profile of each user – bare facts, but also more personal information such as interests, social belonging, concerns, etc. This is similar to what search engines do, but ChatGPT is not a search engine – it is a “conversational” engine and as such is able to “learn” more about you depending on what you submit in a prompt, that is, how you engage with the system. According to their PP and the citation above, that information is collected.

The PP acknowledges that users have certain rights regarding their personal information, with indirect reference to the GDPR, for instance the right to rectification. However, they add:

“Given the technical complexity of how our models work, we may not be able to correct the inaccuracy in every instance.”

OpenAI reserves the right to provide personal information to third parties, generally without notice to the user, so your personal information could be spread to actors in OpenAI’s economic infrastructure and is very difficult to control.

Misuse of your personal information – what are the risks?

It is reasonable to assume that OpenAI will not knowingly and willfully set out to abuse your personal information because they have to adhere to strict regulations such as GDPR, where misuse could result in fines of hundreds of millions of dollars.

The biggest uncertainty is linked to how the system responds to input in combination with the system’s “learning” abilities.

If asked the “right” question, the system can expose personal information, and may combine information about a person, e.g. a person’s name, with characteristics and histories that are untrue, which may be very unfortunate for that individual. For instance, asking the system about a person by name can result in an answer that “transforms” a credit card fraud investigator into a person involved in credit card scams.

Takeaways

Using generative AI systems, for example ChatGPT, is like chatting with a “black box” – you never know how the “box” utilizes your input. Likewise, you will never know the sources of the information you get in return. Also, you will never know if the information is correct. You may also receive information about other individuals that you shouldn’t have, potentially even sensitive and confidential information.

Similarly, other individuals chatting with the “box” may learn about you, your friends, your company, etc. The only way to avoid that is to be very careful when writing your prompts.

That said, OpenAI has introduced some control features in ChatGPT where you can disable your chat history through the account settings – however, the data is only deleted after 30 days, which means that your data can be used for training ChatGPT in the meantime.

You can object to the processing of your personal data by OpenAI’s models by filling out and submitting the User Content Opt Out Request form or the OpenAI Personal Data Removal Request form, if your privacy is covered by the GDPR. However, when they say that they reserve the right “to determine the correct balance of interests, rights, and freedoms and what is in the public interest”, it is an indication of their reluctance to accept such requests. The article in Wired is recommended in this regard.

Valuable sources

  1. GPT-3 Overview. History and main concepts (The Hitchhiker’s Guide to GPT3)
  2. GPT-3 technical overview
  3. Transformers – step by step explanation
  4. LLM training and fine-tuning

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.


Privacy, GDPR, and Google Analytics – Revisited

This is blog post #17 in our series on the GDPR.

Summary of the case

In our blog post on 23 October 2022, we referred to the Data Protection Authorities (DPAs) of Austria, Denmark, France, and Italy, which concluded that the use of Google’s Universal Analytics (UA or GA3) is not compliant with the EU’s General Data Protection Regulation (GDPR).

The reason for this is that the use of GA3 implies that personal data is transferred to the US, which at that point in time was not on the EU’s list of countries that have an adequate level of protection of personal data. This means that the US was not fulfilling the requirements set by the EU/GDPR regarding ‘the protection of fundamental rights and freedoms of natural persons’, which is a key expression in the GDPR.

Furthermore, the Norwegian DPA (Datatilsynet) had as of 23 October 2022 received one (1) complaint regarding Google Analytics. Before any final decision is made, they have to confer with other supervisory authorities in the EEA that have also received similar complaints, according to GDPR Article 60 (the One-Stop-Shop mechanism).

(We regret that links in italics in this article point to web pages in Norwegian.)

Universal Analytics (GA3) replaced by GA4

In October 2020, Google released Google Analytics 4, the new version of Google Analytics. In March 2022, Google announced that the Google Universal Analytics tool will be sunset in July 2023 and that Google would only provide the GA4 tool after 1 July 2023.

The Danish DPA has analyzed GA4 with regard to privacy, and concludes on their website that even if improvements have been made, it is still the case that “law enforcement authorities in the third country can obtain access to additional information that allows the data from Google Analytics to be assigned to a natural person.” Consequently, GA4 is illegal in terms of the GDPR because servers in the US are involved in the process, as long as an EU/US adequacy decision is not in place.

The Norwegian DPA decision

The Norwegian DPA reported on their website on 27 July 2023 that they have concluded on the complaint mentioned above. The complaint stems from noyb, which lodged complaints against 101 European websites with the data supervisory authorities in the EEA for their use of GA. One of these was the Norwegian telecom company Telenor, which at that time was using GA.

The conclusion is that personal data was transferred to the US in violation of GDPR Article 44. In other words, the use of Google Analytics was illegal. Because Telenor discontinued its use of GA on 15 January 2021, the Norwegian DPA, in a letter of 26 July 2023, finds a reprimand “to be an adequate and proportionate corrective measure”.

The Norwegian DPA relies on the Danish authority by claiming that the conclusion will be the same regardless of whether Google Analytics 3 or 4 has been used (see above).

What about adequacy EU/US?

On 10 July 2023 the European Commission adopted its adequacy decision for the EU-US Data Privacy Framework and announced a new data transfer pact with the United States.

Accordingly, companies from the EEA should be able to legally use GA as long as Google enters into so-called Standard Contractual Clauses that provide data subjects with a number of safeguards and rights in relation to the transfer of personal data to Google LLC (Limited Liability Company) in the US.

However there is a big “but”: Max Schrems at noyb writes: “We have various options for a challenge already in the drawer, …. We currently expect this to be back at the Court of Justice by the beginning of next year. The Court of Justice could then even suspend the new deal while it is reviewing the substance of it.”

To use the same phrase as in the recent update of our blog post On the EU-US data transfer problem: The last words are obviously not said.


On the EU-US data transfer problem

This is blog post #16 in our series on the GDPR.

At Runbox we are always concerned about data privacy – “privacy is priceless” – and we put some effort into keeping ourselves updated on how the EU’s General Data Protection Regulation (GDPR) affects privacy related issues.

That’s because we want to be prepared in case something happens within the area that will affect the Runbox organization, our services, and consequently and most important: our customers.

Update 2023-08-06

On 10 July the European Commission adopted its adequacy decision for the EU-US Data Privacy Framework and announced a new data transfer pact with the United States. See the full text here: COMMISSION IMPLEMENTING DECISION.

A flyer from the European Commission is also available, and a summary of the situation is available from Reuters.

The Austrian non-profit organization NOYB, chaired by Maximilian Schrems, stated:

“We now had ‘Harbors’, ‘Umbrellas’, ‘Shields’ and ‘Frameworks’ – but no substantial change in US surveillance law. The press statements of today are almost a literal copy of the ones from the past 23 years. Just announcing that something is ‘new’, ‘robust’ or ‘effective’ does not cut it before the Court of Justice. We would need changes in US surveillance law to make this work – and we simply don’t have it.

We have various options for a challenge already in the drawer, although we are sick and tired of this legal ping-pong. We currently expect this to be back at the Court of Justice by the beginning of next year. The Court of Justice could then even suspend the new deal while it is reviewing the substance of it.” [source]

The last words are obviously not said.

Originally published 2023-03-19

The case of EU-US data transfer is highly relevant because Runbox operates as a virtual organization, which could lead to opportunities to involve consultants residing in the US. We know that many of our customers are as concerned as we are about data privacy, so we believe it is pertinent to share our findings.

In blog post #15 in our series on the GDPR we referred to the Executive Order signed by US President Joe Biden on 7 October 2022. This happened six months after the US President and the President of the EU Commission, Ursula von der Leyen, with much publicity signed the Trans-Atlantic Data Privacy Framework on 25 March 2022.

Joe Biden and Ursula von der Leyen at a press conference in Brussels. [Xavier Lejeune/European Commission]

In this blog post we will take a closer look at the Trans-Atlantic Data Privacy Framework, and the process thereafter.

Trans-Atlantic Data Privacy Framework

The objective of the Framework is to (re)establish a legal (with regards to the GDPR) mechanism for transfers of EU personal data to the United States, after two former attempts (Safe Harbour and Privacy Shield) were deemed illegal by the Court of Justice of the European Union (CJEU).

The Framework affirms the United States’ commitment to implement new safeguards to ensure that ‘signals intelligence activities’ (SIGINT, intelligence-gathering by interception of signals) are necessary and proportionate in the pursuit of defined national security objectives. In addition, the Framework commits the US to creating a new mechanism for EU individuals to seek redress if they believe they are unlawfully targeted by signals intelligence activities.

Following up on the 25 March 2022 Biden–von der Leyen agreement, the US President signed the Executive Order (EO) ‘Enhancing Safeguards for United States Signals Intelligence Activities’ on 7 October 2022.

US compliance with the GDPR

Subsequently, a process was initiated on 13 December 2022 within the EU Commission to assess whether the US, after the implementation of the EO, meets the requirements qualifying it for the list of nations compliant with GDPR Article 45, “Transfers on the basis of an adequacy decision”. That is, whether the European Commission has decided that a country outside the EU/EEA offers an adequate level of data protection. To those countries, personal data may be transferred seamlessly from the EU/EEA, without any further safeguards being necessary.

Inclusion of the US on that list is of course very important, not least for companies like Facebook and Google, and US companies offering cloud-based services as well. The Court of Justice of the European Union (CJEU) has deemed earlier transfer schemes (Safe Harbour and Privacy Shield) illegal, so “the whole world” is waiting for the EU Commission’s adequacy decision.

This came, as a draft, on 14 February 2023, where the Commission concludes (page 54) that “… it should be decided that the United States ensures an adequate level of protection within the meaning of Article 45 of Regulation (EU) 2016/679 …”.

(The figure below illustrates the “road” for legislative decisions in the EU. A more comprehensive description of the legislative procedure can be found here.)

However, on the same day, 14 February 2023, the Committee on Civil Liberties, Justice and Home Affairs of the European Parliament concluded that “… the EU-US Data Privacy Framework fails to create actual equivalence in the level of protection …” and “… urges the Commission not to adopt the adequacy finding”.

Incompatible legislative frameworks

There are two important arguments, among others, behind the Committee’s conclusion: 1) there is no federal privacy and data protection legislation in the United States, and 2) the EU and the US have differing definitions of key data protection concepts such as the principles of necessity and proportionality (for surveillance activities etc.), as pointed out by the Court of Justice of the European Union (CJEU).

Shortly thereafter, on 28 February 2023, the European Data Protection Board (EDPB) made public their opinion on the decision of the EU Commission regarding the adequacy. The EDPB has some concerns that should be clarified by the Commission, for instance relating to exemptions to the right of access, and the absence of key definitions.

Furthermore, the EDPB remarks that the adequacy decision is only applicable to US organizations which have self-certified, and that the possibility for redress provided to the EU data subjects in case of violation of their rights is not clear. “The EDPB also expresses concerns about the lack of a requirement of prior authorization by an independent authority for the collection of data in bulk under Executive Order 12333, as well as the lack of systematic independent review ex post by a court or an equivalently independent body.”, as stated in Opinion 5/2023.

The next step in the process is a vote on the Commission’s proposal in the European Parliament, probably in April, and thereafter the adequacy decision must be approved by all member states before the EU Commission’s final decision.

The Commission may set aside the result of the vote in the Parliament, but one should expect that the criticism from the Committee on Civil Liberties, Justice and Home Affairs, and the concerns of the EDPB, will impact the implementation of the Framework.

Here it would be prudent to recall the statement made by the Austrian non-profit organization NOYB, chaired by Maximilian Schrems: “At first sight it seems that the core issues were not solved and it will be back to the CJEU sooner or later.” This refers to the verdicts of the CJEU (Court of Justice of the European Union) condemning the former frameworks Safe Harbour and Privacy Shield – the verdicts bearing the names Schrems I and Schrems II, respectively.

Bottom Line: The final outcome of the process is unclear, but in any event we have to wait for the final decision of the EU Commission.

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought regarding any specific circumstances.


The Norwegian Consumer Council’s voice is heard worldwide

The Norwegian Consumer Council (NCC) has taken a strong position against commercial surveillance online, and has made it very visible how the Ad-Tech industry is exploiting personal data for business purposes.

“Big data” has, since the arrival of social platforms on the Internet, been accumulated and used unscrupulously by some companies for profit. Some of the players in the field share information they collect on users with third-party advertisers without their users’ knowledge or consent. The driver is all the money connected to targeted advertising. However, sharing personal data in this way is prohibited by the EU’s General Data Protection Regulation (GDPR).

The NCC has no authority to enforce personal data legislation, but the Norwegian Data Protection Authority (NDPA) does. And so, the NCC can freely report findings of breaches of the GDPR and Norwegian data protection regulations to the NDPA.

NCC and NDPA at the forefront

A good illustration of this interaction is the case against Grindr. Earlier this year the NCC, based on the report “Out of Control” (2020), raised the case against Grindr and five Ad-Tech companies that were receiving personal data through the app: Twitter’s MoPub, AT&T’s AppNexus, OpenX, AdColony and Smaato.

All the complaints were filed with the NDPA (in cooperation with the European Center for Digital Rights, noyb.eu) because of violations of the GDPR. The complaints concern Grindr sharing sensitive personal data, such as group affiliation, sexual orientation, and geographic location, with several other parties without encrypting the traffic.

Even if data is anonymised, such as when third parties operate with their own proprietary identification numbers, it is possible to combine data from various sources with openly available information to produce a picture that can identify an individual.
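
A simplified sketch of how such linkage can work in practice (entirely hypothetical data, using pandas): two “anonymous” datasets that share the same proprietary advertising ID are joined, and the combined profile is then matched against openly available information:

```python
import pandas as pd

# Hypothetical data broker feed: "anonymous" ad IDs with app usage and home locations.
ad_data = pd.DataFrame({
    "ad_id": ["a1", "a2", "a3"],
    "app": ["dating_app", "period_tracker", "dating_app"],
    "home_location": ["59.91,10.75", "60.39,5.32", "58.97,5.73"],
})

# Another source keyed on the same proprietary ID, e.g. from a different advertising SDK.
device_data = pd.DataFrame({
    "ad_id": ["a1", "a2", "a3"],
    "device": ["iPhone 12", "Pixel 6", "iPhone 13"],
})

# Openly available information (public registers, social media, and so on).
public_info = pd.DataFrame({
    "home_location": ["59.91,10.75"],
    "name": ["Jane Doe"],
})

# Join on the shared ID, then match the combined profile to a named person.
profile = ad_data.merge(device_data, on="ad_id").merge(public_info, on="home_location", how="left")
print(profile[["ad_id", "app", "device", "name"]])
```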

In January, the NDPA announced a fine of NOK 65 million (€5.8 million, or approximately $6.5 million) on Grindr. In May this year the NCC also acted against 8 companies and asked for details of their surveillance through the services Perfect365 and MyDays.

The Norwegian determination to protect personal data was also illustrated in May 2021, when the NDPA submitted an advance notification of an administrative fine of NOK 25 million to Disqus Inc. The company performs widget tracking, analysis and profiling, and discloses personal data to third-party advertisers, and in doing so violates multiple articles (e.g. Articles 6 and 7) of the GDPR.

Update on 30 September 2023 on the Grindr case

Grindr appealed the decision to the Norwegian Privacy Appeals Board, which rejected the appeal in its decision of 29 September 2023 and announced that it upholds the fine. “We have received the decision and cannot comment on it until we have been able to discuss it with our client,” says the lawyer representing Grindr.

The privacy movement grows stronger

All of these cases illustrate the NCC’s mission, but the NCC is working from a broader perspective: to establish a broad, international movement against surveillance-based advertising.

This movement got a push with the NCC’s seminal report Out of Control (2020), which has received media coverage in more than 70 countries, including the US and Japan (see our previous blog post).

Recently (June 2021), the NCC released another report: Time to ban surveillance-based advertising, with the subtitle The case against commercial surveillance online.

On page 4 there is quite a good summary of what the driving force is:

…today’s dominant model of online advertising is a threat to consumers, democratic societies, the media, and even to advertisers themselves. These issues are significant and serious enough that we believe that it is time to ban these detrimental practices.

In a coalition with more than 60 organizations from Europe and the US, including some 10 consumer organisations and the umbrella organisation BEUC – the European Consumer Organisation – the NCC on June 23 2021 sent an open letter to EU and US policymakers. The letter urges the policymakers to “…take a stand and consider a ban of surveillance-based advertising as part of the Digital Services Act in the EU, and for the U.S. to enact a long overdue federal privacy law.” The coalition is backing up its call with the reports by NCC.

On behalf of the NCC, the consumer research company YouGov conducted a survey among a representative sample of the internet population aged 18 and above about their attitudes to surveillance-based advertising. The result was unambiguous: only 10% responded positively to the idea of commercial actors collecting information about them online, and only one in five think that ads based on personal information are OK.

Runbox has a clear standing against the collection of consumer data and surveillance-based advertising: Our service is ad-free, and we never expose our customers’ data for commercial purposes. We are very strict when law enforcement authorities in Norway or foreign countries request that we disclose data about our customers.

At Runbox we are proud to reside in a country that puts privacy first, and we wholeheartedly support the appeal to ban surveillance-based advertising. Therefore Runbox will annually donate to support noyb.eu, and we have joined the list of individuals supporting the appeal.


GDPR implementation part 8: “Personal data” in the EU and the US is not the same

We usually think of “personal data” as a term that covers, for instance, a person’s full name, home address, email address, telephone number, and date of birth.

These are ordinary data that can obviously identify a specific person. But the personal data category of linked personal information also includes data such as social security numbers, passport numbers, and credit card numbers – data that can identify us, and data we are usually more restrictive about sharing.

Linkable and non-linkable information

But there is another category of data that on its own may not be able to identify a person, but combined with other information could identify, trace, or locate a person. Such data are gender, race, sexual orientation, workplace, employment etc. These are examples of linkable personal information.

Then we have the category non-personally identifiable information. That is data that cannot be used on its own to identify or trace a person, for example IP addresses, cookies, device IDs, and software IDs (non-linkable personal information).

Privacy regulations differ in the EU and the US

Now, we know that there are industries that exist almost under the radar while taking advantage of personal data. For instance, companies in the AdTech and MarTech industry base their business on collecting and trading personal data for targeted advertising and marketing.

Many of these actors try to take protection of personal data seriously, and refer to the rules and regulations for processing personal data. In Europe this is the GDPR (General Data Protection Regulation) within the EU/EEA-area1, and in the US it is the responsibility of the FTC (Federal Trade Commission).

However, what the EU/GDPR and US government agencies mean by “personal data” differs. Specifically, the EU/GDPR definition is more comprehensive than the definitions often referenced by US agencies, such as that of NIST (the National Institute of Standards and Technology).

For example, the EU concept of personal data includes information such as cookies and IP addresses, which are not considered as personal data in a US setting.2

This means that if US websites in their privacy policy state that they are GDPR compliant, but combine their data with other data sets, they may breach the GDPR. For example, they must have the user’s consent to collect their IP address under the GDPR.
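
One common technical mitigation (what analytics tools often call IP anonymization) is to truncate addresses before they are stored, so that the stored value no longer points to a single connection. Note that truncation reduces identifiability but does not by itself settle the consent question under the GDPR. A minimal sketch:

```python
import ipaddress

def anonymize_ip(ip: str) -> str:
    """Zero out the host part: the last octet for IPv4, the last 80 bits for IPv6."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        network = ipaddress.ip_network(f"{ip}/24", strict=False)
    else:
        network = ipaddress.ip_network(f"{ip}/48", strict=False)
    return str(network.network_address)

print(anonymize_ip("203.0.113.42"))         # 203.0.113.0
print(anonymize_ip("2001:db8::1234:5678"))  # 2001:db8::
```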

Definitions of “personal data”

National Institute of Standards and Technology’s definition

NIST’s definition of personal data is contained in its definition of Personally Identifiable Information (PII):

PII is any information about an individual maintained by an agency, including (1) any information that can be used to distinguish or trace an individual‘s identity, such as name, social security number, date and place of birth, mother‘s maiden name, or biometric records; and (2) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information.

US Office of Privacy and Open Government’s definition

Another PII-definition is from the US Office of Privacy and Open Government (OPOG) as follows:

The term personally identifiable information refers to information which can be used to distinguish or trace an individual’s identity, such as their name, social security number, biometric records, etc. alone, or when combined with other personal or identifying information which is linked or linkable to a specific individual, such as date and place of birth, mother’s maiden name, etc.

EU’s GDPR definition

Compare these PII-definitions with the GDPR Article 4(1)’s definition of personal data:

‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;

It is obvious that the GDPR defines personal data much more broadly than both NIST’s and OPOG’s PII definitions, and this is underlined by this statement found in the GDPR’s Recital 30:

Natural persons may be associated with online identifiers provided by their devices, applications, tools and protocols, such as internet protocol addresses, cookie identifiers or other identifiers such as radio frequency identification tags. This may leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them.

The US is lacking comprehensive regulation

That said, US authorities are moving towards stronger protection of privacy and personal data, but as late as March 2019, the US Congressional Research Service says:

Despite the increased interest in data protection, the legal paradigms governing the security and privacy of personal data are complex and technical, and lack uniformity at the federal level. The Supreme Court has recognized that the Constitution provides various rights protecting individual privacy, but these rights generally guard only against government intrusions and do little to prevent private actors from abusing personal data online. At the federal statutory level, while there are a number of data protection statutes, they primarily regulate certain industries and subcategories of data. The Federal Trade Commission (FTC) fills in some of the statutory gaps by enforcing the federal prohibition against unfair and deceptive data protection practices. But no single federal law comprehensively regulates the collection and use of personal data (our emphasis).

Conclusion

When US websites claim to follow the rules for processing personal data, the claim is dubious at best compared to the regulations in the EU/EEA – which Norwegian legislation is based on and which Runbox adheres to.

However, it should be mentioned that some US states, for instance California, do classify some anonymous data (i.e. IP-addresses, aliases and account data) as PII.

In addition, as stated in our Privacy Policy, the personal data we ask customers to register in order to use our service is very limited. We are conscious about the trust our customers place in us when they register personal data in our systems, and in return we can demonstrate that we are compliant with the regulations.

Addendum

Above we referred to the AdTech and MarTech industries and their usage of personal data to identify, trace, or locate a person for advertising and marketing purposes. That topic is outside the scope of this blog post, but is absolutely worth writing about in a later post.

1 EEA = European Economic Area, that is the EU and three countries: Iceland, Liechtenstein, and Norway.

2 https://www.forbrukerradet.no/out-of-control/ footnote on page 102.


GDPR implementation part 7: Information and Tools for Implementation of Users’ Rights

GDPR

One of the main objectives for the European Union (EU) when they developed the replacement for the Data Protection Directive 95/46 (from 1995), was to expand individual control over the use of personal data.

This can be seen in a broader view as an implementation of the right to one’s private life, as laid down in the European Convention on Human Rights (Article 8). The right to respect for one’s private and family life is also stated in the EU Treaty on Fundamental Rights (Article 7).

Norway has signed both of these agreements, and implements these rights in Articles 100 and 102 of the Constitution of Norway and in the Norwegian Human Rights Act.

Already in GDPR1 Article 1 we see the connection between the GDPR and especially the Treaty on Fundamental Rights:

This Regulation protects fundamental rights and freedoms of natural persons and in particular their right to the protection of personal data

Article 1-2 of the GDPR

Observe the expression “rights and freedoms of natural persons”, which is very important throughout the Regulation and is used 31 times in all.

Before we go further into the subject of this post, it is important to state that Norway’s legislation on the processing of personal data was already compliant with the GDPR before the latter was declared the new framework for legislation in Norway. The Norwegian Personal Data Act (PDA2), as compliant with the GDPR, took effect on 20 July 2018.

First and foremost, the GDPR states that no processing of personal data shall be done unless the data subject has given consent (Article 6-1, a). Runbox obtains consent to registration of our users’ personal data when they sign up for an account and accept our Terms of Service.

The GDPR (Article 6-1, ff.) allows a controller – that is Runbox in our context – to process personal data when there is a legitimate reason for doing so, i.e. something that is necessary to use our services.

It is an important objective of the GDPR to secure one’s control over one’s own personal data. In this respect, the GDPR has given data subjects eight fundamental rights (Articles 15–22).

When implementing these rights in Runbox, we found that most of those were already there. However, the introduction of the GDPR provided us with a checklist and the opportunity to analyze our status, and to improve our services in this respect.

Our Privacy Policy provides exhaustive information about how we process personal data, but here is an overview of the data subject’s rights, and our implementation of them:

  • The right to access (Article 15): Since Runbox does not collect other types of information than what the users register by themselves, they can easily check which personal data is processed. The data processing is only done in order to process your emails, and optionally your web site and domain name.
  • The right to rectification (Article 16): You may at any time log in to your email account and change your personal information.
  • The right to erasure (‘right to be forgotten’) (Article 17): You may terminate your subscription at any time, and your account contents will subsequently be deleted after 6 months. Your personal details will be deleted after 5 years in accordance with Norwegian accounting regulations. However, you may send a request to dataprotectionofficer@runbox.com for immediate erasure of your account contents.
  • The right to restriction of processing (Article 18): Runbox will never use your personal information for purposes other than providing our services to you, so restrictions are not necessary in our context.
  • The right to be informed (Article 19): Runbox uses your personal information only in order to provide our services to you.
  • The right to data portability (Article 20): In case that you wish to move to another email service provider and export your data, you will find information on how to do this through our services and documentation.
  • The right to object (Article 21): Since we never will use your personal data for other purposes than to deliver the services you have agreed to, this right is implicitly fulfilled.
  • The right to individual decision-making (Article 22): This article is intended to protect data subjects against automated data-processing that might involve profiling them based on personally identifiable information, which is something Runbox doesn’t do.

Regarding questions or concerns about our implementation of the GDPR, customers may use the email address dataprotectionofficer@runbox.com as a direct channel to our appointed Data Protection Officer.

Some final remarks about consent: Runbox uses cookies in order to provide our services, and new users must give express consent to this on our signup page. On this page, and on the Account page once logged in, you may also give/revoke consent to future news and offers from Runbox.

In our next post in this series, we will consider our contractual situation regarding GDPR requirements. Stay tuned.

Footnotes

1. The GDPR means Regulation (EU) 2016/679 of 27 April 2016 on the protection of individuals with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (the General Data Protection Regulation). “Article” refers to an Article of the GDPR, unless stated otherwise.

2. The Personal Data Act (the PDA) means the regulations that are currently in force in Norway for the protection of individuals in connection with the processing of personal data, which includes the implementation of GDPR in Norway (2018-07-20).


GDPR implementation part 6: Access Control and Permissions

In part 3 of this blog series we described how we mapped the “world” of our operations, including the following components:

  • Server infrastructure, including all servers and other hardware as well as the links between these.
  • Software components that comprise our application stack from the operating system level to the front-end application level.
  • Data networks, including how and where our servers are connected to the Internet, but also the Local Area Network at our premises.
  • Data inventory, i.e. all personal data including customer and employee data, financial records, information about partners/associates, etc.
  • Applications necessary to run the company itself, meaning software that is managerial in nature.

Access control concerns permissions attached to system-related objects. Within each of the components listed above, there may be several sub-objects — servers, software modules, data files, catalogues etc., to which restricted access should be implemented.

Creating an Access Control Table

These objects then form one axis of an Access Control Matrix or Table (ACT). The other axis of the table includes organizational units, broken down into person-related objects, for instance segments or groups, but also individuals, for each unit.

After breaking these objects down to an appropriate level, we attached roles to each of these components. In terms of the GDPR, data processor and data controller are examples of roles to use in this context.

To each of the defined roles, we attached categories of tasks, for instance sysadmin, developer, and support staff tasks.

For our email service systems we found it convenient to structure the system-related objects in 3 main categories:

  • General software.
  • Application software.
  • Personal data.

Within each of these categories there are various numbers of objects, to which access permissions are attached, comprising the Access Control Table for the realm in question. For other realms of our “world” we used a similar approach, resulting in a number of ACTs that implement a principle of least privilege.
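
A simplified sketch of the idea (hypothetical roles and permissions, not our actual configuration): the table maps each role to the operations it may perform on each object category, and a single check function denies anything not explicitly granted:

```python
# Rows: roles; columns: object categories; cells: sets of permitted operations.
ACCESS_CONTROL_TABLE = {
    "sysadmin":  {"general_software": {"read", "write"}, "application_software": {"read", "write"}, "personal_data": {"read"}},
    "developer": {"general_software": {"read"},          "application_software": {"read", "write"}, "personal_data": set()},
    "support":   {"general_software": set(),             "application_software": {"read"},          "personal_data": {"read"}},
}

def is_permitted(role: str, obj: str, operation: str) -> bool:
    """Least privilege: anything not explicitly granted is denied."""
    return operation in ACCESS_CONTROL_TABLE.get(role, {}).get(obj, set())

print(is_permitted("developer", "personal_data", "read"))  # False
print(is_permitted("support", "personal_data", "read"))    # True
print(is_permitted("support", "personal_data", "write"))   # False
```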

With this the groundwork was laid for establishing various mechanisms for implementing the access control regime, in order to secure our most precious pieces of hardware, software, and data.

In our next blog post in this series we will look at Information and Tools for Implementation of Users’ Rights.


GDPR implementation part 5: Risk Assessment and Gap Analysis

In previous posts in this blog series we have referred to our main planning document, Rules and Regulations for Information Security Management, or RRISM for short, where our road to GDPR compliance started out. In that document we worked out the structure of the project, based on descriptions and definitions of the various components.

Obviously, risk management has to be taken very seriously, and the RRISM lays the groundwork for how we should handle this aspect of information security. The baseline is that risk management is an essential part of the company’s life, and one that comprises all its assets.

Defining and assessing risks

As usual, we first had to agree upon some definitions, and we found the following to be adequate for our purpose — directly from NIST (National Institute of Standards and Technology):

Risk is the net negative impact of the exercise of a vulnerability, considering both the probability and the impact of occurrence. Risk management is the process of identifying risk, assessing risk, and taking steps to reduce risk to an acceptable level.

Risk is a function of the likelihood of a given threat-source’s exercising a particular potential vulnerability, and the resulting impact of that adverse event on the organization.

In order to assess risks, we first have to identify possible threats that may exploit vulnerabilities in our systems or our organization.

In short: Risk management shall first and foremost have as objective to protect assets that are at potential risk.

Analyzing assets

Then we outlined the methodology we adopted:

  1. Identify the assets that could be at risk.
  2. Identify possible threats and vulnerabilities.
  3. Identify the possible consequences of each potential vulnerability.

Each threat was characterized by probability and criticality, which together give one of four risk levels: Very High (red), High (orange), Medium (yellow), and Low (green). This helped us decide what we should prioritize regarding improvements, measures, and other actions.
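
A minimal sketch of that kind of scoring logic (illustrative scales and thresholds only, not our actual model):

```python
# Probability and criticality are each scored from 1 (low) to 4 (very high).
def risk_level(probability: int, criticality: int) -> str:
    score = probability * criticality
    if score >= 12:
        return "Very High (red)"
    if score >= 8:
        return "High (orange)"
    if score >= 4:
        return "Medium (yellow)"
    return "Low (green)"

print(risk_level(probability=4, criticality=4))  # Very High (red)
print(risk_level(probability=2, criticality=3))  # Medium (yellow)
print(risk_level(probability=1, criticality=2))  # Low (green)
```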

Analyzing our assets we actually found more of these than anticipated, grouped in 21 different asset types, ranging from our customer base, general software in use and our own key business systems, through hardware and communication lines, and employees and partners – and more.

Threat, vulnerability, and gap analysis

Then we reviewed the vulnerability potentials (what could go wrong) for each asset and created scenarios for possible consequences if something happened that exploited a vulnerability.

The question raised thereafter was: Do we have the necessary measures and remedies in place to eliminate the potential vulnerabilities, or mitigate the consequences if things went wrong — or is there a gap?

The next step was to find out what actions should be taken in order to close the gaps in cases where we were not satisfied with the situation, and this will be the topic of future blog posts in this series.

Conclusion

Our mantra through this process has been: Threats we can imagine will sooner or later be reality, but never as we expect them to happen, and never where we expect them.

We live in an ever-changing environment, which means that risks have to be monitored continuously, and so our risk assessment and gap analysis is continually evolving as well.


GDPR implementation part 3: Mapping our “world”

This is the third post in our series on Runbox’ GDPR implementation.

After having structured our GDPR project, the next piece of necessary groundwork was to map out the status of relevant facts about important areas of our business. The reason is that it is impossible to establish and maintain good security and privacy – and to determine GDPR compliance – if the “territory” is not clearly described.

The “territory”

The “territory” in question was first and foremost:

  • The email service delivery system, that is the Webmail and backend systems and files – the development platform used, the components of which the system is built, the dependencies between the components, descriptions of access points, etc. – while being well aware that GDPR compliance also includes Privacy by Design requirements.

Other realms that were necessary to describe included, for example:

  • The economic system in which the company operates; i.e. mapping out the network of organizations with which our company is involved – including partners, associates, suppliers, financial institutions, government agencies, and so on – in order to serve our customers.
  • Server infrastructure with all physical links and channels, and not the least: All software components.
  • Data networks, including how and where our servers are connected to the Internet, but also the Local Area Network at our premises.
  • Data catalogue, including of course all personal data, that is, what kind of data is registered on customers, and on employees and partners/associates as well.
  • Applications of all sorts necessary to run the company – applications that are managerial in nature.

Level of description

One problem encountered is how detailed the descriptions should be. Too many details will make the job unnecessarily big in the first place, followed by a lot of maintenance to keep the documentation current.

We chose to start with a “helicopter view” to obtain an overview of the different realms, with the intention of fine-graining the documentation depending on the requirements of the ultimate goal: to identify areas where privacy and security are of concern, ticking off issues that are well taken care of in light of the GDPR, or following up with measures to improve the situation to achieve GDPR compliance.

Of course, the GDPR Implementation Project is not a sequential one, as development projects seldom are. Therefore, from time to time we had to go back and adjust our planning tools when needs arose.

The next blog post in this series will concern our Information Security Policy.


Data Privacy Day

January 28th is Data Privacy Day, which was initiated by the Council of Europe in 2007. Since then, many advances to protect individuals’ right to privacy have been made.

The most important of these is the European Union’s General Data Protection Regulation (GDPR) which was implemented on May 25, 2018. Runbox has promoted data privacy for many years, anchored in Norway’s strong privacy legislation.

At Runbox, which is located in the privacy bastion Norway, we believe that privacy is an intrinsic right and that data privacy should be promoted every day of the year.

Your data is safe in the privacy bastion of Norway

We’re pleased that Data Privacy Day highlights this important cause. Many who use the Internet and email services in particular may think they have nothing to hide, not realizing that their data may be analyzed and exploited by corporations and nation states in ways they aren’t aware of and can’t control.

While threats to online privacy around the world are real and must be addressed, we should not be overly alarmed or exaggerate the problem. Therefore we take the opportunity to calmly provide an overview of Norway’s and Runbox’ implementation of data privacy protection.

Norway enforces strong privacy legislation

First of all, Norway has enacted strong legislation regulating the collection, storage, and processing of personal data, mainly in The Personal Data Act.

The first version of Norway’s Personal Data Act was implemented as early as 1978. This was a result of the pioneering work provided by the Department of Private Law at the University of Oslo, where one of the first academic teams within IT and privacy worldwide was established in 1970.

Additionally, the Norwegian Data Protection Authority, an independent authority, facilitates protection of individuals from violation of their right to privacy through processing of their personal data.

For an overview of privacy related regulations in the US, Europe, and Norway, and a description of how Runbox applies the strong Norwegian privacy regulations in our operations, see this article: Email Privacy Regulations.

Runbox enforces a strong Privacy Policy

The Runbox Privacy Policy is the main policy document regulating the privacy protection of account information, account content, and other user data registered via our services.

If you haven’t reviewed our Privacy Policy yet we strongly encourage you to do so as it describes how data are collected and processed while using Runbox, explains what your rights are as a user, and helps you understand what your options are with regards to your privacy.

Runbox is transparent

Runbox believes in transparency and we provide an overview of requests for disclosure of individual customer data that we have received directly from authorities and others.

Our Transparency Report is available online to ensure that Runbox is fully transparent about any disclosure of user data.

Runbox is GDPR compliant

Runbox spent 4 years planning and implementing EU’s General Data Protection Regulation, starting the process as early as 2014.

We divided the activities implementing the GDPR in Runbox into 3 main areas:

  • Internal policies and procedures
  • Partners and contractors
  • Protection of users’ rights

This blog post describes how we did it: GDPR and Updates to our Terms and Policies

Runbox' GDPR Implementation

More information

For more information about Runbox’ commitment to data privacy, we recommend reviewing the Runbox Privacy Commitment.
