GDPR implementation part 3: Mapping our “world”

This is the third post in our series on Runbox’ GDPR implementation.

After structuring our GDPR project, the next piece of necessary groundwork was to map out the status of relevant facts about important areas of our business. The reason is that it's impossible to establish and maintain good security and privacy – and to determine GDPR compliance – if the "territory" is not clearly described.

The “territory”

The "territory" in question was, first and foremost:

  • The email service delivery system, that is, the Webmail and backend systems and files – the development platform used, the components from which the system is built, the dependencies between those components, descriptions of access points, etc. – while keeping in mind that GDPR compliance also includes Privacy by Design requirements.

Other realms it was necessary to describe included, for example:

  • The economic system in which the company operates; i.e. mapping out the network of organizations with which our company is involved – including partners, associates, suppliers, financial institutions, government agencies, and so on – in order to serve our customers.
  • Server infrastructure with all physical links and channels, and not least: all software components.
  • Data networks, including how and where our servers are connected to the Internet, but also the Local Area Network at our premises.
  • Data catalogue, including of course all personal data – that is, what kinds of data are registered on customers, and on employees and partners/associates as well (see the sketch after this list).
  • Applications of all sorts necessary to run the company – applications that are managerial in nature.
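To illustrate, a single entry in such a data catalogue can be recorded as structured data. Here is a minimal sketch in Python – the field names and values are hypothetical, for illustration only, and not Runbox's actual schema:

```python
# Hypothetical data catalogue entry -- illustrative only, not Runbox's schema.
customer_record = {
    "dataset": "customer_accounts",
    "data_subject": "customer",          # customer, employee, or partner/associate
    "fields": ["name", "email_address", "billing_address"],
    "legal_basis": "contract",           # e.g. GDPR Article 6(1)(b)
    "retention": "account lifetime plus statutory accounting period",
    "access": ["support_staff", "billing_system"],
}

def personal_data_overview(catalogue):
    """Summarize which personal data fields are registered per data subject type."""
    return {entry["data_subject"]: entry["fields"] for entry in catalogue}

print(personal_data_overview([customer_record]))
```

Even a simple overview like this makes it possible to answer the basic GDPR questions: what is registered, about whom, on what legal basis, and who has access.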

Level of description

One problem we encountered was deciding how detailed the descriptions should be. Too much detail makes the job unnecessarily big in the first place, and is then followed by a lot of maintenance to keep the documentation current.

We chose to start with a "helicopter view" to obtain an overview of the different realms, with the intention of refining the documentation as required by the ultimate goal: to identify areas where privacy and security are of concern, ticking off issues that are well taken care of in light of the GDPR, and following up with measures to improve the situation where needed to achieve GDPR compliance.

Of course, the GDPR Implementation Project is not a sequential one, as development projects seldom are. Therefore, from time to time we had to go back and adjust our planning tools when needs arose.

The next blog post in this series will concern our Information Security Policy.


GDPR implementation part 2: Structuring our GDPR project

As mentioned in our previous blog post about our GDPR project plan, we structured our implementation plan in 14 sub-projects.

In this blog post we’ll take a look at the first of these sub-projects.

Mapping status compared to the Regulation

The foundation for the sub-projects was (of course) the requirements of the GDPR, which we had mapped in sub-project #1: Compliance Status Tables mapping Runbox' status against the regulations.

To prepare ourselves, we did this before the final regulation was adopted. At that point in time we also mapped the requirements of the Norwegian Personal Data Regulation.

Of course, the mapping had to be updated to match the final version of the GDPR after the EU decision in 2016 – and so we did.

Controller and processor

At that point in time we had our project nicely structured into the 14 sub-projects mentioned above – which was fairly easy because of the mapping we had done. An important fact in this context is that Runbox is both a controller and a processor, depending on the circumstances, according to the GDPR's definitions. It was important to be exact about where and when each role applies.

Sub-project definitions and delimitations

In the GDPR we found some important points that we had to consider:

  • Our agreement with our main processor, Copyleft Solutions – and what about the agreements with our affiliates, partners, and the like? Are the confidentiality clauses regarding protection of personal data still adequate?
  • Do our Terms of Service and Privacy Policy correspond to the new requirements?
  • What changes have to be made in our systems to fulfill the GDPR's requirements regarding customers' rights?
  • Do we have systematic documentation of our systems, and what about access control?
  • Does our information security policy cover the necessary elements, and is our risk analysis up to date?
  • What about the processing of personal data we do internally? Obviously it was necessary to take a look at the agreements we have with internal and external personnel.
  • What about the internal control mechanisms we have – do they comply?

Those points (and some more) formed the foundation for establishing delimitations between the sub-projects, which we will continue blogging about in the weeks to come.


Runbox’ road to GDPR compliance

How we did it and what we learned on the way

In our blog post of May 25, 2018 we described the main areas of Runbox' GDPR implementation.

On this Data Privacy Day we’d like to update you on our GDPR implementation, how we did it, and what we learned on the way.

There is an enormous amount of information out there describing GDPR content, simple copies of the regulation, some templates of varying quality – and a lot of warnings.

So first of all, let’s recap what the GDPR is.

What is the GDPR, and why did it come about?

In 2012, the European Union (EU) first proposed a set of rules for the protection of data inside and outside the EU. An important reason for this was the desire to improve individuals' ability to control the data registered about them.

In 2016, the GDPR (General Data Protection Regulation) was formally adopted by the European Parliament and the Council of the European Union to take effect for all individuals within the EU and the European Economic Area (EEA).

Runbox’ approach to the GDPR


At Runbox, which is located in the privacy bastion Norway and within the EEA, we started the GDPR planning and implementation process as early as 2014.

By that time we had been following the EU's process towards a comprehensive reform of its 1995 data protection rules. In the spring of 2014, the European Parliament demonstrated strong support for the proposed GDPR. (You can find more information about the history of the GDPR in the article The History of the General Data Protection Regulation.) Shortly thereafter, in September 2014, our GDPR Compliance Project was launched.

We didn't know at that time when the GDPR would take effect, but we knew the direction: based on Article 29 Working Party documents, the GDPR was expected to move in the direction of existing Norwegian privacy regulations.

Our GDPR project plan

We structured our implementation project into 14 partly parallel sub-projects, and after the decision by the European Parliament and the Council on April 27, 2016, we updated our project plan towards the target date of May 25, 2018.

We started out by mapping our exact position against the Article 29 Working Party's proposals (the Working Party has since been replaced by the European Data Protection Board), and then went ahead to work out our main planning document, Rules and Regulations for Information Security Management.

With the groundwork done, we proceeded with the project towards fulfilling our privacy obligations under the new legislation, implemented in Norwegian law on July 20, 2018.

We will share more information in forthcoming blog posts, so stay tuned!


GDPR and Updates to our Terms and Policies

On May 25, 2018 the European Union’s General Data Protection Regulation (GDPR) takes effect in all countries in the European Economic Area (EEA).

Norway, where Runbox is located, is part of the EEA and is implementing these regulations through its own legislation.

We welcome these new regulations as they greatly strengthen the rights of the individual to digital privacy and security, which we have always promoted and supported.

What is the GDPR?

The GDPR is a set of regulations declaring that the individual should have control over their personal data by specifying how such data may be collected, processed, and stored.

The regulations require that businesses and organizations integrate this right into their business practices through policies, procedures, and technologies that safeguard the users’ privacy.

Important principles are that personal data are processed lawfully, for legitimate purposes, and with explicit consent from the user. This means that your personal data can only be collected with your permission.
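As a simple illustration of what consent-based processing can look like in code, consider the following sketch. The consent store, user IDs, and purpose names are hypothetical – real implementations will differ:

```python
# Minimal sketch of consent-based processing: explicit consent per purpose.
# The consent store, user ids, and purposes are hypothetical.
consents = {
    ("user-123", "service_delivery"): True,
    ("user-123", "marketing"): False,
}

class ConsentError(Exception):
    pass

def process_personal_data(user_id, purpose, records):
    """Process personal data only if the user consented to this specific purpose."""
    if not consents.get((user_id, purpose), False):
        raise ConsentError(f"No consent from {user_id} for purpose {purpose!r}")
    return f"Processed {len(records)} records for {purpose}"

print(process_personal_data("user-123", "service_delivery", ["signup form"]))
# process_personal_data("user-123", "marketing", [...]) would raise ConsentError.
```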

The regulation also sets forth a number of rights on the part of users of digital services:

  • The right to transparency about how data is processed.
  • The right to access and information about collected data.
  • The right to rectify stored data.
  • The right to erase data (“right to be forgotten”).
  • The right to restriction of processing.
  • The right to data portability (see the sketch after this list).
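To make one of these rights concrete: the right to data portability means a service must be able to hand over your data in a structured, machine-readable format. A minimal sketch, with an entirely hypothetical record store:

```python
import json

# Hypothetical record store -- illustrative only.
users = {
    "user-123": {
        "name": "Jane Doe",
        "email": "jane@example.com",
        "folders": ["Inbox", "Sent", "Archive"],
    }
}

def export_user_data(user_id):
    """Return all personal data stored for a user in a machine-readable
    format, as the right to data portability requires."""
    return json.dumps(users[user_id], indent=2)

print(export_user_data("user-123"))
```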

GDPR also recognizes the term “privacy by design”, which means that privacy shall be considered in all circumstances when personal data is processed or stored. By also introducing “privacy by default”, GDPR states that appropriate measures must be implemented to ensure that personal data collected is only used for the specific purpose for which the consent is given.

How does Runbox implement the GDPR?

At Runbox we believe that the privacy and security of your data is essential, and that it’s important for you to be aware of your rights and your options when it comes to your personal data.

Runbox has therefore been working on the implementation of the GDPR throughout our organization and our services over the past three years.

The activities that implement the GDPR in Runbox can be divided into 3 main areas:

  • Internal policies and procedures
  • Partners and contractors
  • Protection of users’ rights

The first two areas include documentation of information security management and internal policies and procedures, as well as data processing and confidentiality agreements with our partners, contractors, and staff.

The third area relates directly to you as a Runbox user, and includes the terms and policies that govern your use of our services, how we aim to inform and educate our users about privacy, and how we are implementing tools and utilities that safeguard your privacy rights.

Runbox' main areas of GDPR implementation

Revised Terms of Service and Privacy Policy

As part of our GDPR implementation the Runbox Terms of Service and Privacy Policy have been revised:

While the Terms of Service has only been updated with minor changes, the Privacy Policy has been restructured and amended. It provides a comprehensive overview of the policies that govern your privacy as a Runbox user, and describes in an accessible way the types of data Runbox collects in order to responsibly and reliably operate an email service.

It also lays out how user data are processed and stored, how they are being protected, and what rights you have as a user of our services.

It’s important to us that you are informed about your rights and your options with regards to your privacy. We ask that you review the revised terms and policies by May 25, 2018 when they take effect, and invite you to contact us with any questions or concerns.


Our path to GDPR compliance — and how it will strengthen the protection of your personal data

Runbox has been focusing on privacy and information security from day one, and has always paid close attention to the strict Norwegian legislation concerning the processing of personal data.

Norway is a member of the European Economic Area (EEA) and as such has to implement certain EU regulations, even though Norway is not a member of the European Union (EU). When the European Parliament and the Council adopted new legislation for the protection of personal data, that legislation also applied to Norway and had to be implemented by May 25, 2018.

The legislation, titled the General Data Protection Regulation (GDPR), contains rules for how personal data should be processed. Using the terms of the GDPR, this includes how, when, and under which conditions personal data

  • can be collected, processed, and stored, which demands explicit consent and an explicitly stated purpose;
  • shall be rectified;
  • shall be deleted (right to be forgotten);
  • shall be released to the person who owns the data (right to data portability);
  • could be transferred to third parties for processing, where a Data Processing Agreement (DPA) is mandatory;
  • could be transferred to processors outside the EEA.

At Runbox we have followed the development of this new EU legislation from the very beginning, and as early as 2014 we initiated a project in order to become GDPR compliant.

As a first step we started developing a planning document which includes detailed plans for making our information security management complete and consistent. The document laid out a number of activities which are now organized into 15 sub-projects, some of which are completed, while others are in the process of being completed.

However, information security is a continuous effort and the sub-projects will give rise to additional activities far beyond the GDPR framework.

We will keep you updated.


The FTC’s Report on Big Tech’s Personal Data Overreach: What You Need to Know

The Federal Trade Commission (FTC) has released a report exposing how Big Tech companies are overstepping privacy boundaries in their quest for user data. The report reveals the massive amount of personal information these companies collect, store, and profit from. Often, this is done without clear user consent or transparency. As concerns over data privacy grow, the report highlights the urgent need for stronger regulation and more responsible data practices.

In this blog post, we’ll break down the key findings of the FTC’s report and discuss how this overreach affects your privacy, along with what actions you can take to protect your data.

Key Findings of the FTC’s Report on Big Tech’s Data Practices

The FTC's report, titled "A Look Behind the Screens: Examining the Data Practices of Social Media and Video Streaming Services", offers an in-depth look at how major technology companies, including Facebook (Meta), Google, Amazon, and others, are handling your personal data. Below are some of the major findings:



Protect Your Data with Runbox: Secure, Private, and Based in Norway

We live in an era of constant digital surveillance, where governments and corporations collect vast amounts of personal data. Privacy has become one of the most pressing issues for people around the world. From targeted ads to government surveillance programs, personal information is constantly at risk. Protecting that privacy is not just a matter of convenience — it’s essential to safeguarding our freedoms, security, and autonomy. Runbox’s base in Norway plays a pivotal role in safeguarding your personal information.

Privacy is a Fundamental Right

At its core, privacy is about freedom. In the U.S., it is rooted in the Constitution, particularly the Fourth Amendment, which protects citizens from unreasonable searches and seizures by the government. Although the right to privacy isn't explicitly stated in the Constitution, courts have interpreted it to protect citizens from government intrusion into personal matters.

The idea that individuals have the right to keep their lives private — free from government or corporate interference — is a cornerstone of American democracy. Without privacy, individuals lose the ability to express themselves freely, to think and act without fear of surveillance or judgment. Privacy is integral to a free society, where people can associate with others, speak their minds, and live without being monitored at all times.



EU’s AI Act: What it is all about

From the perspective of the general public and society as a whole, so-called Artificial Intelligence (AI) was largely invisible until OpenAI removed the veil over GPT-3.5/ChatGPT in 2022.

Since then, interest in and use of AI, and of General Purpose AI (GPAI), have exploded. AI implementations are creeping in everywhere, to great benefit in many respects. However, the warning signs are many — manipulated images, fabricated conversations, falsified news stories, and fake video-presented events can lead to unforeseeable negative consequences, for instance in influencing democratic elections.

This is also the case when AI is used to make decisions, since we know that there is always a risk of “AI hallucinations” where AI software produces incorrect or misleading information.

This simplified outline can serve as background for the EU's AI Act, whose purpose is to establish societal control over the use and influence of AI/GPAI in particular, and of the big tech companies in general.

So let's dig into the matter, with the aim of describing what the EU AI Act is and clarifying its consequences, if any, for Runbox.



The Grindr case illustrates how Norwegian authorities fight against misuse of personal information

Oslo District Court has found Grindr's sharing of personal data illegal, following the Norwegian Consumer Council's complaint from 2020. Accordingly, Grindr has to pay the EUR 5.8 million fine originally imposed by the Norwegian Data Protection Authority.

Our guardians of personal data and privacy: NDPA, NPAB, and NCC

As we have written multiple times in our blog series about the GDPR and the consequences of this EU regulation, Norway has a long history of protecting citizens' personal information. It started with the first Personal Data Act, implemented in 1978 with the purpose of protecting the individual against violations of privacy through the processing of personal data. The law was replaced in 2000 by a new Personal Data Act implementing the EU's 1995 Data Protection Directive, the predecessor of the GDPR.

In 1980, the Norwegian Data Protection Authority (NDPA) was established as an independent authority whose task is to monitor compliance with the Personal Data Act. It is important to note that the NDPA has two roles: supervisory authority and ombudsman.

The NDPA's decisions may be appealed to the NPAB, the Norwegian Privacy Appeals Board (Personvernnemnda), whose decisions are final.

In recent years another Norwegian governmental body, the Norwegian Consumer Council (NCC), whose role is to protect consumers' interests, has become involved in privacy – more precisely, in the misuse of personal data by big tech companies. As an independent government agency, the NCC is free to choose the cases it wants to work on.

Sharing of personal data is illegal without specific consent: The Grindr case

Recently, the NCC has put effort into the task of preventing the big tech companies from using personal information for surveillance-based marketing that the users have not consented to. Neither have users given consent to how personal data is transmitted to the companies’ partners.

The figure below, from https://noyb.eu/en/eu-58-million-fine-grindr-confirmed, illustrates the problem.



Be privacy concerned when using ChatGPT (and other AI chatbots)

This is blog post #18 in our series on the GDPR.

"Don't tell anything to a chatbot you want to keep private." [source]

Writing about AI in general and about chatbots specifically is like shooting at a moving target because of the speed of development. However, at Runbox we are always concerned about privacy and must examine the chatbots case in that respect.

Due to its popularity, we have mainly used ChatGPT from OpenAI as the target of our examination. NOTE: ChatGPT and DALL-E, which generates images from text captions, are both consumer services from OpenAI.

This blog post is a summary of our findings, leading to advice on how to avoid putting your privacy at risk when using the Natural Language Processing (NLP)-based ChatGPT.

Our examination is based on OpenAI's Privacy Policy, Terms of Use, and FAQ, as well as a number of documents resulting from hours of Internet browsing.

The blog post consists of two parts: PART I is a summary of our understanding of the technology behind language models, in order to grasp the concepts and better understand their implications for privacy. In PART II we discuss the relevant privacy issues. PART II is written as a stand-alone piece and can be read without necessarily having read PART I.

PART I: Generative AI technology

The basics

GPT stands for Generative Pre-trained Transformer, and GPT-3 is a 175-billion-parameter language model that can compose fluent, original text in response to a short prompt from a user. The current version of ChatGPT is built upon GPT-3.5 and GPT-4 from OpenAI.

ChatGPT was launched publicly on November 30, 2022. ChatGPT was released as a freely available research preview, but due to its popularity, OpenAI now operates the service on a freemium model [source].

The GPTs are the result of three main steps: 1) Development and use of the underpinning technology Large Language Models (LLMs), 2) Collection of a very large amount of data/information, and 3) Training of the model.

Let us also keep in mind that all this is possible only because of today's advances in computational power.

Language models

A language model is, in essence, mathematics "converted" into computer programs that predict the next word or words in a sentence – or a complete sentence – based on probabilities. The model is a mathematical representation of the principle that the words in a sentence depend on the words that precede them.

Since computers basically can only process numbers (in fact only additions and comparisons), text input to the model (prompts) must be converted to numbers, and likewise the output numbers have to be converted to text (response). Text in this context consists of phrases, single words, or parts of words called tokens.
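To make the text-to-numbers step concrete, here is a toy example. Real GPT models use subword tokenizers (such as byte-pair encoding) over vocabularies of tens of thousands of tokens; the tiny vocabulary below is invented purely for illustration:

```python
# Toy tokenizer: convert text to token ids and back.
# Real models use subword tokenization (e.g. byte-pair encoding);
# this whole-word vocabulary is invented for illustration.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
id_to_word = {i: w for w, i in vocab.items()}

def encode(text):
    return [vocab[word] for word in text.lower().split()]

def decode(token_ids):
    return " ".join(id_to_word[i] for i in token_ids)

ids = encode("the cat sat on the mat")
print(ids)          # [0, 1, 2, 3, 0, 4]
print(decode(ids))  # the cat sat on the mat
```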

When you prompt a GPT, your query is converted to tokens (represented by numbers) and passed through the transformer, where the attention mechanism generates a score matrix that determines how much weight should be put on each word in the input (the prompt). This is used to produce the answer to the prompt using the model's generative capability – that is, to predict the next word in a sentence by selecting relevant information from the pre-processed text, with a high probability of being fluent and similar to human-written text [source].
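The attention mechanism itself boils down to a few matrix operations. Here is a minimal sketch of scaled dot-product attention using NumPy, with random toy matrices standing in for the learned query/key/value projections of a real transformer:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Score every token against every other token (the score matrix),
    turn scores into weights with softmax, and mix the values accordingly."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # one score per token pair
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: rows sum to 1
    return weights @ V

rng = np.random.default_rng(0)
tokens, d_model = 4, 8                    # a 4-token prompt, 8-dim embeddings
Q = rng.normal(size=(tokens, d_model))    # queries
K = rng.normal(size=(tokens, d_model))    # keys
V = rng.normal(size=(tokens, d_model))    # values
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```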

The learning part of the model is handled by a huge number of parameters representing the weights, as well as statistical biases for preventing unwanted associations between words. For instance, GPT-3 has 175 billion parameters, and GPT-4 is estimated to have around 1 trillion.

(The label "large" in LLM refers to the number of values (parameters) the model can change autonomously as it learns.)
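A back-of-the-envelope calculation shows what such parameter counts mean for raw storage, assuming (purely as an illustration) that each parameter is stored as a 16-bit number; actual formats and precisions vary:

```python
# Rough storage estimate for model weights, assuming 2 bytes per parameter.
def weight_storage_gb(parameters, bytes_per_param=2):
    return parameters * bytes_per_param / 1e9

print(f"GPT-3, 175e9 parameters: ~{weight_storage_gb(175e9):,.0f} GB")  # ~350 GB
print(f"1e12 parameters:         ~{weight_storage_gb(1e12):,.0f} GB")   # ~2,000 GB
```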

Collecting the data

The text the GPT model generates stems from OpenAI's systematic scraping of some 500 billion words (in the case of GPT-3, the predecessor of the current version of ChatGPT) from the Internet: books, articles, websites, blogs – all openly available information, from libraries to social media – without any restrictions regarding content, copyright, or privacy.

The scraping includes pictures and program code as well, and the collected data is filtered, resulting in a subset where "bad" websites are excluded.

The pre-training process

All that data is fundamental for pre-training the model. This process analyses the huge volume of data (the corpus) for linguistic patterns, vocabulary, grammatical properties, etc. in order to assign probabilities to combinations of tokens and combinations of words. The aforementioned transformer architecture is used in the training process, where the attention mechanism makes it possible to capture the dependencies between words regardless of their position in a sentence.
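On a toy scale, assigning probabilities to word combinations can be illustrated with simple bigram counting. Real pre-training learns such patterns by adjusting billions of parameters with gradient descent rather than by counting, so this is only an analogy:

```python
from collections import Counter, defaultdict

# Toy corpus; real pre-training uses hundreds of billions of words.
corpus = "the cat sat on the mat . the cat ate ."
words = corpus.split()

# Count how often each word follows each other word.
counts = defaultdict(Counter)
for current_word, next_word in zip(words, words[1:]):
    counts[current_word][next_word] += 1

def next_word_probabilities(word):
    """Estimate P(next word | word) from the corpus counts."""
    total = sum(counts[word].values())
    return {w: c / total for w, c in counts[word].items()}

print(next_word_probabilities("the"))  # {'cat': 0.67, 'mat': 0.33} approximately
```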

The result of the pre-training process is an intermediate stage that has to be fine-tuned to the specific task the model is intended for – for instance producing text, program code, or translations of speech in response to a prompt. The fine-tuning process uses appropriate task-specific datasets containing examples typical of the task in question, and the weights and parameters are adjusted accordingly.

Of course, a ChatGPT response to a prompt is not "burdened" with the ethical, contextual, or other considerations a human would apply. To prevent undesired responses (toxicity, bias, or incorrect information), the fine-tuning process is supervised by humans in order to correct inappropriate or erroneous responses, using prompt-based learning. Here the responses are given a "toxicity" score that incorporates human feedback [source].

ChatGPT usage training

The learning process continues when the responses generated from a user's prompts are saved and subjected to the training process – for at least 30 days, but "forever" if chat history isn't turned off. In any event, it is not possible to delete specific prompts from your user history [source], only entire conversations.

In the world of AI and LLMs, "hallucination" is the word used when responses appear to be pulled from thin air.

OpenAI offers an API that makes it possible for "anyone" to train GPT-n models for domain-specific tasks [source], that is, to build a customized chatbot. In addition, they have launched a feature that allows GPT-n to "remember" information that would otherwise have to be repeated [source, source].

Takeaways

  • The huge volume of scraped data is obviously a cacophony of content and quality, which affects the corpus and thereby also the probability patterns and the responses produced [source].
  • ChatGPT has limited knowledge of events that have occurred after September 2021, the cutoff date for the data it was trained on [source].
  • The response you get from ChatGPT to your prompt is based on probabilities, and as such you have no guarantee of its validity [source].
  • A prompt starts a conversation, unlike search engines such as DuckDuckGo and Google, which give you a list of websites matching your search query [source].
  • ChatGPT uses information scraped from all over the Internet, without any restrictions regarding content, copyright, or privacy. However, manual training of the model was introduced to detect harmful content [source]. Violations of copyright have resulted in lawsuits [source], and more than 10,000 authors have signed an open letter to the CEOs of prominent AI companies [source].
  • Your conversations are normally used to train the models that power ChatGPT, unless you specifically opt out [source].

PART II: Chatbot privacy considerations

"The privacy considerations with something like ChatGPT cannot be overstated" [source]

The following introduction is mainly intended for readers who have skipped PART I of this blog post.

Generative AI systems, such as ChatGPT, use information scraped from all over the Internet, without permission or restrictions regarding content, copyright, or privacy (more on this in PART I). This means that what you have written on social media, blogs, comments on articles online, etc. may have been stored and used by AI companies to train their chatbots.

Another source for training generative AI systems is prompts, that is, the information users provide when asking the chatbot something. What you ask ChatGPT – the sentences you write, as well as the generated text – is "taken care of" by the system and could become available to other users through the answers to their questions/prompts.

However, according to OpenAI's help center article, you can opt out of training the model – but opting in is obviously the default.

So both the Internet scraping and any personal information included in your prompts can result in personal information turning up in a generated answer to another, arbitrary prompt.

This is very problematic for several reasons.

Is OpenAI breaching the GDPR?

First, OpenAI (and others scraping the Internet) never asked for permission to use the collected data, which could contain information that may be used to identify individuals, their location, and all kinds of sensitive information about hundreds of millions of Internet users.

Even if Internet scraping is not prohibited by law, it is ethically problematic because data can be used outside the context in which it was produced, and so can breach contextual integrity. That principle has de facto been manifested in the EU's General Data Protection Regulation (GDPR) Article 6(1)(a) as a prerequisite for lawful processing of personal data:

…the data subject has given consent to the processing of his or her personal data for one or more specific purposes

Here language models like OpenAI's ChatGPT are in trouble: personal data can be used for any purpose – a clear violation of Article 6.

Second, OpenAI provides no procedures for individuals to check whether their personal data is stored – and thereby could potentially be revealed by an arbitrary prompt – much less to have such data deleted on request. This "right to erasure" is set forth in GDPR Article 17(1):

The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay …” on the grounds that “(d) the personal data have been unlawfully processed

It is inherent in language models that data can be processed in unpredictable ways and presented or stored anywhere, and therefore the "right to be forgotten" is unattainable.

Third, and without going into details, the GDPR gives data subjects (individuals) the right to be informed, the right of access, the right to rectification, the right to object, and the right to data portability regarding their personal data. It is questionable whether generative AI systems can ever accommodate such requirements, since an individual's personal data could be replicated arbitrarily throughout the system's huge dataset.

Fourth, OpenAI stores all its data, including the personal data it collects, one way or another on servers located in the US. That means it is subject to the EU-US Data Privacy Framework (see our blog post Privacy, GDPR, and Google Analytics – Revisited) and the requirements set there.

To answer the question posed in the heading of this section – is OpenAI breaching the GDPR? – it is very difficult to understand how ChatGPT, and other language models for generative use (generative AI systems), can ever comply with the GDPR.

What about the privacy regulations in the US?

Contrary to the situation in Europe, there is no general federal privacy law in the United States – each state has its own jurisdiction in this area. The only federal laws are sector-specific ones such as HIPAA (Health Insurance Portability and Accountability Act) and COPPA (Children's Online Privacy Protection Act), which regulate the collection and use of personal data categorized as sensitive. However, there are movements towards regulation of personal information in several states, as tracked by the IAPP (International Association of Privacy Professionals).

How does OpenAI use the data it collects?

When signing up for ChatGPT, you have to agree to OpenAI's Privacy Policy (PP) and allow them to gather and store a lot of information about you and your browsing habits. Of course, you have to submit all the usual account information, and allow them to collect your IP address, browser type, and browser settings.

But you also allow them to automatically collect information about for instance

“… the types of content that you view or engage with, the features you use and the actions you take, as well as your time zone, country, the dates and times of access, user agent and version, type of computer or mobile device, and your computer connection”.

All this data makes it possible to build a profile of each user – bare facts, but also less tangible information such as interests, social belonging, concerns, etc. This is similar to what search engines do, but ChatGPT is not a search engine — it is a "conversational" engine, and as such it is able to "learn" more about you depending on what you submit in a prompt, that is, how you engage with the system. According to their PP and the citation above, that information is collected.

The PP acknowledges that users have certain rights regarding their personal information, with indirect reference to the GDPR, for instance the right to rectification. However, they add:

“Given the technical complexity of how our models work, we may not be able to correct the inaccuracy in every instance.”

OpenAI reserves the right to provide personal information to third parties, generally without notice to the user, so your personal information could spread to actors in OpenAI's economic infrastructure and become very difficult to control.

Misuse of your personal information – what are the risks?

It is reasonable to assume that OpenAI will not knowingly and willfully set out to abuse your personal information because they have to adhere to strict regulations such as GDPR, where misuse could result in fines of hundreds of millions of dollars.

The biggest uncertainty is linked to how the system responds to input in combination with the system’s “learning” abilities.

If asked the "right" question, the system can expose personal information, and may combine information about a person, e.g. a person's name, with characteristics and histories that are untrue and may be very unfortunate for that individual. For instance, asking the system about a person by name can result in an answer that "transforms" a credit card fraud investigator into a person involved in credit card scams.

Takeaways

Using generative AI systems, for example ChatGPT, is like chatting with a “black box” – you never know how the “box” utilizes your input. Likewise, you will never know the sources of the information you get in return. Also, you will never know if the information is correct. You may also receive information about other individuals that you shouldn’t have, potentially even sensitive and confidential information.

Similarly, other individuals chatting with the "box" may learn about you, your friends, your company, etc. The only way to avoid that is to be very careful when writing your prompts.

That said, OpenAI has introduced some control features in ChatGPT: you can disable your chat history through the account settings. However, the data is only deleted after 30 days, which means that your data can be used for training ChatGPT in the meantime.

You can object to the processing of your personal data by OpenAI's models by filling out and submitting the User Content Opt Out Request form or the OpenAI Personal Data Removal Request form, if your privacy is covered by the GDPR. However, when they say that they reserve the right "to determine the correct balance of interests, rights, and freedoms and what is in the public interest", it indicates a reluctance to accept such requests. The article in Wired is recommended in this regard.

Valuable sources

  1. GPT-3 Overview. History and main concepts (The Hitchhiker’s Guide to GPT3)
  2. GPT-3 technical overview
  3. Transformers – step by step explanation
  4. LLM training and fine-tuning

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.
