Deep Learning and GDPR: Truth, Identity, Intelligence, Trust

07 November 2022

‘Deep Learning’ is not the latest education fad with the lifespan of an education secretary. The deep learning described here is the name of a ‘new’ technology with significant impact, for better or for worse. An understanding of deep learning is required to fully grasp why personal information is so valuable and how it may be used. It might also help schools to reflect on their assessment processes and highlight the importance of ongoing face-to-face teacher assessment beyond homework and coursework. This article outlines deep learning theory and context, highlights real-world usage, and concludes with areas of consideration for leaders.

The background should inform risk management and what might be required of others. People chose cars with airbags and antilock brakes before they became the expected norm. The purchaser could evaluate their value without understanding how they worked. Can you evaluate who and what to use for technology security?

Deep Learning: Theory and context.

There is more hype and misinformation about deep learning than there is in politics, advertising and social media.  I am therefore using written extracts and knowledge from my nephew, who has a PhD in designing hardware for simulating biologically plausible neural networks. Even he does not claim to be an expert in deep learning. Who is?

Intelligence

Good teachers have always known there are multiple intelligences. Artificial Intelligence (AI) in action can result in outcomes indistinguishable from those generated biologically.

AI is a field in which people try to build systems that mechanise processes which might in some way be perceived as 'intelligent'. The system itself need not actually be 'intelligent' or even resemble human processes. Artificial General Intelligence (AGI), such as the android in films, is not here. Yet!

The Babbage mechanical analytical engine, with numbers represented by gear wheels, was once labelled intelligent. Yet IBM Deep Blue's 1997 chess victory against Garry Kasparov is now casually dismissed as 'just game theory in action.' AI is used everywhere, from finance to military strategy.

Can machines learn? The use of vocabulary describing biological learning is unhelpfully intermingled with the language used for the technology. ‘Deep learning’ builds upon a previous ‘machine learning’ technique misleadingly named 'Artificial Neural Networks' (ANNs), a technique predating the name 'machine learning' itself.

The 'neural' analogy in ANNs comes from the way that biological neurons take a collection of input signals and combine them in some simple but somewhat arbitrary manner to produce a new output signal. I have found neuroscience studies following brain injury quite complex.

In an ANN, an input, such as the pixel values of an image, may be fed into a bunch of artificial 'neurons' where each 'neuron' will combine those values to produce a new output value. Mathematics is everywhere, with or without the gear wheels.
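As a rough illustration of the paragraph above, a single artificial 'neuron' can be sketched in a few lines of Python. The function name `neuron`, the example pixel values and weights, and the choice of a sigmoid 'squashing' function are all assumptions made for illustration. Real networks chain very large numbers of such units together and set the weights during training.

```python
import math


def neuron(inputs, weights, bias):
    """Combine input values into one output value.

    Each input is multiplied by its weight, the results are added up,
    and the total is squashed into the range 0..1 by a sigmoid function.
    """
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))


# Three hypothetical pixel brightness values between 0 (black) and 1 (white).
pixels = [0.0, 0.5, 1.0]

# Hypothetical weights; in a real network, training adjusts these numbers.
weights = [0.4, -0.2, 0.1]

output = neuron(pixels, weights, bias=0.0)
print(output)
```

There is nothing more mysterious inside than multiplication, addition and one simple function. 'Training' amounts to repeatedly nudging the weights so that the outputs move closer to the desired answers.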

It turns out that when presented with vast quantities of data, deep networks can pick up on remarkably subtle and infrequent patterns and do so rather robustly. This means that if you can collect a large enough corpus of training data, you can potentially create a model with remarkably sophisticated abilities.

Big data is not a large memory stick or a personal computer. There is a lot of data, including personal human data such as texts, images, sound and video, in storage. With big data, we're talking about tweaking billions of parameters using trillions of training data items from a vast variety of sources from the web and beyond. Currently, monstrous specialised devices, often called 'tensor' processors, exist. Raw electricity costs alone mean that ‘training’ some cutting-edge models has cost millions of dollars!

Emerging models don’t ‘learn’ more after training. They have significant limitations and can’t undertake or manage even the most basic of tasks outside of the model, despite excellent outputs that relate to the mathematical analysis of their training data.

Deep Learning: Significant real-world Impact

‘Deepfakes’. Who are you really communicating with? Image and video manipulation, such as automated photorealistic face-swapping, has developed to include support for face-swapping on videos and even real-time video processing.

Automated response. We all experience automated responses when trying to get help online or by phone. Sometimes you hope that you’ll eventually get through to a natural person who might better understand the requirements.

Generative Pre-trained Transformer 3 (GPT-3) is a deep learning model whose output quality is so high that it can be difficult or even impossible to detect that a human didn't produce it. Applications of GPT-3 are numerous and include document summarisation, grammatical correction, generating advertising copy from product descriptions and so on. There are also more nefarious applications, notably using it to reword and rephrase existing works to produce plausibly original plagiarised documents.

Trust me, the text in the article is not deep learning edited. Do you trust me? Do you know me? Do you know it is me writing? Does it matter?

GitHub Copilot can generate new computer programs for a specific purpose, given snippets or prompts. It is trained on a vast amount of code created for different purposes. New software may have pockets of old code that are problematic and potentially dangerous. Malware may be included unknowingly.

DALL-E 2, Midjourney and Stable Diffusion are wave-making models that generate images given textual prompts. The resulting images can range from photorealistic to mimicking the styles of famous artists and can be as sensible or surreal as desired.

There are many deep learning models to try via the web. Many offer subscription services for regular access and content creation.

Questions: Identity, Truth and Trust

Where and what is our identity, and how is it used? Beyond the theft, forgery, and fear of the Terminator, who are the people using this deep learning?

Given the prohibitive cost of gathering ‘training’ data and performing training, many cutting-edge ‘models’ are the property of extraordinarily large companies. In the best circumstances, those with power and authority are elected democratically. Large companies?

We might understand and accept swapping an actor’s image with their stunt double. What about filmmakers bringing back characters using personal identities, such as images and voices, from dead actors for new scenes in an ongoing film franchise? Can we intentionally or unintentionally ‘sell’ our face, voice and other identity features online? What is the risk to self and others?

“The Capture” drama series on BBC iPlayer highlights some significant and intriguing issues arising from manipulating identity to create adjusted evidence for the law to convict falsely. Some people may believe evidence ‘correction’ to create fake evidence is justifiable for their own purposes.

How might the use of deep learning models distort culture, ethnicity and religion?

Despite the impressiveness of the results and the gushing and fantasising of marketing people, there is nothing magic about deep learning. Whilst you may not understand why a given model works (indeed, many experts would argue we don't actually know), keeping yourself grounded is useful.

We can’t freeze time and halt change, but there are things we should do in the present.

Considerations for Leaders

There is a balance between security and user experience. Is your choice of acceptable risk being made by others unknowingly? What are the employment and training implications?

Truth

Where is truth? What is truth? Is a half-truth, truth? Do we need the truth, the whole truth and nothing but the truth? Should we take more time to distinguish the reality we accept, particularly via technology?

Identity

Protect your personal and special category identity data however it is stored. I offer some basics for GDPR and the Data Protection Act (DPA) 2018 in relation to technology. Identity includes name, address, email address, telephone number, birthday, records and attendance data, and location, including IP addresses... Special Category identity includes biometrics, sexual orientation, health, criminal record, ethnicity, safeguarding, trade union membership, political opinion, religion…

In terms of computers, including phones…

Consider browser choice and ensure the security and privacy settings are enhanced. Check for a misspelt web address that tricks you about who you are communicating with. Do you understand how your browser might warn you when you visit a site without Hyper Text Transfer Protocol Secure (HTTPS), where the site's identity has had some fundamental certification?
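The two checks described above, whether an address uses HTTPS and whether it is a near-miss spelling of a site you trust, can be sketched in Python. This is only a minimal illustration: the `TRUSTED_HOSTS` list, the function name `check_address` and the 0.85 similarity threshold are assumptions chosen for the example, and a browser's own warnings do far more than this.

```python
from difflib import SequenceMatcher
from urllib.parse import urlsplit

# Hypothetical list of sites the user genuinely intends to visit.
TRUSTED_HOSTS = ("duckduckgo.com", "ico.org.uk")

# Assumed threshold: names this similar to a trusted name look suspicious.
SIMILARITY_THRESHOLD = 0.85


def check_address(url):
    """Return a list of warnings for a web address (empty if none found)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    warnings = []
    if parts.scheme != "https":
        warnings.append("not HTTPS")

    if host not in TRUSTED_HOSTS:
        for trusted in TRUSTED_HOSTS:
            similarity = SequenceMatcher(None, host, trusted).ratio()
            if similarity > SIMILARITY_THRESHOLD:
                warnings.append(f"looks like {trusted} but is not")
    return warnings


for address in ("https://duckduckgo.com",
                "http://duckduckgo.com",
                "https://duckduckg0.com"):
    print(address, check_address(address))
```

The third address swaps the letter 'o' for a zero, which is easy to miss by eye but straightforward for a program to flag.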

What search engines do you use? Are searches tracked to target content and advertising to you? Is your information being used for more than advertising? Are trackers and cookies deleted? Many browsers can do this automatically. Alternative search engines, such as https://duckduckgo.com, may be worth checking out.

Text messages are vulnerable. It may be worth considering applications, such as ‘Signal’ private messenger, to encrypt messages and reduce the risk of them being read and stored in the cloud when texting. Communication between two Signal users is more secure.

Do emails come from where they suggest? Where do hyperlinks take you? Is it safe to open the attached zip file? The 'click here to win' offer probably is too good to be true.

Do you use plain text or HTML email? Plain text is probably less attractive and less functional than HTML format. Plain text is arguably more secure. HTML emails may have malware tagged out of view.

What is included on social media? How might text, images and opinions… be used? It may be evaluated as part of a job interview selection process. Are ‘protected characteristics’ being evaluated during the process? This may lead to discrimination. Are the processes used to recruit staff compliant with the Equality Act 2010?

Is personal location being tracked only for the desired purpose? Who can access location identity data used by apps such as Life360 and satellite navigation systems? Is this data being sold to someone unknown? Are such applications trustworthy?

Are apps being trusted too much during installation by accepting default access to camera, location, keyboard, phone, and physical activity…? Would a less smart phone be better for some activities? New basic telephones can still be purchased.

What is the security software in use on fixed and mobile devices with access to school information, such as email or data stored in the cloud? Malware might monitor activity on the computer and collect passwords, which should be different and complex for each purpose. Do you use three-factor authentication (3FA) or over-rely on just one factor? Is the security software up to date? Can you trust the ‘source’ code and the origin of the security provider?
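To show what 'different and complex for each purpose' can mean in practice, here is a minimal Python sketch that generates a separate random password for each service. The service names, the length of 20 characters and the function name `new_password` are assumptions for illustration only; in everyday use a reputable password manager would normally do this job.

```python
import secrets
import string

# Letters, digits and punctuation give a large pool of possible characters.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def new_password(length=20):
    """Build a password from cryptographically secure random choices."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# Hypothetical services: each one gets its own unrelated password.
services = ("school email", "cloud storage", "banking")
passwords = {service: new_password() for service in services}

for service, password in passwords.items():
    print(f"{service}: {password}")
```

The `secrets` module is used rather than `random` because it is designed for security-sensitive values. If one service is breached, the stolen password reveals nothing about the others.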

Is your Operating System (OS) run with the most recent security update? How close are devices to the update provider? Is there a third-party update delay for the device being used? Is this potential delay considered at the time of the hardware purchase? How long will the latest OS be available after purchase?

Does your IP address give up the service provider identity and thereby indicate the hardware protecting the local network? Would something more bespoke, beyond the modem, help keep local networks more secure? Is the network and Wi-Fi access protected against unwanted devices by a strong password or MAC address security? Are identity and secure information accessible to all others on the network used at school, home or mobile?

Do you use a Virtual Private Network (VPN) to hide your data and IP address through encryption and rerouting? A consistent static IP address can identify individuals and their activities electronically. An IP address is required to identify each computer on a network to enable communication between devices. Do you always have the same IP address each time you connect to the world wide web?
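Not every IP address is visible to the outside world. As a small illustration, Python's standard `ipaddress` module can tell apart addresses used only inside a local network from public ones that websites can see. The function name `describe` and the sample addresses are assumptions for the example; 8.8.8.8 is used simply as a well-known public address.

```python
import ipaddress


def describe(address):
    """Classify an IP address as loopback, private (local network) or public."""
    ip = ipaddress.ip_address(address)
    if ip.is_loopback:
        return "loopback (this device talking to itself)"
    if ip.is_private:
        return "private (local network only)"
    return "public (visible on the internet)"


for sample in ("127.0.0.1", "192.168.1.20", "8.8.8.8"):
    print(sample, "->", describe(sample))
```

Devices at home or school usually hold a private address, while the router presents one public address to the web on their behalf. It is that public address which a VPN reroutes and hides.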

VPNs should be used for security and not simply to pretend to be elsewhere to access streaming services before their content is available locally.

Are devices shared? Who administers access rights for different users?

Is the identity of the Data Protection Officer (DPO) known, including contact details by phone as well as email and web form? Are data breaches reported appropriately to the UK's Information Commissioner's Office (ICO)?

Intelligence

Automated responses may be identifiable through a processing delay, or through gaps in the model's training data that result in irrelevant responses when out-of-scope input is offered as a test. Beware: intelligent real people also strategically answer a different question when under pressure. Just watch the news!

Do systems record your communication, and is the record accurate or a dubious auto-generated tick box of personal information?

Trust

What does a flashing amber light tell you about a vehicle? It tells you that the bulb is working. Anything beyond this needs to be checked to be safe.

I was reluctant to offer these suggestions because there is always more that could and should be added. Paper, dialogue, and body language have not been mentioned. Contextually aware, trusted expertise is required.

The weakest link defines the level of security for the whole system. The moment you think you’ve got watertight security, you are too relaxed and vulnerable.

Identity and information are valuable. Look after yourself and others.

David Channon
