AI Data Entry Point Vulnerabilities (AI-DEPV) are vulnerabilities and attacks associated with the data, where the adversary manipulates the data in order to attack, alter, or otherwise corrupt the intended purpose, scope, context, and nature of the algorithmic system.
AI-DEPV primarily concerns: (1) attacks and defenses on AI systems, and (2) privacy issues which are consequences of attacks on AI algorithms. This knowledge store currently does not cover general cybersecurity issues (e.g., access control, or cryptographic solutions).
Although the knowledge store currently focuses heavily on ML/classification, many attacks also apply to unsupervised learning, reinforcement learning, data mining, or modeling. Finally, the field of AI security is continuously evolving, and conclusive evidence on defenses has not yet been reached. We thus report the current state of knowledge.
Examples of AI-DEPV include, but are not limited to:
- Attacks at Training time:
  - Poisoning or Causative Attacks
    - Data Poisoning: altering training data prior to model training (e.g., Tay chatbot poisoning)
    - Data Injection: inserting corrupt data into the training set
    - Data or Label Manipulation: altering features or labels of the existing training set
    - Logic Corruption: tampering with the learning algorithm to cause a model to misbehave
  - Backdooring or Trojaning Attacks: invoking a model’s secret behaviors while retaining its existing ones
- Attacks at Inference time:
  - Model Evasion or Adversarial Examples: bypassing the detection of a model by causing the model to misclassify (e.g., clothing that bypasses facial recognition)
  - Inference Attacks
    - Membership Inference: inferring whether a sample data point is in the training set (e.g., aggregated survey data)
    - Attribute Inference: inferring whether a sample data point has a certain attribute
    - Property Inference: inferring whether the training set has a certain property (e.g., class imbalance)
    - Model Inversion or Data Reconstruction: reconstructing an input sample from approximated gradients
  - Data Extraction: recovering training examples by querying the model; specific to language models
  - Model Extraction: extracting a model’s architecture, hyperparameters, and/or weights (e.g., GPT-2 replication)
  - Model Stealing: replicating the model without the owner’s consent
  - Adversarial Reprogramming: altering test data with a pattern that repurposes the classifier (e.g., turning a digit classifier into a letter classifier)
  - Spamming: flooding the model with chaff data, resulting in incorrect inferences and slowing down model runtime
  - User-Executed Unsafe ML Artifact: developing unsafe ML artifacts that cause deleterious effects upon execution (e.g., loading a model)
- Attacks at Training and/or Inference time:
  - Sloth Attack: increasing the runtime of a model using specially crafted, unusual inputs at test or training time
  - ML Artifact Collection: collecting ML artifacts on the network to stage an attack
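To make the training-time category above concrete, here is a minimal sketch of a Data Injection poisoning attack, assuming a toy one-dimensional nearest-centroid classifier; the data, the classifier, and the attacker's poison points are all invented for illustration.

```python
# Illustrative sketch only: a Data Injection poisoning attack against a
# toy 1-D nearest-centroid classifier. All data and numbers are invented.

def train_centroids(points, labels):
    """Compute one centroid per class from 1-D feature values."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))

# Clean training set: class 0 clusters near 0, class 1 near 10.
points = [0.0, 1.0, 2.0, 8.0, 9.0, 10.0]
labels = [0, 0, 0, 1, 1, 1]
clean_model = train_centroids(points, labels)
assert predict(clean_model, 2.5) == 0  # correctly classified before the attack

# Data Injection: the attacker inserts corrupt points labeled class 0,
# dragging the class-0 centroid away from its true cluster.
poisoned_model = train_centroids(points + [20.0, 20.0, 20.0], labels + [0, 0, 0])
print(predict(poisoned_model, 2.5))  # the same input is now misclassified as 1
```

The same skeleton illustrates Data or Label Manipulation: instead of adding points, the attacker flips labels of existing training samples to shift the centroids.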
In addition to these ML-specific attacks, traditional attacks on non-ML systems, such as Account Manipulation, Valid Accounts, Phishing, and Trusted Relationship, can also be exploited to gain access to training data and models. See AI-DEPV – Attacks for a list of all relevant attacks and references.
AI-DEPVs are diverse with regard to: harm (e.g., misclassification, denial of service), method of attack (e.g., shadow training, adversarial perturbation), knowledge of the attacker (e.g., black-box, white-box), model task (e.g., image recognition, language model), type of model (e.g., decision tree, SVM), model architecture (e.g., linear kernel, RBF kernel), model complexity (e.g., regression, GANs), data type (e.g., numerical data, text data, images), learning type (e.g., supervised, unsupervised, reinforcement), and learning architecture (e.g., centralized, distributed), to name a few. Furthermore, the attacks listed above are not necessarily stand-alone attacks; they can be built on top of each other. For example, an attacker might first launch a Model Stealing attack, craft white-box evasion samples against the stolen copy, and then execute a Model Evasion attack on the original model at a later time.
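The chaining example above can be sketched end to end: the attacker steals a model using only label queries, then works with the stolen copy offline. The victim model here (a single secret decision threshold) and all numeric values are hypothetical stand-ins for a real black-box API.

```python
# Hypothetical sketch of a Model Stealing attack via label-only queries.
# The victim model and its secret threshold are invented for illustration.

SECRET_THRESHOLD = 3.7  # unknown to the attacker

def victim_predict(x):
    """Black-box API: returns only the predicted label, no confidences."""
    return 1 if x >= SECRET_THRESHOLD else 0

def steal_threshold(query, lo=0.0, hi=10.0, tol=1e-6):
    """Binary-search the decision boundary using label-only queries."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if query(mid) == 1:
            hi = mid  # boundary is at or below mid
        else:
            lo = mid  # boundary is above mid
    return (lo + hi) / 2

stolen = steal_threshold(victim_predict)
print(round(stolen, 3))  # very close to the secret threshold

# The surrogate replicates the victim on arbitrary inputs; an attacker
# could now craft white-box evasion samples against it offline.
surrogate = lambda x: 1 if x >= stolen else 0
```

Note how few queries suffice for a one-parameter model; rate-limiting model access (discussed under defenses below) raises the query cost of exactly this kind of extraction.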
At ForHumanity, we are most concerned about the impacts or harms of these attacks, which can be categorized as violations of the following principles of Information Security and Privacy:
- Integrity: undermining a model’s inference or prediction process
- Availability: reducing a model’s quality or access to the point where the model component is unavailable to users
- Confidentiality: extracting usable information about a model and/or data
- Privacy: extracting sensitive information about one or more individuals
Under the lens of GDPR, Confidentiality violations are of most importance, as the attacker’s goal is to gain access to or knowledge about Personal Data and Special Category Data in the training data. These attacks include, but are not limited to: Membership Inference Attacks, Attribute Inference Attacks, Property Inference Attacks, and Model Inversion/Reconstruction. Therefore, special attention should be paid to understanding the methods and causes of privacy leaks so as to devise defenses against them.
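A minimal sketch of why these privacy leaks occur, assuming a hypothetical model that is markedly more confident on memorized training points than on unseen ones; a real Membership Inference Attack thresholds genuine model confidences in the same way, and overfitting widens the gap the attacker exploits.

```python
# Hypothetical sketch of a confidence-based Membership Inference Attack.
# The "model" is a stand-in that memorizes its training set; the data
# and confidence values are invented for illustration.

train_set = {1.0, 2.0, 3.0}

def model_confidence(x):
    """Toy overfit model: near-certain on memorized training points,
    noticeably less confident on everything else."""
    return 0.99 if x in train_set else 0.60

def infer_membership(x, threshold=0.9):
    """Attacker guesses 'member' whenever confidence exceeds the threshold."""
    return model_confidence(x) > threshold

print(infer_membership(2.0))  # True: training member correctly flagged
print(infer_membership(7.0))  # False: non-member correctly rejected
```

This is why the insufficient-practice examples below single out overfit classifiers exposed to the public: the confidence gap is the leak.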
Historically, AI-DEPVs were studied in controlled academic environments under the field of Adversarial Machine Learning, but have recently been tested more extensively in the real world by users, researchers, and fully equipped red teams. While academic studies of these attacks focus predominantly on technical aspects, real-world attacks leverage both technical and non-technical vulnerabilities of production ML systems. Our criteria for good defenses against AI-DEPVs, therefore, should draw heavily on both the current literature on Adversarial Machine Learning and cybersecurity lifecycle management frameworks, such as NIST’s Cybersecurity Framework, Gartner’s Adaptive Security Architecture, and Microsoft’s Security Development Cycle.
Criteria for Good Defenses
Criteria for Mature and Sufficient defenses against AI-DEPVs are below. A more detailed discussion of these criteria is documented here: Criteria Discussion, and more details on defenses can be found here.
- Non-Technical Defenses
- Vulnerabilities and risks shall be examined via application-dependent threat modeling for AI Data Entry Point Vulnerabilities.
- Regular reviews shall be established for potential attacks and risks related to AI Data Entry Point Vulnerabilities.
- Events and anomalies related to AI Data Entry Point Vulnerabilities shall be collected and analyzed – as part of IAAIS C.1309
- A procedure shall be established to classify incidents, analyze impacts, and investigate incidents of AI Data Entry Point Vulnerabilities.
- A procedure shall be established for containment and communication to stakeholders about incidents of AI Data Entry Point Vulnerabilities.
- If the model is accessible by non-internal parties, model access shall be rate-limited – IAAIS C.1310
- AI Data Entry Point Vulnerabilities shall be tested for each algorithm via penetration testing or red teaming.
- Defenses should be tested against adaptive attacks, and the approaches taken and their results should be documented.
- Model developers, designers, and researchers shall understand the tradeoffs and risks related to Technical Defenses if they are implemented.
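The rate-limiting criterion above (IAAIS C.1310) can be sketched with a standard token-bucket limiter; the capacity, refill rate, and clock values here are illustrative, not prescribed by the criterion.

```python
# Illustrative token-bucket rate limiter for model access (cf. IAAIS C.1310).
# Capping query rates raises the cost of extraction, stealing, and spamming
# attacks. All parameter values are invented for this sketch.

class TokenBucket:
    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill = refill_per_sec
        self.last = 0.0

    def allow(self, now):
        """Return True if a query at time `now` is within the budget."""
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(capacity=3, refill_per_sec=1.0)
# Four back-to-back queries at t=0: only the first three are served.
results = [bucket.allow(0.0) for _ in range(4)]
print(results)  # [True, True, True, False]
# After two seconds the budget has partially refilled.
print(bucket.allow(2.0))  # True
```

In production this would sit in an API gateway keyed per caller; the sketch only shows the accounting logic.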
- Technical Defenses
- For each attack identified by threat modeling, one or more defenses shall be implemented.
- Differential Privacy should be implemented for any analytics that contains Sensitive Personal Data.
- Bayesian methods should be used for models trained on Sensitive Personal Data.
- If Differential Privacy is used in analytics or in training the model, the choice of parameters shall be examined.
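To illustrate the parameter choice the last criterion calls out, here is a sketch of the Laplace mechanism for a differentially private count query; the dataset and epsilon value are invented, and a production system should use a vetted DP library rather than hand-rolled noise.

```python
# Illustrative sketch of the Laplace mechanism for a differentially
# private count. Smaller epsilon means more noise (more privacy) and
# less accuracy, which is exactly the tradeoff to examine and document.

import math
import random

def dp_count(records, predicate, epsilon, rng):
    """Differentially private count: true count + Laplace(1/epsilon) noise.
    A count query has sensitivity 1 (one person changes it by at most 1)."""
    true_count = sum(1 for r in records if predicate(r))
    # Inverse-transform sample from Laplace(0, 1/epsilon).
    u = rng.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_count + noise

rng = random.Random(0)
ages = [23, 35, 41, 52, 29, 67, 38]  # invented data; true count over 40 is 3
noisy = dp_count(ages, lambda a: a > 40, epsilon=0.5, rng=rng)
# The noisy answer hides any single individual's contribution while
# staying close to the true count for a reasonable epsilon.
print(noisy)
```

Documenting the chosen epsilon, as the criterion requires, is what makes the accuracy/privacy tradeoff auditable.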
More Details on Defenses
A separate document contains more details on the defenses described and referenced below.
Examples of Good and Bad Practice
Mature – More than enough to pass the audit
Sufficient – Enough to pass the audit
Insufficient – Not enough to pass the audit
Mature: Examples of Mature Non-Technical Defenses
- Threat Modeling
  - Vulnerabilities are identified along all components of the AI system
  - The threat matrix covers both ML-specific and non-ML-specific vulnerabilities
  - Attacks are concisely summarized, categorized, and described
  - Extents of mitigation are disclosed, with red team opportunities identified
  - Examples: Adversarial ML Threat Matrix: Case Studies – MITRE Corp.; Threat Modeling – blog post by wunderwuzzi
- Defense Implementation
  - Regular intervals (6 months) to check whether implemented defenses are up to date
  - Regular intervals (6 months) to test existing solutions against adaptive or new attacks
  - Implemented defenses are well documented
- Handling of Incidents
  - A procedure exists to collect/report incidents
  - Occurring incidents are classified and considered in the next round of defense checking
  - Procedures are in place to inform both possible victims and stakeholders about discovered attacks
Sufficient: All Technical Defenses are sufficient but not mature, because:
- Studies on defenses are in nascent stages but evolving; thus all defenses can possibly be circumvented
- Interactions among various defense techniques are not yet widely studied

Examples of Sufficient Technical Defenses:
- Backdooring, Trojaning, Adversarial Reprogramming, Attribute and Property Inference, Sloth Attacks
  - Ongoing arms race; no scientific consensus on a working defense
- Data Poisoning
  - Outlier detection techniques
  - Regularization
  - Robust optimization
- Data Access Attacks
  - Encryption of training and testing data
  - Prevention of Toxic Combinations
- Model Evasion
  - Adversarial Training
  - A defense tested by a benchmark (e.g., RobustBench)
- Model Stealing
  - Rate-limiting model access
- Attacks Related to Privacy
  - Differential Privacy (DP) in analytics or in training a model; alternatively, a classifier with Bayesian inference can be used, as it comes with DP by default
    - Tradeoff: an inherent tradeoff between accuracy and privacy (via a hyperparameter such as epsilon)
    - Document hyperparameter choices (e.g., epsilon)
  - Regularization
    - Tradeoff: a risk tradeoff between Membership Inference Attacks and Model Extraction Attacks
  - Obfuscating the output prediction vector and returning only the strongest class
  - Rate-limiting model access
  - Outlier detection on incoming inputs
  - Suppressing count queries with small outputs
Insufficient: Examples of Insufficient Technical Defenses
- No documentation about either threat modeling or implemented defenses
- Using only Regularization as the primary defense for multiple attacks
- Implemented defenses that are older than 24 months
  - If an old defense is used (for example, defensive distillation), it might be vulnerable to new attacks, creating a false sense of security
- Using, for example, a DNN on sensitive data where the classifier is exposed to the public
  - Risk: sensitive attributes or data points might be retrieved from the model, in particular if overfitting is not prevented; the model is very likely to leak sensitive data
- An unlimited rate of model access if the model is trained on sensitive data and/or needs to be protected as IP
- Using sensitive data without Differential Privacy or Bayesian learning
Tools to help practitioners implement technical and non-technical defenses against AI data entry point vulnerabilities: Open-Source Tools – Data Entry Point Attacks
Linked Knowledge Stores and Content
|Content Type||Description & Link|
|GDPR Articles||GDPR Art. 32.1.d.: Security of processing|
|IAAS Audit Criteria||Criteria 92: The entity shall examine and consider different types of vulnerabilities and potential AI Data Entry Point Vulnerabilities. Criteria 1155: The entity shall examine and consider different types of vulnerabilities and potential entry points for attacks such as: data poisoning (i.e., manipulation of training data), model evasion (i.e., classifying the data according to the attacker’s will), model inversion (i.e., inferring the model parameters). Criteria 1308: The entity shall assess the risk of Model Inversion and Membership Inference Attacks from overfitting and take reasonable steps to mitigate risks.|
|Learning Objectives||GDPR Class: The learner shall understand the definition of model evasion, data poisoning, model inversion, and AI Data Entry Point Vulnerabilities. The learner shall examine the considerations and procedures in place to limit, control, and/or manage potential entry points. The learner shall be aware of the entity’s reviews and mitigation plans for AI Data Entry Point Vulnerabilities. The learner shall be familiar with the types of AI Data Entry Point Vulnerabilities and the security measures used to prevent the attacks. The learner will be aware of how these risk mitigations are executed and documented, as well as the process for considering and discussing the risks of AI Data Entry Point Vulnerabilities.|
|FH Definitions||AI Data Entry Point Vulnerabilities: vulnerabilities and attacks associated with the data used for training and processing, where the adversary manipulates the data in order to attack, alter, or otherwise corrupt the intended purpose, scope, and nature of the algorithmic system (e.g., data poisoning, model inversion, model evasion). Data Input Calibration: the process of examining training and processing data with respect to Population Parameters, Population Baseline Norms, Cognitive Bias, accessibility bias, and AI Data Entry Point Vulnerabilities.|