
How to Protect Generative AI Models

As enterprises build out Generative Artificial Intelligence (GenAI) capabilities, one of the biggest security challenges we hear from customers is ensuring the integrity of the data used by large language models (LLMs) in GenAI applications. Customers tell us they face this challenge both with the private data used to fine-tune foundation models (FMs) and with the output, such as text or chats, they pass on to their end users or applications.

We call this the “garbage in / garbage out” problem: if the data going into GenAI models contains malicious code, or if the AI output unintentionally passes along sensitive data, businesses can unwittingly put themselves at risk simply by adopting new technology. If enterprises are using GenAI, so are malicious actors, and businesses need to take steps to prevent abuse. This concern is especially relevant in the ‘wild west’ of malware and AI, a frontier defined by great opportunity and increasing vulnerabilities.

To protect applications integrated with GenAI, organizations should scan data for malicious code and sensitive information at every stage, from the data used to train GenAI models to the text or chat output served downstream.

GenAI Secure by Cloud Storage Security (CSS) allows customers to protect both model data and output from GenAI applications, guarding against risks such as data poisoning and data loss, without disrupting workflows. 

GenAI Secure can be used with AWS’ suite of artificial intelligence-focused tools such as Amazon Bedrock, Amazon SageMaker, and AWS Trainium. As long as an AI service uses CSS-supported storage services, you can scan for both malicious code and sensitive data using GenAI Secure. CSS automates data security with support for Amazon Simple Storage Service (Amazon S3), Amazon Elastic Block Store (Amazon EBS), Amazon Elastic File System (Amazon EFS), and Amazon FSx.

This article reviews how enterprises can use GenAI Secure to scan data in Amazon S3 and create a quarantine pipeline that doesn’t disrupt data cleared for training GenAI models or the output delivered to end users when using Amazon Bedrock.

 
GenAI Secure for AWS

GenAI Secure is built using modern serverless architecture and installs within your AWS account so data never leaves your environment. Deployment via an AWS CloudFormation template or HashiCorp Terraform module makes it easy to get up and running, so you can secure AI data rapidly.

Once GenAI Secure is deployed, Amazon S3 buckets are automatically discovered and cataloged for scanning, whether they span one or multiple regions or accounts, giving you confidence that AI model data in S3 is secure.

To scan data for sensitive information, you can enable default classification rule sets that include Payment Card Industry (PCI) standards, the Health Insurance Portability and Accountability Act (HIPAA), and other predefined policies for common personally identifiable information (PII) items across a variety of regions. Perhaps more importantly, GenAI Secure allows you to create your own custom regular expression (RegEx) policies to detect sensitive data in user inputs and FM responses and prevent specific prompts or outputs from being exposed. Crafting RegEx policies can be challenging because the syntax is complex and dense. Leveraging Bedrock, GenAI Secure lets you describe the pattern you want to detect in simple text and returns the exact RegEx value for that rule.
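
As a simple illustration, the Python sketch below shows the kind of pattern such a rule might produce. The project ID format and helper function are hypothetical examples for this article, not values generated by GenAI Secure:

```python
import re

# Hypothetical custom rule: flag internal project IDs of the form
# "PRJ-" + 4 digits + "-" + 2-letter region code (e.g., PRJ-1234-US).
PROJECT_ID_RULE = re.compile(r"\bPRJ-\d{4}-[A-Z]{2}\b")

def contains_sensitive_pattern(text: str) -> bool:
    """Return True if the text matches the custom classification rule."""
    return bool(PROJECT_ID_RULE.search(text))

sample_output = "Summary for PRJ-1234-US: revenue grew 8% quarter over quarter."
print(contains_sensitive_pattern(sample_output))  # True -> flag before delivery
```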

You can scan over 300 different file types, with objects as large as 5 TB (the maximum object size permitted by Amazon S3), using the following scan models:

  • Event-based - scan data in real time as it is uploaded to storage.
  • Retro - scan existing data on demand or on a schedule.
  • API - scan content and receive a verdict before delivering chat/text output to end users or another application (see the sketch after this list).
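
For the API model, the round trip looks roughly like the Python sketch below. The endpoint URL, authentication scheme, and response fields are placeholders for illustration, not the actual CSS API; consult the product documentation for the real interface.

```python
import requests

# Placeholder endpoint -- substitute the scan API exposed by your deployment.
SCAN_API_URL = "https://scanner.example.internal/api/v1/scan"

def is_safe_to_deliver(chat_output: str, api_key: str) -> bool:
    """Submit chat/text output for scanning and return True only if clean."""
    response = requests.post(
        SCAN_API_URL,
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
        json={"content": chat_output},                   # assumed payload shape
        timeout=30,
    )
    response.raise_for_status()
    verdict = response.json().get("verdict", "unknown")  # assumed field name
    return verdict == "clean"

# Only deliver the output to the end user (or downstream application) when
# the verdict is clean.
```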

You can enable multiple virus detection engines for increased detection efficacy. When a verdict is returned and an object is found to be infected or to contain sensitive information, you can quarantine it, delete it, or keep it in place.

Additionally, we’ve harnessed the power of SophosLabs to offer Static Analysis and Dynamic Analysis. Further, GenAI Secure leverages Bedrock to perform forensic analysis on files and provide recommendations for remediation. Malicious files are often cryptic or require additional analysis to understand what the malware does, so security analysts turn to outside tools like Google search or VirusTotal for context, which can slow down an investigation or introduce manual error. GenAI Secure allows the investigation to take place within your AWS environment, so you don’t have to worry about data traversing the public internet or being sent to a third-party application.

GenAI Secure’s user-friendly console displays findings, and it also integrates with services such as AWS Security Hub or your own Security Information and Event Management (SIEM) solution for consolidated reporting. Refer to Proactive Notifications in the CSS Help Docs for more information.

It takes only a few clicks to implement the scan model that best fits your GenAI pipeline. Below, we explore how to protect GenAI models and secure AI data using CSS’s event-based scanning model.

 
Protect GenAI Model Data Inputs

When it comes to securing model data, malware scanning is best applied before the AI data is used to create or enhance models. Additionally, if you wish to restrict the use of sensitive data within your models, that information should be removed before the data is used. For example, to scan external LLM training data when it’s received, before uploading a custom AI model to Amazon Bedrock (i.e., a data input pipeline), you could use the event-based model with a staging Amazon S3 bucket. The input pipeline example in Figure 1 below comprises the following steps:

  1. Customers/applications upload objects to a staging S3 bucket
  2. As soon as an object is uploaded, Amazon SNS sends an event notification to Amazon SQS
  3. Amazon SQS queues the data for scanning
  4. Amazon SNS tells the agents to start scanning
  5. The agents kick off the scan task
  6. Results are posted to DynamoDB, which sends results to the GenAI Secure console
  7. Once scans are complete, the agents move malicious objects or objects that contain sensitive data to a quarantine bucket
  8. Safe objects are marked with appropriate tags
  9. Lambda moves safe objects to the S3 bucket used for custom models (a minimal sketch of this step follows the list)
  10. The data can then be used by Amazon Bedrock for training
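
As a rough sketch of step 9, the Lambda function below promotes objects the scanner has tagged as clean from the staging bucket to the model bucket. The bucket name, tag key/value, and the assumption that the function fires only after scanning completes are all illustrative; this is not GenAI Secure’s actual implementation.

```python
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client("s3")
MODEL_BUCKET = "my-custom-model-bucket"              # hypothetical destination
CLEAN_TAG = {"Key": "ScanResult", "Value": "Clean"}  # assumed tag from the scanner

def lambda_handler(event, context):
    """Move objects tagged as clean from the staging bucket to the model bucket."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])

        # Read the tags the scanning agents applied to the object (step 8).
        tags = s3.get_object_tagging(Bucket=bucket, Key=key)["TagSet"]
        if CLEAN_TAG not in tags:
            continue  # not cleared for training; leave for quarantine handling

        # Copy the cleared object into the model bucket, then remove it from
        # staging so it is not ingested twice.
        s3.copy_object(
            Bucket=MODEL_BUCKET,
            Key=key,
            CopySource={"Bucket": bucket, "Key": key},
        )
        s3.delete_object(Bucket=bucket, Key=key)
```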


Figure 1: GenAI Secure Input Pipeline

 

Protect GenAI Application Outputs

If sensitive data is used to train your model, but you need to prevent sharing confidential information via the chat/text output produced by your GenAI application (i.e., a data output pipeline), it’s best to classify the data before it is delivered to the end user.

Scanning AI/ML outputs via the event-based model follows a similar process to the data input pipeline above; however, you designate the Bedrock output bucket as the staging bucket and use a separate Amazon S3 bucket as the custom model output bucket. The output pipeline example in Figure 2 below comprises the following steps:

  1. Customers/applications interact with Bedrock to produce chat/text output
  2. That output goes into a staging bucket
  3. As soon as an object is uploaded, Amazon SNS sends an event notification to Amazon SQS (see the configuration sketch after this list)
  4. Amazon SQS queues the data for scanning
  5. Amazon SNS tells the agents to start scanning
  6. The agents kick off the scan task
  7. Results are posted to DynamoDB, which sends results to the GenAI Secure console
  8. Once scans are complete, the agents move malicious objects or objects that contain sensitive data to a quarantine bucket
  9. Safe objects are marked with appropriate tags
  10. Lambda moves safe objects to the custom model output bucket
  11. The resulting chat/text is free of sensitive data and can be used
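
To illustrate step 3, the boto3 sketch below wires a staging bucket’s object-created events to an SNS topic so new output is queued for scanning immediately. The bucket name and topic ARN are placeholders; in practice, the GenAI Secure deployment templates may create this wiring for you.

```python
import boto3

s3 = boto3.client("s3")

# Placeholders -- substitute your Bedrock output (staging) bucket and the
# SNS topic used by the scanning pipeline.
STAGING_BUCKET = "bedrock-output-staging"
SCAN_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:genai-secure-scan-events"

# Publish an event to SNS whenever a new object lands in the staging bucket.
s3.put_bucket_notification_configuration(
    Bucket=STAGING_BUCKET,
    NotificationConfiguration={
        "TopicConfigurations": [
            {
                "TopicArn": SCAN_TOPIC_ARN,
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)
```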


Figure 2: GenAI Secure Output Pipeline

 

Built for and Powered by AWS to Secure AI Data

GenAI Secure is built for and powered by AWS using AWS Services in Scope of AWS assurance programs, which comprise control mappings that help customers establish the compliance of their environments running on AWS. When software sits on authorized infrastructure, its controls are inherited from that authorized system. CSS is built on and integrates with a variety of AWS services for administration and scanning. The AWS services shown in Figure 3 below are required to deploy and run GenAI Secure.


Figure 3: AWS Services Required to Deploy and Run GenAI Secure

 

Additional AWS services may be used depending on your deployment needs and how you would like to process findings. For example, if you would like to deploy GenAI Secure in a multi-region architecture using private subnets, you need to designate an Amazon Virtual Private Cloud (Amazon VPC). Optionally, you can send scan results to services like AWS Security Hub, AWS CloudTrail Lake, and Amazon EventBridge for forensic use cases or to alert teams and applications in other workflows.
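
For example, the sketch below creates an EventBridge rule that forwards scan findings to an alerting SNS topic. The event source, detail-type, and verdict values are assumptions made for illustration, not a documented schema; check how your deployment actually publishes findings before relying on this pattern.

```python
import boto3

events = boto3.client("events")

RULE_NAME = "genai-secure-findings-to-alerts"

# Assumed event pattern -- the source and detail-type are hypothetical.
events.put_rule(
    Name=RULE_NAME,
    EventPattern=(
        '{"source": ["custom.genai-secure"],'
        ' "detail-type": ["Scan Finding"],'
        ' "detail": {"verdict": ["infected", "sensitive"]}}'
    ),
    State="ENABLED",
)

# Route matching findings to an SNS topic that alerts the security team
# (hypothetical ARN).
events.put_targets(
    Rule=RULE_NAME,
    Targets=[{
        "Id": "alert-topic",
        "Arn": "arn:aws:sns:us-east-1:123456789012:security-alerts",
    }],
)
```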

 
Conclusion

In this article, we discussed how to use GenAI Secure with Amazon Bedrock to protect GenAI models and secure AI outputs. We covered integrating the serverless solution with your workflow, discussed the AWS services used by GenAI Secure, and showed you how to build a quarantine workflow for malicious files and sensitive data.

GenAI Secure can be used with any AI service that relies on CSS-supported AWS storage and file services to house and serve model data. When you integrate GenAI Secure into your GenAI application workflow, you can ensure the data is safe for processing, transformation, and use.

Contact us for a demo and to learn more about scanning LLM data for malicious code and sensitive information using GenAI Secure.

 
About Cloud Storage Security

Cloud Storage Security (CSS) is an AWS Public Sector Partner and AWS Marketplace Seller with an AWS qualified software offering, AWS Security Competency, and an AWS Global Security & Compliance Acceleration (ATO on AWS) designation. CSS helps customers around the world automate and secure data transfers and data stored in multiple AWS storage services including Amazon S3, Amazon EFS, Amazon EBS, and Amazon FSx.
