How to Eliminate Malware from S3 Data Lakes and Application Workflows

Amazon Simple Storage Service (Amazon S3) launched over 16 years ago and today houses over 200 trillion objects, making it one of the most successful services provided by AWS. Organizations use Amazon S3 to build data lakes, run cloud-native applications, back up and restore critical data, and archive data at low cost. Amazon S3's popularity has made it the linchpin of cloud computing. That popularity has also opened the door to advanced threats such as malware, ransomware, viruses, worms, and trojans.

As dictated by the AWS Shared Responsibility Model, AWS is responsible for security “of” the cloud and the customer is responsible for security “in” the cloud. In the case of Amazon S3, this means that AWS operates the infrastructure layer, the operating system, and platform; S3 customers are responsible for managing the security of their data.

Typically, when we talk to organizations about data protection in Amazon S3, they think about ensuring appropriate access controls and encryption configurations; the security of the data itself is overlooked, including the potential for objects to be carrying malware.  

Although getting network/endpoint security and permission policies right is critical, they do not necessarily keep compromised objects out of application workflows or data lakes. This is especially true if the bucket is used as the input or output of a process; an object infected with malware can be intentionally uploaded by an attacker or unintentionally uploaded by a legitimate user.

While infected objects cannot infect other files in Amazon S3, they can do damage once they are opened. To keep malicious payloads from spreading and prevent infection, organizations have an obligation to scan the objects that are written to and shared from their buckets, especially if unknown sources are allowed to write objects or the organization is ingesting data from third parties.

Not only does scanning for malware build defense in depth, ensure data cleanliness, and prevent infection, it's also a requirement of many laws and compliance frameworks such as SOC 2, PCI DSS, NIST, and ISO 27001.

While AWS offers a range of native services to help organizations secure their Amazon S3 buckets, it does not scan the objects going into or out of S3 for malware. As part of shared responsibility, organizations must implement their own solution to address such threats.

To address these challenges, Cloud Storage Security, an AWS Partner with an AWS Security Competency designation, an AWS Qualified Software offering, and a Public Sector Partner designation, designed a solution for organizations that need to scan objects and files for malware but don’t want to build their own tool or invest in an expensive and complicated data security platform to do so.

In this post, we introduce Antivirus for Amazon S3 - a scalable, in-tenant cloud-based solution focused on scanning objects for malware - as a way to eliminate threats from data lakes and application workflows built on Amazon S3.

By using Antivirus for Amazon S3, you can protect against malicious content. Access Antivirus for Amazon S3 in AWS Marketplace.


Antivirus for Amazon S3

Antivirus for Amazon S3 is built using modern serverless architecture and is installed within your AWS account so data never leaves your environment. Deployment via an AWS CloudFormation Template makes it easy to get up and running.

The Antivirus for Amazon S3 infrastructure is built around AWS Fargate containers in order to be serverless like AWS Lambda and faster and more flexible than Amazon EC2. Two Lambda functions are leveraged for subdomain management, but not for the workload. Antivirus for Amazon S3 integrates into any workflow without disruption.

All S3 buckets are automatically discovered, and both new and existing objects can be scanned, whether they reside in one region or several. You determine which objects to scan.

Sophos Antivirus Dynamic Interface and ClamAV® virus detection engines are available for use and organizations can scan objects as large as 5 TB (the maximum file size permitted by S3) in a variety of ways so as to not interfere with DevOps workflows. Scanning models include:

  • Retro – scan existing objects on demand or via schedule
  • Event – scan new objects in near real time when they are dropped into S3
  • API – scan objects before they are written to S3 from within or outside of AWS via API (this is especially useful if you initiate a workflow where the scan dictates whether the object should be stored in Amazon S3)
  • Amazon S3 Proxy – scan objects on intake before they are written or on access when they are retrieved by leveraging the Amazon S3 APIs you are already using (PUT, POST, GET)

Antivirus for Amazon S3 Scan Models
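
To make the Event model above more concrete, here is a minimal sketch of how any event-driven scanner might pull the bucket and key out of a standard Amazon S3 event notification before handing the object to a scan engine. It is not the product's implementation; the function name and the sample bucket and key are hypothetical. The one detail worth noting is that object keys arrive URL-encoded in S3 notifications, so they need to be decoded before use.

```python
from urllib.parse import unquote_plus


def objects_from_s3_event(event):
    """Return (bucket, key) pairs for newly created objects in an S3 event notification."""
    targets = []
    for record in event.get("Records", []):
        # Only react to object-creation events such as ObjectCreated:Put.
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in notifications, with spaces sent as "+".
        key = unquote_plus(record["s3"]["object"]["key"])
        targets.append((bucket, key))
    return targets


# Hypothetical notification payload, trimmed to the fields used above.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-intake-bucket"},
                "object": {"key": "uploads/quarterly+report.pdf"},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "example-intake-bucket"},
                "object": {"key": "uploads/old.txt"},
            },
        },
    ]
}

print(objects_from_s3_event(sample_event))
```

Only the first record is returned, because the second describes a deletion rather than a new object.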

Encrypted objects may be scanned if Antivirus for Amazon S3’s Agent Role is granted access to the keys.
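
As a rough illustration of what granting key access can look like, the sketch below builds an IAM policy statement that allows kms:Decrypt on a single key. The helper name and the ARN are placeholders, and this assumes a generic IAM setup rather than the product's documented procedure; the key's own policy must also permit the role, and the product documentation remains the authority on the exact permissions its Agent Role needs.

```python
import json


def kms_decrypt_statement(key_arn):
    """Build an IAM policy statement allowing decryption with one KMS key."""
    return {
        "Effect": "Allow",
        "Action": ["kms:Decrypt"],
        "Resource": key_arn,
    }


# Placeholder ARN for illustration only; substitute your own key.
example_key_arn = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"

policy = {
    "Version": "2012-10-17",
    "Statement": [kms_decrypt_statement(example_key_arn)],
}
print(json.dumps(policy, indent=2))
```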

Best practices mandate scanning all files in buckets with any public aspects to them. Generally speaking, the contents of buckets that are shared with others (public or not) should be scanned in order to ensure what is shared is clean. Buckets that may be written to by trusted programs like AWS CloudTrail may not need to have their objects scanned for security purposes but you may need to scan them in order to meet regulatory requirements.

Once a scan has been completed, results may report the following types of problems:

  • Infected - objects infected with malware
  • Unscannable - objects that cannot be scanned because they are password protected, exceed the size limit, or are KMS encrypted and key access has not been granted
  • Error - errors where access to a linked account has been broken or the file no longer exists

When a verdict is returned and an object is found to be infected, you may quarantine it, delete it or decide to keep it in place. The Move command within the console directs the scanner to take the object and place it (copy then delete) in a quarantine bucket. By default, a quarantine bucket is created for each region in which a bucket for scanning has been enabled.
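
The copy-then-delete move described above can be sketched in a few lines. This is an illustration of the pattern, not the product's code: quarantine_object is a hypothetical helper, and s3 is assumed to be a boto3 S3 client (for example, boto3.client("s3")), whose copy_object and delete_object calls are used here. Note that a single copy_object call handles objects only up to 5 GB, so larger objects would need a multipart copy.

```python
def quarantine_object(s3, source_bucket, key, quarantine_bucket):
    """Move an object to a quarantine bucket by copying it, then deleting the original."""
    # Copy first so the object is not lost if the delete step fails.
    s3.copy_object(
        Bucket=quarantine_bucket,
        Key=key,
        CopySource={"Bucket": source_bucket, "Key": key},
    )
    # Remove the infected object from the bucket it was found in.
    s3.delete_object(Bucket=source_bucket, Key=key)
    return f"s3://{quarantine_bucket}/{key}"


# Example call (requires boto3 and AWS credentials; bucket names are placeholders):
#   import boto3
#   quarantine_object(boto3.client("s3"), "example-intake", "uploads/a.bin", "example-quarantine")
```

Ordering the copy before the delete means a failure partway through leaves the object in its original location rather than removing it without a quarantined copy.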

After infected objects are found, Antivirus for Amazon S3 assists with forensic analysis: files are segmented by bucket and account, enabling you to trace where the file entered and into which account it was added.

If it’s questionable whether a file is truly a problem, we’ve harnessed the power of the SophosLabs Intelix Platform to offer Static Analysis and Dynamic Analysis.

Antivirus for Amazon S3 has a user-friendly console, but also integrates with AWS Security Hub for consolidated reporting (more on that later). Amazon Simple Notification Service (SNS) can be used for alerts. Amazon CloudWatch is leveraged for logging. 


Getting Started

To use Antivirus for Amazon S3, you’ll need to:

From there, you can follow the How to Deploy section of the Cloud Storage Security Help Docs to set up and run the default deployment of Antivirus for Amazon S3 in as little as 15 minutes.



Malware scanning is a necessary, additional layer of security that bolsters protections and should be integrated in a defense in depth strategy.

Antivirus for Amazon S3 by Cloud Storage Security provides organizations assurance that all objects written to S3 are free from malware while also providing evidence of scanning to meet regulatory requirements.

Moreover, the solution optimizes cost with its Smart Scan option, which allows you to trigger scans when a certain number of objects accumulate (as opposed to scanning whenever an object is placed in the scanning queue) and its scheduling option, which allows you to define when the agents run a scan. 
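
The idea behind threshold-based triggering can be shown with a small sketch. This is a generic batching pattern under assumed names (BatchTrigger, threshold), not the Smart Scan implementation: keys accumulate in a pending list, and a batch is released for scanning only once the configured count is reached.

```python
class BatchTrigger:
    """Collect queued object keys and release them as a batch once a threshold is reached."""

    def __init__(self, threshold):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.pending = []

    def add(self, key):
        """Queue a key; return the batch to scan if the threshold is met, else None."""
        self.pending.append(key)
        if len(self.pending) >= self.threshold:
            # Hand back the full batch and start a fresh pending list.
            batch, self.pending = self.pending, []
            return batch
        return None


trigger = BatchTrigger(threshold=3)
for name in ["a.csv", "b.csv", "c.csv", "d.csv"]:
    batch = trigger.add(name)
    if batch is not None:
        print("scan batch:", batch)
```

With a threshold of three, the first three keys are released together and the fourth waits for the next batch, which is the trade-off this approach makes: fewer scan runs in exchange for a short delay before some objects are checked.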

If your organization utilizes a workflow in which employees, customers and partners write and retrieve objects from S3, it is your responsibility to ensure those objects are free of malware. Begin your free trial of Antivirus for Amazon S3 today and scan 500 GB in 30 days.


