Simple Storage Service - S3
S3 provides developers and IT teams with secure, durable, highly-scalable object storage. Amazon S3 is easy to use, with a simple web services interface to store and retrieve any amount of data from anywhere on the web.
What is S3?
S3 is a safe place to store your files. Flat files, word documents, pictures, videos.
Object-based storage.
The data is spread across multiple devices and facilities.
Basics of S3
Object-based - allows you to upload files.
Files can be from 0 bytes to 5 TB.
There is unlimited storage.
Files are stored in buckets. Buckets are basically folders.
S3 is a universal namespace. Names must be unique globally.
When you upload a file to S3 you will receive an HTTP 200 code if the upload was successful (see the upload sketch below).
Not suitable to install an operating system on. For OS storage, use block-based storage.
Turn on MFA Delete.
S3 is object based. Objects are like files.
Objects consist of the following:
Key (name of the object)
Value (the data itself, made up of a sequence of bytes)
Version ID (important for versioning)
Metadata (data about data you are storing)
Subresources:
Access Control Lists - permissions on each object
Torrent
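A minimal boto3 (Python) sketch of the upload flow described above, assuming configured AWS credentials; the bucket and key names are hypothetical, and the bucket is assumed to be in us-east-1 (other regions need a CreateBucketConfiguration).

```python
import boto3

s3 = boto3.client("s3")

# Bucket names must be globally unique (universal namespace).
s3.create_bucket(Bucket="my-example-bucket-12345")

# Upload an object: the key is the object's name, the body is its value (bytes).
response = s3.put_object(
    Bucket="my-example-bucket-12345",
    Key="docs/hello.txt",
    Body=b"hello world",
    Metadata={"department": "finance"},  # user-defined metadata (data about data)
)

# A successful upload returns HTTP 200.
print(response["ResponseMetadata"]["HTTPStatusCode"])  # 200
```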
Data Consistency in S3
Read after Write consistency for PUTS of new Objects
Eventual Consistency for overwrite PUTS and DELETES (can take some time to propagate)
If you write a new file and read it immediately afterwards, you will be able to view that data.
If you update AN EXISTING file or delete a file and read it immediately, you may still get the older version; changes to objects can take some time to propagate.
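A rough boto3 sketch of the consistency behaviour, using hypothetical names; what the second GET returns in the overwrite case depends on how far the change has propagated.

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket-12345"  # hypothetical

# New object: read-after-write consistency, so this GET sees the data immediately.
s3.put_object(Bucket=bucket, Key="new-object.txt", Body=b"v1")
print(s3.get_object(Bucket=bucket, Key="new-object.txt")["Body"].read())  # b"v1"

# Overwrite PUT: eventually consistent, so an immediate GET may still return the old data.
s3.put_object(Bucket=bucket, Key="new-object.txt", Body=b"v2")
print(s3.get_object(Bucket=bucket, Key="new-object.txt")["Body"].read())  # b"v1" or b"v2"
```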
S3 has the following guarantees from Amazon:
Built for 99.99% availability for the S3 platform.
Amazon guarantees 99.9% availability (the SLA).
Amazon guarantees 99.999999999% durability for S3 information. (remember 11 x 9s)
S3 features:
Tiered Storage Available
Lifecycle Management
Versioning
Encryption
Multi-Factor Authentication for deleting objects
Secure your data using Access Control Lists and Bucket Policies
S3 Storage Tiers
S3 Standard
99.99% availability
99.999999999% durability (11 x 9s), stored redundantly across multiple devices in multiple facilities, and designed to sustain the loss of 2 facilities concurrently.
S3 - IA (Infrequent Access)
For data that is accessed less frequently, but requires rapid access when needed.
Lower fee than S3 Standard, but you are charged a retrieval fee.
S3 One Zone - IA
For when you want a lower-cost option for infrequently accessed data but do not require multiple Availability Zone data resilience.
S3 - Intelligent Tiering
Designed to optimize costs by automatically moving data to the most cost-effective access tier, without performance impact or operational overhead.
S3 Glacier (Data Archiving)
S3 Glacier is a secure, durable, and low-cost storage class for data archiving. You can reliably store any amount of data at costs that are competitive with or cheaper than on-premises solutions. Retrieval times configurable from minutes to hours.
S3 Glacier Deep Archive (Data Archiving)
S3 Glacier Deep Archive is Amazon S3's lowest-cost storage class where a retrieval time of 12 hours is acceptable.
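One way to target a tier is per object at upload time; a hedged boto3 sketch with hypothetical bucket and key names:

```python
import boto3

s3 = boto3.client("s3")

# Choose a storage tier per object via StorageClass (default is "STANDARD").
s3.put_object(
    Bucket="my-example-bucket-12345",
    Key="reports/2019-archive.csv",
    Body=b"archived report data",
    StorageClass="STANDARD_IA",  # also: "ONEZONE_IA", "INTELLIGENT_TIERING",
                                 # "GLACIER", "DEEP_ARCHIVE"
)
```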
You are charged for S3 in the following ways:
Storage
Requests
Storage Management Pricing
Data Transfer Pricing
Transfer Acceleration
Cross Region Replication Pricing
Cross Region Replication - objects are automatically replicated in buckets across regions.
Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your end users and an S3 bucket.
Transfer Acceleration takes advantage of Amazon CloudFront's globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path.
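A possible boto3 sketch for turning Transfer Acceleration on for a (hypothetical) bucket:

```python
import boto3

s3 = boto3.client("s3")

# Enable Transfer Acceleration on the bucket.
s3.put_bucket_accelerate_configuration(
    Bucket="my-example-bucket-12345",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Accelerated transfers then use the bucket's accelerate endpoint:
# my-example-bucket-12345.s3-accelerate.amazonaws.com
```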
Control access to buckets using either a Bucket ACL or Bucket Policies
By default, all newly created buckets are PRIVATE. You can set up access control to your buckets using:
Bucket Policies
Access Control Lists
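A hedged boto3 sketch of both mechanisms, using hypothetical bucket and key names; the example policy grants public read, which you would only attach deliberately.

```python
import boto3, json

s3 = boto3.client("s3")
bucket = "my-example-bucket-12345"  # hypothetical

# Bucket policy allowing public read of every object (buckets are private by default).
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicReadGetObject",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{bucket}/*",
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

# Per-object ACLs are the alternative, e.g. making a single object public:
s3.put_object_acl(Bucket=bucket, Key="docs/hello.txt", ACL="public-read")
```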
Encryption in Transit is achieved by
SSL/TLS
Encryption At Rest (Server Side) is achieved by
S3 Managed Keys - SSE-S3 (AWS manages the keys for you)
AWS Key Management Service, Managed Keys - SSE-KMS (AWS and the customer manage the keys)
Server Side Encryption with Customer Provided Keys - SSE-C (the customer manages the keys)
Client Side Encryption
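A boto3 sketch of the server-side options; the bucket, key, and KMS key alias are hypothetical.

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket-12345"  # hypothetical

# SSE-S3: S3 manages the encryption keys (AES-256).
s3.put_object(Bucket=bucket, Key="sse-s3.txt", Body=b"secret",
              ServerSideEncryption="AES256")

# SSE-KMS: keys are managed through AWS KMS (key alias is hypothetical).
s3.put_object(Bucket=bucket, Key="sse-kms.txt", Body=b"secret",
              ServerSideEncryption="aws:kms",
              SSEKMSKeyId="alias/my-key")

# Encryption in transit is handled by the SDK, which uses HTTPS (SSL/TLS) by default.
```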
Versioning
Stores all versions of an object (even with delete marker, previous versions still exist)
Great backup tool
Once enabled, versioning cannot be disabled, only suspended.
Integrated with Lifecycle rules
Versioning's MFA Delete capability, which uses multi-factor authentication, can be used to provide an additional layer of security
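A minimal boto3 sketch of enabling versioning and listing an object's versions (names hypothetical); MFA Delete additionally requires the root account's MFA device and the MFA request parameter, omitted here.

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket-12345"  # hypothetical

# Enable versioning (it can later be suspended, but never fully disabled).
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Each overwrite now creates a new version; previous versions remain retrievable.
versions = s3.list_object_versions(Bucket=bucket, Prefix="docs/hello.txt")
for v in versions.get("Versions", []):
    print(v["Key"], v["VersionId"], v["IsLatest"])
```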
Lifecycle Management
Automates moving your objects between the different storage tiers.
Can be used in conjunction with versioning.
Can be applied to current versions and previous versions.
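A hedged boto3 lifecycle sketch; the rule ID, bucket name, and the 30/90/365-day thresholds are illustrative choices, not recommendations.

```python
import boto3

s3 = boto3.client("s3")

# Move current versions to Standard-IA after 30 days, to Glacier after 90 days,
# and expire previous (noncurrent) versions after 365 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket-12345",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-down-and-expire",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # apply to the whole bucket
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "NoncurrentVersionExpiration": {"NoncurrentDays": 365},
        }],
    },
)
```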
Cross Region Replication
Versioning must be enabled on both the source and destination buckets
Regions must be unique (the source and destination buckets must be in different regions)
Files already in an existing bucket are not replicated automatically
All subsequent updated files will be replicated automatically
Delete markers are not replicated
Deleting individual versions or delete markers is not replicated.
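A boto3 sketch of a replication configuration; the bucket names, IAM role ARN, and account ID are hypothetical, and an IAM role granting S3 replication permissions must already exist.

```python
import boto3

s3 = boto3.client("s3")

# Versioning must already be enabled on both buckets, which are in different regions.
s3.put_bucket_replication(
    Bucket="my-source-bucket-eu-west-1",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-crr-role",
        "Rules": [{
            "ID": "replicate-everything",
            "Status": "Enabled",
            "Prefix": "",  # replicate all subsequently written/updated objects
            "Destination": {"Bucket": "arn:aws:s3:::my-destination-bucket-us-east-1"},
        }],
    },
)
```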
CloudFront
A Content Delivery Network (CDN) is a system of distributed servers (network) that deliver webpages and other web content to a user based on the geographic location of the user, the origin of the webpage, and a content delivery server.
Edge Location - This is the location where content will be cached. This is separate from an AWS Region/AZ.
Origin - This is the origin of all the files that the CDN will distribute. This can be an S3 Bucket, an EC2 instance, an Elastic Load Balancer, or Route53.
Distribution - This is the name given to the CDN, which consists of a collection of Edge Locations.
Amazon CloudFront can be used to deliver your entire website, including dynamic, static, streaming, and interactive content using a global network of edge locations. Requests for your content are automatically routed to the nearest edge location, so content is delivered with the best possible performance.
Two types of distributions
Web distribution - typically used for websites
RTMP - media streaming
Edge locations are not just READ only - you can write to them too.
Objects are cached for the life of the TTL (Time to Live)
You can clear cached objects, but you will be charged.
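Clearing (invalidating) cached objects can be done through the CloudFront API; a sketch with a hypothetical distribution ID follows.

```python
import boto3, time

cloudfront = boto3.client("cloudfront")

# Invalidate cached objects at the edge locations. Invalidation requests beyond
# the free monthly allowance are charged.
cloudfront.create_invalidation(
    DistributionId="E1EXAMPLE12345",  # hypothetical distribution ID
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/images/*"]},
        "CallerReference": str(time.time()),  # any unique string
    },
)
```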
Snowball
Petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of AWS.
Comes in either a 50TB or 80TB size.
Snowball Edge is a 100TB data transfer device with on-board storage and compute capabilities. Used for moving large amounts of data into and out of AWS, as a temporary storage tier for large local datasets. Like a mini-AWS because of compute capabilities.
Snowmobile is an Exabyte-scale data transfer service used to move extremely large amounts of data to AWS. You can transfer up to 100PB per Snowmobile, a 45-foot ruggedized shipping container pulled by a semi-trailer truck.
Snowball can import to S3 and export from S3.
Storage Gateway
Connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between an organization's on-premises IT environment and AWS's storage infrastructure. The service enables you to securely store data to the AWS cloud for scalable and cost-effective storage.
AWS Storage Gateway's software appliance is available for download as a virtual machine (VM) image that you install on a host in your datacenter. Storage Gateway supports either VMware ESXi or Microsoft Hyper-V. Once you've installed your gateway and associated it with your AWS account through the activation process, you can use the AWS Management Console to create the storage gateway option that is right for you.
3 different types of Storage Gateway
File Gateway (NFS & SMB) - for flat files, stored directly on S3. Files are stored as objects in S3 buckets, accessed through a Network File System (NFS) mount point. Ownership, permissions, and timestamps are durably stored in S3 in the user metadata of the object associated with the file. Once objects are transferred to S3, they can be managed as native S3 objects, and bucket features such as versioning, lifecycle management, and cross-region replication apply directly to objects stored in your bucket.
Volume Gateway (iSCSI)
The volume interface presents your applications with disk volumes using the iSCSI block protocol. Data written to these volumes can be asynchronously backed up as point-in-time snapshots of your volumes, and stored in the cloud as Amazon EBS snapshots.
Snapshots are incremental backups that capture only changed blocks. All snapshot storage is also compressed to minimize your storage charges.
Stored volumes - entire dataset is stored on site and is asynchronously backed up to S3
Let you store your primary data locally, while asynchronously backing up that data to AWS. Stored volumes provide your on-premises applications with low-latency access to their entire datasets, while providing durable, off-site backups. You can create storage volumes and mount them as iSCSI devices from your on-premises application servers. Data written to your stored volumes is stored on your on-premises storage hardware. This data is asynchronously backed up to Amazon Simple Storage Service (Amazon S3) in the form of Amazon Elastic Block Store (Amazon EBS) snapshots. 1 GB – 16 TB in size for Stored Volumes.
Cached volumes – entire dataset is stored on S3 and the most frequently accessed data is cached on site
Cached volumes let you use Amazon Simple Storage Service (Amazon S3) as your primary data storage while retaining frequently accessed data locally in your storage gateway. Cached volumes minimize the need to scale your on-premises storage infrastructure, while still providing your applications with low-latency access to their frequently accessed data. You can create storage volumes up to 32 TB in size and attach to them as iSCSI devices from your on-premises application servers. Your gateway stores data that you write to these volumes in Amazon S3 and retains recently read data in your on-premises storage gateway's cache and upload buffer storage. 1 GB – 32 TB in size for Cached Volumes.
Tape Gateway (VTL) Offers a durable, cost-effective solution to archive your data in the AWS Cloud. The VTL interface it provides lets you leverage your existing tape-based backup application infrastructure to store data on virtual tape cartridges that you create on your tape gateway. Each tape gateway is preconfigured with a media changer and tape drives, which are available to your existing client backup applications as iSCSI devices. You add tape cartridges as you need to archive your data. Supported by NetBackup, Backup Exec, Veeam etc.