Welcome!

Weblogic Authors: Yeshim Deniz, Elizabeth White, Michael Meiner, Michael Bushong, Avi Rosenthal

Related Topics: Agile Computing

Agile Computing: Article

i-Technology Viewpoint: How Amazon S3 is Going to Change the World

'Goodbye' scalability problems, 'Hello' unlimited storage

Web 2.0 Journal Contributing Editor Alex Iskold (pictured) writes:  We are observing the transformation of the web from an ecosystem into an operating system. Building blocks such as websites, blogs, web services, podcasts and RSS are coming together and give rise to a new computing platform. The web operating system is emerging and it is bigger than the sum of its parts.
 
Remember your operating systems class, when you learned that every operating system has a handful of fundamental concepts such as storage, virtual memory and scheduling? The new web is no exception. However, since the Internet is a gigantic network of computers working in parallel, the basic operating system concepts take on a different shape.
 
For example, when you try to save a file on your computer, there is a (rare) possibility that the disk is full, and the write will fail. But with the new web operating system, this does not have to be the case. If the disk of one computer becomes full, the web operating system can switch and store the file on another computer. So on the web, you practically never run out of space.
 
Amazon S3 – the new virtual storage service from Amazon
 
The virtual storage has been in the news for sometime now. Dion Hinchcliffe has written a survey of virtual storage providers in his recent post. He particularly commended the Amazon S3 – simple storage service for its innovative API.

In essence, the Amazon S3 offers developers a huge hashtable. The minimalistic API, available in both SOAP and REST, is focused on basic management of the objects – write, read and delete. By default, the service works over HTTP and supports storage of objects up to 5 gigabytes in size. There is also support for BitTorrent and a plan to add other protocols in the future. To use the service, you have to have an Amazon Web Services account.
 
Amazon has done a  very thorough job documenting and supporting the service. The resources page contains a wealth of useful information to get you going, most notably the API and the user forums. There are also code samples available in various languages that illustrate how to use the Amazon S3 API.
 
Storing and retrieving objects
 
The objects in S3 account are placed into buckets. Each account is allowed to have up to 100 buckets and the bucket name has to be unique across all S3 users. 


 


             Figure 1: Example from Amazon S3 API shows CREATE BUCKET request


Each object inside a bucket has to have a unique UTF-8 compliant key assigned by the developer. Since there is no specific key structure imposed by S3, the developers are free to do what best suits their needs. The documentation hints at using slashes to create directory-like structure, but does not insist on it. The lack of key specificity and directory interface in API is not a limitation, but an added flexibility, since people's needs might be different and implementing the directory-like storage is just a matter of following naming conventions in the code.


      Figure 2:  Example from Amazon S3 API shows GET BUCKET request
 
The API also allows the developer to list all the keys in a particular bucket. This is implemented using a flavor of the Iterator pattern and a concept of a marker. With the first query no marker is supplied. If the bucket contains more objects than specified in max-keys parameter, then the a marker for the next starting point is returned. To obtain the next set of results, the marker is passed back in the subsequent request.
 


Figure 3: Amazon S3 diagram

 
Since it might be expensive to fetch the entire object from S3, the API allows you associate the meta data with each object. The meta is returned together with the key when GET BUCKET operation is requested. This functionality is particularly handy for storing and looking up information about large media files.
 
S3 Security
 
S3 has a built in security model for both connecting to the service and setting the access policy for  individual objects and buckets. To access the service, each developer is required to obtain the Access Key ID and the Secret Key. The Access Key ID, the same ID used for all Amazon Web Services, has to be passed in with every request. The Secret Key is used as an encryption key to encrypt  pieces of the request in order to prove the requestor's identity.
 
The authentication is somewhat involved and there are quite a few questions on forums complaining about authentication problems. Amazon API has a page dedicated to it, which you can see here. In addition, there are code samples in various languages which illustrate the correct usage of authentication. If you do not know much about encryption, look for the code sample in your language - it will save you hours of debugging.
 
In addition to the authentication, S3 API comes with an Access Control List (ACL) for every bucket and every object. The ACL can be set via a separate request or at the time when the bucket or an object are created. Here is the set of currently supported ACL choices.




      Figure 4: Current S3 ACL choices

Using AJAX to access S3

S3 opens an intriguing possibility of dramatically simplifying the back end for some applications. You can envision the architecture where an application simply consists of a client, which directly communicates with S3. This client can be a desktop application or an applet or an Ajax application embedded in the browser. Such a model is not appropriate for enterprise systems that require complex data processing, transactions and caching, but it could work well for things like Google Sync or del.icio.us. Simplicity, instant scalability, a built-in security model and Amazon's reputation make a strong case.
 
If you are writing a Firefox extension or Ajax-based web application, you can either write your own S3 wrapper in JavaScript or use S3Ajax developed by Les Orchad. S3Ajax basically mimics the S3 API, and takes away the pain of dealing with the formatting of the raw requests and encryption.
 
It appears that the developer has plans to build this out further. It would be good to have a higher level of abstraction built in, as well as a way to loop through the objects in the bucket and ability to fetch multiple objects concurrently.
 
This all sounds very sweet, but what about the performance?
 
Needless to say, the performance is one of the top questions on the Amazon S3 forum. Amazon does not provide any specific data and does not have an SLA in place. But the design requirements and the design principles from the S3 creators show strong focus on performance and scalability.
 
There are a  few benchmarks that independent developers posted on forums. I've seem these benchmarks improve since S3 launched. The latest results, indicate low latency. A benchmark on March 17 said: “Putting a 2 MB mp3 took 0.8s, retrieving it took 1.085s -- quite fast, quite responsive”.
 
So who is using Amazon S3 right now?
 
Even though the service launched just a few months ago there is already a number of companies leveraging it for a wide range of purposes.  In personal on-line storage space we note ElephantDrive (Windows), JungleDisk (Windows,Linux,Mac) and Filicio.us (Web-based).  From the discussion forums it is clear that a few companies are planning to use S3 to store media files, but we cannot tell who these companies are. At adaptiveblue we are using S3 to store the bluemarks of our users favorite books, movies and music. And if you want to get a feel for what other things are possible, take a look at this top10 list.
 
Conclusion
 
Amazon S3 is an innovative, exciting new service that is going to change the way we do computing on the web. Michael Arrington of TechCrunch said this in one of his posts:
 
“S3 provides a terrific opportunity for startups with great ideas for a storage user interface to avoid building a back end storage infrastructure. Amazon is offering extremely low pricing and a very dependable infrastructure. For some people, S3 will allow them to launch a service that they otherwise couldn’t have built.”
 
If you have not done so already, take a look at S3 and see for yourself. A good place to start playing with it is the jSh3ll, which offers shell like command line interface to the service. Then join the Amazon Web Services program and start coding. The possibilities are endless!
 
 

More Stories By Alex Iskold

Alex Iskold is the Founder and CEO of adaptiveblue (http://www.adaptiveblue.com), where he is developing browser personalization technology. His previous startup, Information Laboratory, created innovative software analysis and visualization tool called Small Worlds. After Information Laboratory was acquired by IBM, Alex worked as the architect of IBM Rational Software Analysis tools. Before starting adaptiveblue, Alex was the Chief Architect at DataSynapse, where he developed GridServer and FabricServer virtualization platforms. He holds M.S. in Computer Science from New York University, where he taught an award-winning software engineering class for undergraduate students. He can be reached at [email protected]

Comments (5) View Comments

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


Most Recent Comments
Jeff Miller 06/16/06 07:43:35 AM EDT

Terrific eye-opener. And it led me to a Hypertext adventure of further learning about many of the Web 2.0 and 3.0 technologies and vision. Well done. Thank you!

Martin Kochanski 06/14/06 01:13:19 PM EDT

We've integrated S3 support into our Cardbox end-user database (http://www.cardbox.com). We now have an automated backup feature that copies all databases to a backup store on S3, automatically, at user-specified intervals. Because Cardbox is doing the backup itself, it can even back up databases that are currently open and being worked on. More at http://cardbox.wordpress.com/2006/06/13/amazon-s3-and-cardbox/.

The Amazon API is beautifully designed and beautifully documented: a pleasure to work with. But the best thing of all is that it's all automatic: "backup without doing backups"!

Joe Labbe 06/13/06 09:21:20 PM EDT

We've integrated S3 as the document storage for our Ratchet-X product and the results have been superb. The service is blazing fast and reliable. We've toyed in the past with the idea of adding document storage but hesitated because of the operational requirements. Amazon's S3 has removed this impediment thus allowing us to build a business grade storage service for our product with minimal effort. I highly recommend it.

ambit.io.us 06/11/06 04:08:50 AM EDT

Great article. More AJAX in action!

ambit.io.us 06/11/06 04:08:23 AM EDT

Great article. More AJAX in action!

@ThingsExpo Stories
DX World EXPO, LLC, a Lighthouse Point, Florida-based startup trade show producer and the creator of "DXWorldEXPO® - Digital Transformation Conference & Expo" has announced its executive management team. The team is headed by Levent Selamoglu, who has been named CEO. "Now is the time for a truly global DX event, to bring together the leading minds from the technology world in a conversation about Digital Transformation," he said in making the announcement.
"Space Monkey by Vivent Smart Home is a product that is a distributed cloud-based edge storage network. Vivent Smart Home, our parent company, is a smart home provider that places a lot of hard drives across homes in North America," explained JT Olds, Director of Engineering, and Brandon Crowfeather, Product Manager, at Vivint Smart Home, in this SYS-CON.tv interview at @ThingsExpo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
SYS-CON Events announced today that Conference Guru has been named “Media Sponsor” of the 22nd International Cloud Expo, which will take place on June 5-7, 2018, at the Javits Center in New York, NY. A valuable conference experience generates new contacts, sales leads, potential strategic partners and potential investors; helps gather competitive intelligence and even provides inspiration for new products and services. Conference Guru works with conference organizers to pass great deals to gre...
The Internet of Things will challenge the status quo of how IT and development organizations operate. Or will it? Certainly the fog layer of IoT requires special insights about data ontology, security and transactional integrity. But the developmental challenges are the same: People, Process and Platform. In his session at @ThingsExpo, Craig Sproule, CEO of Metavine, demonstrated how to move beyond today's coding paradigm and shared the must-have mindsets for removing complexity from the develop...
In his Opening Keynote at 21st Cloud Expo, John Considine, General Manager of IBM Cloud Infrastructure, led attendees through the exciting evolution of the cloud. He looked at this major disruption from the perspective of technology, business models, and what this means for enterprises of all sizes. John Considine is General Manager of Cloud Infrastructure Services at IBM. In that role he is responsible for leading IBM’s public cloud infrastructure including strategy, development, and offering m...
"Evatronix provides design services to companies that need to integrate the IoT technology in their products but they don't necessarily have the expertise, knowledge and design team to do so," explained Adam Morawiec, VP of Business Development at Evatronix, in this SYS-CON.tv interview at @ThingsExpo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
To get the most out of their data, successful companies are not focusing on queries and data lakes, they are actively integrating analytics into their operations with a data-first application development approach. Real-time adjustments to improve revenues, reduce costs, or mitigate risk rely on applications that minimize latency on a variety of data sources. In his session at @BigDataExpo, Jack Norris, Senior Vice President, Data and Applications at MapR Technologies, reviewed best practices to ...
Widespread fragmentation is stalling the growth of the IIoT and making it difficult for partners to work together. The number of software platforms, apps, hardware and connectivity standards is creating paralysis among businesses that are afraid of being locked into a solution. EdgeX Foundry is unifying the community around a common IoT edge framework and an ecosystem of interoperable components.
Large industrial manufacturing organizations are adopting the agile principles of cloud software companies. The industrial manufacturing development process has not scaled over time. Now that design CAD teams are geographically distributed, centralizing their work is key. With large multi-gigabyte projects, outdated tools have stifled industrial team agility, time-to-market milestones, and impacted P&L stakeholders.
"Akvelon is a software development company and we also provide consultancy services to folks who are looking to scale or accelerate their engineering roadmaps," explained Jeremiah Mothersell, Marketing Manager at Akvelon, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
"IBM is really all in on blockchain. We take a look at sort of the history of blockchain ledger technologies. It started out with bitcoin, Ethereum, and IBM evaluated these particular blockchain technologies and found they were anonymous and permissionless and that many companies were looking for permissioned blockchain," stated René Bostic, Technical VP of the IBM Cloud Unit in North America, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Conventi...
In his session at 21st Cloud Expo, Carl J. Levine, Senior Technical Evangelist for NS1, will objectively discuss how DNS is used to solve Digital Transformation challenges in large SaaS applications, CDNs, AdTech platforms, and other demanding use cases. Carl J. Levine is the Senior Technical Evangelist for NS1. A veteran of the Internet Infrastructure space, he has over a decade of experience with startups, networking protocols and Internet infrastructure, combined with the unique ability to it...
22nd International Cloud Expo, taking place June 5-7, 2018, at the Javits Center in New York City, NY, and co-located with the 1st DXWorld Expo will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud ...
"Cloud Academy is an enterprise training platform for the cloud, specifically public clouds. We offer guided learning experiences on AWS, Azure, Google Cloud and all the surrounding methodologies and technologies that you need to know and your teams need to know in order to leverage the full benefits of the cloud," explained Alex Brower, VP of Marketing at Cloud Academy, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clar...
Gemini is Yahoo’s native and search advertising platform. To ensure the quality of a complex distributed system that spans multiple products and components and across various desktop websites and mobile app and web experiences – both Yahoo owned and operated and third-party syndication (supply), with complex interaction with more than a billion users and numerous advertisers globally (demand) – it becomes imperative to automate a set of end-to-end tests 24x7 to detect bugs and regression. In th...
"MobiDev is a software development company and we do complex, custom software development for everybody from entrepreneurs to large enterprises," explained Alan Winters, U.S. Head of Business Development at MobiDev, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
Coca-Cola’s Google powered digital signage system lays the groundwork for a more valuable connection between Coke and its customers. Digital signs pair software with high-resolution displays so that a message can be changed instantly based on what the operator wants to communicate or sell. In their Day 3 Keynote at 21st Cloud Expo, Greg Chambers, Global Group Director, Digital Innovation, Coca-Cola, and Vidya Nagarajan, a Senior Product Manager at Google, discussed how from store operations and ...
"There's plenty of bandwidth out there but it's never in the right place. So what Cedexis does is uses data to work out the best pathways to get data from the origin to the person who wants to get it," explained Simon Jones, Evangelist and Head of Marketing at Cedexis, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
SYS-CON Events announced today that CrowdReviews.com has been named “Media Sponsor” of SYS-CON's 22nd International Cloud Expo, which will take place on June 5–7, 2018, at the Javits Center in New York City, NY. CrowdReviews.com is a transparent online platform for determining which products and services are the best based on the opinion of the crowd. The crowd consists of Internet users that have experienced products and services first-hand and have an interest in letting other potential buye...
SYS-CON Events announced today that Telecom Reseller has been named “Media Sponsor” of SYS-CON's 22nd International Cloud Expo, which will take place on June 5-7, 2018, at the Javits Center in New York, NY. Telecom Reseller reports on Unified Communications, UCaaS, BPaaS for enterprise and SMBs. They report extensively on both customer premises based solutions such as IP-PBX as well as cloud based and hosted platforms.