Tuesday, November 16, 2010

Cloud Computing Part – II

Microsoft Azure vs. Google Apps vs. Amazon

Microsoft Azure:

The Windows Azure Service Platform from Microsoft is a cloud offering aimed squarely at the Microsoft developer community. If you are a Microsoft-based shop, Azure is the only option as of now for developing cloud applications using .NET. Microsoft designed the entire cloud stack with the .NET developer ecosystem in mind. Developers can use any .NET language, such as C#, VB.NET, IronPython or IronRuby, as well as any language that targets .NET in the future. Azure also targets PHP and Java developers; Java was added to the list recently.

The Azure platform is an internet-scale cloud service platform with three core components: Compute, Storage and Fabric. Compute provides the computation environment. Storage provides scalable storage in the form of Blobs, Tables, Queues and Drives. Fabric manages the underlying hosts, the complex network cluster and the virtual instances. This lets application developers stay focused on their business problem rather than on the underlying software and hardware architecture.

Its open architecture gives developers the choice of building many kinds of applications, from web-based applications to applications that run on other connected devices. It also offers a choice of database. Developers can still use a traditional structured relational database like SQL Server, or they can choose a non-SQL store that scales for large projects such as social applications.

Google App Engine:

Google App Engine is a platform for developing applications that run in Google’s own managed data centers. Google already runs many applications for its own business on this infrastructure, and App Engine is built in the same style.

App Engine is a more limited platform. It allows developers to build applications using Python or Java, and provides storage through Google’s proprietary Bigtable database. As of this writing, many standard Java features are not yet supported.

Though its offering is limited, hosting applications on App Engine is fairly simple and robust compared to other cloud offerings in the market. Google has worked very hard to make sure that developers do not have to take the pain of understanding the underlying architecture.

If you are a Java or Python shop, App Engine is the ultimate platform. Google also recently added support for its popular GWT on App Engine. This is an added advantage for Java-based applications, which can use a built-in stack for web deployment.

However, Google does not provide a built-in relational database; it only provides Bigtable access through high-level APIs. Many companies provide a bridge that maps between the relational and non-relational models, so this alone should not be a big problem when considering App Engine.

Amazon Web Services (AWS):

Amazon is one of the first vendors to bring cloud infrastructure to the developer community in a big way. It offers an array of technologies built on the premise of cloud computing, with different levels of offering for different business sizes.

With AWS, users have the flexibility to choose whichever development platform or programming model makes the most sense for a particular business problem. AWS has remained platform agnostic from the beginning, which avoids vendor lock-in. On one hand, the AWS architecture is platform neutral and open, so developers have complete freedom to configure their own stack. On the other hand, developers have to take care of administrative tasks such as managing instances and deciding how many compute and storage instances are required. This calls for more administrative staff. Again, it all boils down to the business requirement.

Amazon provides the following key cloud platforms:

Amazon Elastic Compute Cloud (Amazon EC2): A web service that provides resizable raw computing capacity in the cloud. Developers can define the entire stack, starting from the OS and going up through services, databases and the application stack. Amazon gives developers complete freedom to mix and match according to their requirements.

Amazon Relational Database Service (Amazon RDS): This offering is targeted at the traditional relational database community, who need to store their data in a relational database. The underlying database is MySQL. These MySQL databases can scale dynamically as requirements grow. All database management issues are taken care of under the hood, without being exposed to the user.

Amazon SimpleDB: Amazon SimpleDB is the answer for users of NoSQL, non-relational databases. SimpleDB is a highly scalable non-relational data store targeted at companies that need to store and retrieve large-scale data over the cloud.

Amazon Simple Storage Service (Amazon S3): A simple web services interface that can be used to store and retrieve large amounts of data, at any time, from anywhere on the web. It gives developers and businesses access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites.

Friday, November 12, 2010

Cloud computing Part – I

SaaS, PaaS, IaaS

We have been hearing the buzz surrounding cloud computing and its true potential for quite a long time. No doubt, it is clearly a game changer from a business perspective, as it allows economies of scale. Various vendors have embraced cloud computing in different architectural styles and with different business models. Cloud providers are mainly categorized by the segment of the stack in which they operate. Cloud computing breaks down into three segments: software, platform and infrastructure.

SaaS (Software as a Service): This is the topmost layer in the cloud stack. In this layer the software is typically provided as a service to the end user. Think of any on-premise software made available to the end user over the web: instead of purchasing the software, the end user pays a monthly cost based on usage. In this model the traditional way of purchasing and owning software fades away. This is what is truly called the “utility computing” or “pay as you use” model. This disruptive model helps reduce both the initial capital investment in hardware (capex) and the operational cost (opex). At the end of the day the user just pays for what he uses, the way we pay bills for electricity. The best example of a SaaS vendor is Salesforce.com, which pioneered this business model and is truly the leader in SaaS offerings. Currently, vendors offering SaaS applications for various needs are mushrooming. Companies that provide SaaS offerings for end users include Salesforce, Zoho, NetSuite, Box.net, Dropbox, Taleo, etc.
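The capex versus pay-as-you-use trade-off described above can be put into numbers. The following is only a rough sketch: every price and usage figure in it is made up for illustration and does not come from any vendor.

```python
from typing import Sequence

# Hypothetical figures, for illustration only.
UPFRONT_LICENSE = 12_000.0        # one-time on-premise license (capex)
UPFRONT_HARDWARE = 8_000.0        # servers to run it on (capex)
ON_PREMISE_MONTHLY_OPEX = 500.0   # power, admin, maintenance per month
SAAS_PRICE_PER_USER_MONTH = 65.0  # subscription price per active user


def on_premise_cost(months: int) -> float:
    """Total cost of owning the software for the given number of months."""
    return UPFRONT_LICENSE + UPFRONT_HARDWARE + ON_PREMISE_MONTHLY_OPEX * months


def saas_cost(users_per_month: Sequence[int]) -> float:
    """Total subscription cost when billed per active user per month."""
    return sum(users * SAAS_PRICE_PER_USER_MONTH for users in users_per_month)


if __name__ == "__main__":
    usage = [5, 5, 8, 10, 10, 12]  # active users in each of six months
    print(f"on-premise: {on_premise_cost(len(usage)):.2f}")
    print(f"SaaS:       {saas_cost(usage):.2f}")
```

With these assumed numbers the subscription is far cheaper over six months, because nothing is paid upfront and the bill follows actual usage. With different figures, or over a much longer period, the comparison may well tip the other way.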

PaaS (Platform as a Service): PaaS offers a development platform for developers. PaaS vendors provide the necessary tools for developing applications, and each vendor builds its own technology stack into its platform. So once you choose a PaaS vendor, you are locked in to that platform. The best example of PaaS is Salesforce.com’s Force platform, which allows you to build applications and host them on the Force platform. These vendors’ offerings are unique. They provide high-level tools with which users can build applications without much knowledge of programming. This lets small companies adopt IT without having an army of developers to build their applications. OrangeScape, an Indian PaaS provider, provides cross-platform solutions on all the cloud offerings, such as Microsoft Azure, Google App Engine and Amazon Web Services.

The disadvantage of PaaS is vendor lock-in: the application is built completely on the PaaS vendor’s software stack. However, this scenario is going to change. In future, vendors will enhance their offerings to cater to different platform stacks.

PaaS solutions can be categorized into the following platforms based on what they offer :

1. Social application platforms

2. Web application platforms

3. Business application platforms

Social applications : Facebook has emerged as a platform for social applications. Facebook provides the platform and tools with which developers can build and extend social applications.

Web Applications : Many vendors provide a basic web platform on which developers can build their web applications. Google provides a lot of tools and APIs for building applications.

Business Applications : The best example is Force.com from Salesforce. It provides one of the best integrated platforms for business application developers.

IaaS (Infrastructure as a Service): The third category in cloud computing is called Infrastructure as a Service.

IaaS delivers raw computing on which developers can build their applications with all the freedom they had when building traditional applications. They can choose the entire software and hardware stack that they wish to deploy for the application. All the big vendors, like IBM, Amazon, Microsoft and Google, provide IaaS solutions. The user buys infrastructure on demand in a “pay and use” mode. In this class of computing the user has full control over the instances running at runtime, and can therefore control the various processes effectively. The issue with this model is that managing and monitoring these processes is challenging, and companies do need administrative staff for the job.

Which kind of cloud computing to choose depends on the customer’s business model and business requirements. The more granular the control we need over the cloud, the deeper in the stack we need to go. Definitely, IaaS is not for end users.

There is a certain class of business problems that is well suited to an IaaS kind of offering. Before cloud computing hit mainstream adoption, startup vendors faced a high entry barrier compared to established players, even when they had better technology in hand. For example, a customer who wanted to start a social application platform had to worry about the upfront hardware cost and all the associated data center costs. Startups typically do not have that kind of investment. Cloud computing has changed the whole game. These customers can easily set up their applications on the cloud from day one and, thanks to virtualization technology, the infrastructure can scale on demand. The customer pays only for the services he actually uses, not a huge upfront cost.

Wednesday, September 1, 2010

RIA made easy : a simple implementation of RIA using Java

Many popular, established web frameworks have been around for several years, and many more are taking shape. A common list of popular web frameworks :

From Ruby world : Ruby on Rails

Python : Django, TurboGears

Java : enormous … Wicket, Pivot, Tapestry, Stripes, Scooter, JSF, Struts, JavaFX

C# : .NET and Silverlight

Scala : Lift

Google : GWT

I personally love to work with Rails and Django, as they are open source, flexible, hide the internal complexity from the developer, are backed by strong communities and are based on dynamic languages. Though these frameworks are simple, they become more complex when we need to scale up, and we need to learn new jargon and new ways to code. Of course, we also need to learn the base languages on which these frameworks are built. They are extremely powerful web frameworks, but we need to master the craft. If I were a Python or Ruby guy, I would not think twice before using them.

Very recently, for one of my projects, I was looking for a simple Java framework where I would not need to learn much new jargon or many new concepts. I went through a few of them without much success, as each one has its own conventions.

Then I found a newer framework, Vaadin, which is very simple to learn. I went through the documentation, which is pretty amazing, started working with it, and it works.

The following are the features that fascinate me about Vaadin :

Enables creating Rich Internet Applications fast.

It is built on Java, which most developers are familiar with, and is licensed under the Apache License.

The framework is more like Swing, so it is very close to the hearts of desktop guys.

Does not need a plug-in like JavaFX or Flex, so it is browser agnostic.

Built on GWT-based widgets. This helps in writing custom user controls and brings support from Google. App Engine support is also a big plus.

Eclipse plugin support is available, although nowadays that is a basic expectation.

There are many pre-built UI controls, ready to use, with the most commonly used features. A decent implementation.

The default skin looks good. User-modified theming is possible using CSS.

Integrates with other popular frameworks from the Java world, like Spring and Hibernate.

The framework is well documented and comes with an online demo, the source code and a book.

Backed by a company called IT Mill, and comes with a long heritage. The project started in the year 2000.

If you have time do stop by www.vaadin.com.

Tuesday, February 2, 2010

NoSQL database gaining momentum...

The next-generation non-relational, NoSQL databases are gaining quite a lot of popularity in the world of web-scale data stores. There is an interesting shift happening from traditional row-and-column relational databases to key-value based NoSQL databases. These are often called document-centric databases.

Some of the limitations of traditional relational databases are as follows:

1. The data has to be normalized and available in row and column format. This kind of data is the best candidate for a relational data store. It also means that relational databases are not really good candidates for storing unstructured data, where the data does not come in row and column format.

2. Normalization can reduce performance, because related data has to be joined back together at query time.

3. Replication between nodes is painful and expensive.

4. Relational databases are really hard to scale horizontally.

On the other hand, key-value based data stores provide similar features with more flexibility, and are good at storing and processing large amounts of data. They also work well with unstructured or semi-structured data sets and metadata.
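To make the contrast concrete, here is a small sketch in plain Python that models both styles with in-memory dictionaries and lists. The table layout, key format and sample data are all invented for illustration; no real database is involved.

```python
# Relational style: the data is split across normalized "tables" linked by ids.
authors = {1: {"name": "Asha"}}
posts = [{"id": 10, "author_id": 1, "title": "Hello NoSQL"}]
tags = [{"post_id": 10, "tag": "nosql"}, {"post_id": 10, "tag": "cloud"}]


def load_post_relational(post_id):
    """Rebuild one post by joining the three tables."""
    post = next(p for p in posts if p["id"] == post_id)
    return {
        "title": post["title"],
        "author": authors[post["author_id"]]["name"],
        "tags": [t["tag"] for t in tags if t["post_id"] == post_id],
    }


# Key-value / document style: the whole record is stored under a single key.
store = {
    "post:10": {"title": "Hello NoSQL", "author": "Asha", "tags": ["nosql", "cloud"]},
}


def load_post_document(post_id):
    """Fetch the de-normalized document with one key lookup, no joins."""
    return store[f"post:{post_id}"]


if __name__ == "__main__":
    print(load_post_relational(10))
    print(load_post_document(10))
```

Both functions return the same record. The difference is that the relational version has to touch three tables, while the document version is a single lookup, at the cost of duplicating the author name in every post.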

The following are the key features of these new data stores :

1. Schema-free, de-normalized document storage

2. Key/value based lookups

3. Good candidates for horizontal scaling (scale well to a very large number of nodes)

4. Support for map and reduce style programming

5. Built in replication

6. Simple HTTP/REST based APIs

7. Most suitable for cloud based applications
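Key/value lookups are what make horizontal scaling comparatively easy: if every record is addressed by its key alone, the key can also decide which node holds the record. The sketch below is a toy in-memory model of that idea. The class name and the simple modulo placement are my own assumptions for illustration; real stores use more elaborate schemes, such as consistent hashing, so that adding a node does not reshuffle most keys.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads keys over several in-memory 'nodes'."""

    def __init__(self, node_count):
        # Each node is just a dict here; in practice it would be a separate server.
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key):
        # Hash the key so that placement is stable and roughly uniform.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key):
        return self._node_for(key).get(key)


if __name__ == "__main__":
    store = ShardedStore(node_count=4)
    for i in range(100):
        store.put(f"user:{i}", {"id": i})
    print(store.get("user:42"))
    print([len(node) for node in store.nodes])  # how the keys were spread
```

A join across tables would need data from several nodes at once, which is one reason why relational databases are harder to spread out in this way.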

Some of the most popular key-value and document-based data stores are

1. Apache CouchDB

2. MongoDB

3. Riak

4. Redis

5. ThruDB

6. Tokyo Cabinet

7. Memcached

Apache CouchDB : Apache CouchDB was created by Damien Katz. It is a document-oriented, highly distributed, schema-free database written in Erlang, ideal for large concurrent applications.

The database can be queried and indexed in MapReduce style. CouchDB also offers incremental replication with bi-directional conflict detection and resolution.
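The MapReduce style of querying can be illustrated in a few lines of Python. This is only a sketch of the idea, not CouchDB’s view API (CouchDB views are normally written as JavaScript functions); the documents and the function names `map_tags`, `reduce_count` and `run_view` are made up for the example.

```python
from collections import defaultdict

# Schema-free documents: not every document has the same fields.
docs = [
    {"_id": "a", "type": "post", "tags": ["nosql", "cloud"]},
    {"_id": "b", "type": "post", "tags": ["nosql"]},
    {"_id": "c", "type": "comment"},  # no "tags" field at all
]


def map_tags(doc):
    """Map step: emit one (key, value) pair per tag found in a document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_count(values):
    """Reduce step: combine all the values emitted for one key."""
    return sum(values)


def run_view(documents, map_fn, reduce_fn):
    """Group the emitted pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


tag_counts = run_view(docs, map_tags, reduce_count)

if __name__ == "__main__":
    print(tag_counts)
```

Because the map step looks at one document at a time, it can run on many nodes in parallel, and the result can be kept as an index that is updated as documents change.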

CouchDB provides a RESTful JSON API that can be accessed from any environment that allows HTTP requests.
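As a rough sketch of what this looks like from a client, the code below builds the HTTP requests for storing and fetching a document using only the Python standard library. It assumes a CouchDB server on `localhost` at its default port 5984 and a database called `blog`, which is a hypothetical name; the requests are only constructed here, and `send` needs a running server to actually do anything.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # assumption: local CouchDB, default port
DATABASE = "blog"                   # hypothetical database name


def build_put_document(doc_id, document):
    """Build a PUT request that stores `document` under the id `doc_id`."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{DATABASE}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_get_document(doc_id):
    """Build a GET request that fetches the document with the id `doc_id`."""
    return urllib.request.Request(url=f"{BASE_URL}/{DATABASE}/{doc_id}", method="GET")


def send(request):
    """Send a request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    put = build_put_document("post-1", {"title": "Hello CouchDB", "tags": ["nosql"]})
    print(put.get_method(), put.full_url)
    get = build_get_document("post-1")
    print(get.get_method(), get.full_url)
```

Since everything is plain HTTP and JSON, the same calls can be made from a browser, from `curl`, or from any language with an HTTP library.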

It is one of the first true document-oriented databases designed to scale with the web, and it is already used by many software companies.

MongoDB : MongoDB is a widely used database in this category. It is written in C++ and provides almost all the features of CouchDB. MongoDB is more mature and is commercially supported by a company called 10gen. The database manages collections of JSON-like documents, which are stored in a binary format referred to as BSON.

Riak : This is one of the new entrants in this space. It combines a decentralized key-value store, a flexible map/reduce engine, and a friendly HTTP/JSON query interface to provide a database ideally suited for Web applications.

Redis : This is also a new project, hosted on Google Code. Redis is a key-value based database system written in ANSI C, and it runs very fast. The implementation is very similar to Memcached. It is available on most platforms and provides most of the features of the other document-centric databases.

ThruDB : ThruDB is also hosted on Google Code. ThruDB is a set of simple services built on top of the Apache Thrift framework. It offers fast, flexible and easy-to-use services that can enhance or replace traditional data storage and access layers.

Others : There are many implementations of document-oriented databases. Most of them try to provide the key features mentioned above. This is a new way of storing and retrieving data. These database models provide an alternative and very flexible way to solve large-scale web data problems, which were traditionally a big limitation of relational databases and other database models.

Friday, January 22, 2010

Saturday, January 9, 2010

A List of ETL tools


To the above list I would like to add the following ETL products.

Jitterbit ETL - Jitterbit
Expressor Integrator - Expressor

Friday, January 8, 2010

SnapLogic : The data flow company

Most commercial data integration companies today provide data integration in a very traditional way. All the integration and ETL jobs run on premise, and integration is mostly focused on structured data within corporate boundaries. Though most of them support unstructured data, the basic premise of integration has never changed. Over the past few years a lot of web standards have emerged, and most software vendors have embraced them. One important focus was to provide very loose coupling between different services, so that we can truly create a collage of services and deliver agile, on-demand applications. Web services and XML have taken center stage in this paradigm shift. Unfortunately, this did not extend to real enterprise application integration, where integration was always done mostly through native APIs. Hence there is always a risk of failure when a large number of monolithic applications communicate with each other.

On the other hand, the types of data generated by various applications have also changed over the years. Today, unstructured data like feeds, Atom, XML and CSV files is generated in greater volume than structured data. Moreover, unstructured data is generated not only within corporations but also outside them, in blogs and wikis for example. So traditional integration is bound to fail, as it was never built on these premises.

In recent years SaaS (Software as a Service) has emerged as a very promising business model, and it has completely changed business dynamics. With SaaS, vendors can now offer their applications to customers in any price bracket. This greatly reduces the customer’s initial upfront investment in software licenses. At the same time it raises some new challenges. Many corporate legacy applications have to talk to these externally hosted, managed services. For example, if a company wants to integrate the Salesforce application with its in-house legacy ERP application, the integration has to support web services rather than traditional data access layers.

To sum up, today’s data integration needs to support various types of data from various sources with various data access APIs, standards and protocols. This has to be accomplished in a loosely coupled, boundary-free, secure, fast and agile way, unlike traditional data integration projects. This is the gap SnapLogic is trying to fill.
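The core problem, reading differently shaped data into one common record format, can be sketched with the Python standard library. This is not SnapLogic’s API; the reader functions, the `READERS` table and the sample contact data are all invented for this illustration.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def from_csv(text):
    """Read CSV text with a header row into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def from_json(text):
    """Read a JSON array of objects into a list of dicts."""
    return list(json.loads(text))


def from_xml(text):
    """Read <root><record><field>value</field>...</record>...</root> into dicts."""
    root = ET.fromstring(text)
    return [{field.tag: field.text for field in record} for record in root]


# One reader per source format; supporting a new format means adding one entry.
READERS = {"csv": from_csv, "json": from_json, "xml": from_xml}


def read_records(fmt, text):
    """Turn text in any supported format into the common list-of-dicts form."""
    return READERS[fmt](text)


if __name__ == "__main__":
    sources = [
        ("csv", "name,email\nAsha,asha@example.com\n"),
        ("json", '[{"name": "Ravi", "email": "ravi@example.com"}]'),
        ("xml", "<contacts><contact><name>Mei</name>"
                "<email>mei@example.com</email></contact></contacts>"),
    ]
    contacts = [rec for fmt, text in sources for rec in read_records(fmt, text)]
    for contact in contacts:
        print(contact)
```

Once every source is reduced to the same record shape, the rest of the pipeline does not need to know where a record came from, which is the loose coupling the paragraph above asks for.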

SnapLogic is an open source data integration company. SnapLogic DataFlow is a scalable data integration platform that leverages Web technology and standards to provide organizations of all sizes with a flexible and cost-effective solution for on-demand data integration. It can connect to and fetch data from various data sources, such as traditional databases, SaaS apps like Salesforce and NetSuite, and social networking sites. DataFlow also provides a way to create custom components and integrate them with SnapLogic.