Cloud-based Information Management


By Casper Thomsen, 
Partner Novataris
15305 Dallas Pkwy, Suite 300, Dallas, TX 75001
Mobile+1 469 667 0926| Office +1 972 715 2000

One of the largest upfront investments in any information management solution is the required software licenses. Once you start looking at massive amounts of distributed data and parallel processing, infrastructure costs grow rapidly as well, and the number of issues, pitfalls and complexities grows exponentially.
An increasingly popular choice is to virtualize servers in the cloud, extending capacity dynamically with low risk. This certainly helps, but it still leaves the licensing costs as well as platform maintenance. There is a better way.
 
A traditional BI solution consists of:

  • A set of databases (e.g., stage, archive, DW, ODS)
  • Data integration tasks (ETL)
  • One or more cubes and datamarts
  • Data visualization tools

All of these can now be accomplished using Platform as a Service (PaaS) components, thereby avoiding the expensive SQL Server Enterprise, OBIEE or similar licensing.

Databases

Microsoft Azure offers Azure SQL DB and Azure SQL DW. Azure SQL DB is your traditional database, but hosted in the cloud without the need for virtual servers or licenses. You pay for the data and the service on a usage basis, and you connect using Management Studio, Visual Studio or any other tool as normal, leaving the issues of service packs, patches, security, scaling, storage and optimization behind. Azure SQL DB is incredibly cheap to spin up; you get far for just $10 per month.
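Connecting from code looks just like connecting to any other SQL Server, only the host lives under `database.windows.net`. A minimal sketch in Python, assuming the widely used pyodbc package and hypothetical server/database names ("myserver", "StageDB"):

```python
# Sketch: connecting to an Azure SQL DB from Python. The server,
# database and credentials below are placeholders, not real resources.

def azure_sql_connection_string(server: str, database: str,
                                user: str, password: str) -> str:
    """Build an ODBC connection string for an Azure SQL DB."""
    return (
        "Driver={ODBC Driver 17 for SQL Server};"
        f"Server=tcp:{server}.database.windows.net,1433;"
        f"Database={database};Uid={user};Pwd={password};"
        "Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;"
    )

conn_str = azure_sql_connection_string("myserver", "StageDB", "etl_user", "...")

# With pyodbc installed, the rest is ordinary database code:
# import pyodbc
# with pyodbc.connect(conn_str) as conn:
#     rows = conn.execute("SELECT TOP 10 * FROM dbo.Sales").fetchall()
```

The point is that nothing in the client changes; only the connection string does.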
 
Regardless of whether you are considering a cloud IM investment, Azure SQL DB should be seriously considered as a replacement for your current database environments. Connect your site to Azure using a site-to-site VPN, migrate your databases to Azure SQL DB, and you can eliminate a large portion of your SQL licenses, all of your server costs and a significant amount of maintenance, and start paying for usage rather than capacity. This can save you 30-70% on your SQL TCO. If you are worried about security, that is a whole other topic; send me a message and I will be happy to discuss it with you.
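Where you land in that 30-70% band depends entirely on your own license, hardware and maintenance numbers, so it is worth running the arithmetic yourself. A trivial sketch with made-up figures:

```python
# Back-of-the-envelope TCO comparison. The dollar amounts are purely
# illustrative; plug in your own license, server and maintenance costs.

def annual_savings_pct(on_prem_cost: float, cloud_cost: float) -> float:
    """Percentage saved per year by moving to usage-based cloud spend."""
    return round(100 * (1 - cloud_cost / on_prem_cost), 1)

# Hypothetical: $50k/year in licenses, servers and maintenance on-prem
# versus $20k/year of usage-based Azure SQL DB spend.
print(annual_savings_pct(50_000, 20_000))  # -> 60.0
```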
 
Azure SQL DW corresponds roughly to the PDW (Parallel Data Warehouse) solutions. It is a lot more complex and advanced when it comes to storing and reading data, but it still lacks quite a few regular T-SQL features. It is exceptionally good at handling massive amounts of data in a distributed fashion, and it should be considered only if you have terabytes or petabytes of data, or a large amount of unstructured data to process quickly. It can be well worth it in many situations, but I see many scenarios where Azure SQL DW becomes much more of a burden than a value-add to an organization. Choose carefully.
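The "distributed fashion" is the key idea: rows are spread across many distributions by hashing a chosen column, so scans and aggregations run in parallel. A toy illustration of the principle (not the actual engine, which uses 60 distributions and its own hash function):

```python
# Toy sketch of hash distribution, the idea behind Azure SQL DW's
# distributed storage. Four distributions keep the demo readable.
from collections import defaultdict
import hashlib

DISTRIBUTIONS = 4

def distribution_for(key: str, n: int = DISTRIBUTIONS) -> int:
    """Deterministically map a distribution-key value to a distribution."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n

rows = [("cust-1", 100), ("cust-2", 250), ("cust-1", 75), ("cust-3", 40)]
nodes = defaultdict(list)
for customer_id, amount in rows:
    nodes[distribution_for(customer_id)].append((customer_id, amount))

# All rows with the same key land on the same distribution, which is
# why picking a high-cardinality, evenly spread key matters so much.
```

A skewed key (say, a column where 90% of rows share one value) piles most of the data onto one distribution and throws the parallelism away, which is one of the ways SQL DW becomes a burden rather than a value-add.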

Data integration (ETL)

This has long been the biggest issue: without this component, the full BI license is still needed, which significantly reduces the ROI of moving to a PaaS-based IM solution. This is where Azure Data Factory comes into play.


Azure Data Factory is an online service in Azure for orchestrating data extraction, transformation and loading. It is not SSIS in the cloud, and it will require more coding. A range of data source connectors is available, such as Salesforce, SAP, Navision, CRM and more, letting you easily connect to a range of sources and destinations. Developers then build the transformations, publish the pipelines and activities to the Data Factory host, and it is ready to run. For Azure Data Factory you pay per pipeline deployed and per activity run (a simplified view of the pricing).
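To give a feel for what "more coding" means: pipelines and activities are defined as JSON documents. A stripped-down sketch of the shape of one, built here as a Python dict — the pipeline name, datasets and activity names are placeholders, and the real schema carries considerably more properties:

```python
# Illustrative sketch of a Data Factory pipeline definition (JSON).
# All names below are hypothetical; the actual schema is richer.
import json

pipeline = {
    "name": "CopySalesToStage",
    "properties": {
        "activities": [
            {
                "name": "CopyFromSalesforce",
                "type": "Copy",  # a copy activity: one input, one output
                "inputs": [{"referenceName": "SalesforceOpportunities"}],
                "outputs": [{"referenceName": "StageOpportunities"}],
            }
        ]
    },
}

print(json.dumps(pipeline, indent=2))
```

You author documents like this (or generate them), publish them to the factory, and the service schedules and runs the activities.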

The usage-based model does mean that we again need to think about how frequent data refreshes really need to be. On traditional deployments I will often advocate daily, and sometimes hourly, updates because it simplifies the infrastructure and because... well, why not? The hardware is just sitting there doing nothing anyway, and you still pay for it. In a cloud solution it is the other way around. A low-frequency activity costs 60 cents per month to run, so it is not going to bleed you dry in the first month, but it adds up eventually, and it is good practice to think about scheduling early on.
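The "it adds up" point is easy to make concrete with the article's own figure of $0.60 per low-frequency activity per month (rates are illustrative and change over time):

```python
# Rough monthly-cost arithmetic for Data Factory activities, using the
# $0.60/month low-frequency figure from the text (illustrative only).
LOW_FREQ_ACTIVITY_USD = 0.60

def monthly_activity_cost(activities: int,
                          rate: float = LOW_FREQ_ACTIVITY_USD) -> float:
    """Monthly cost in USD for a number of low-frequency activities."""
    return round(activities * rate, 2)

print(monthly_activity_cost(5))    # a small pipeline: -> 3.0
print(monthly_activity_cost(200))  # a large estate:   -> 120.0
```

Five activities are pocket change; two hundred of them, refreshed more often than anyone actually needs, is a bill worth planning for.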

Cubes

We still don't have Analysis Services (SSAS) as a PaaS component, but there is another good option. Power BI, generally viewed as a data visualization tool, also contains a powerful in-memory tabular "cube" engine. It is perfectly possible to build a tabular model in Power BI, deploy it to your workspace, and then connect to it using Excel pivot tables just as you would with a "regular" multidimensional cube. This is a pretty neat feature: you don't have to use the visualization component; you can take advantage of the tabular model behind it and do your own analysis in Excel.

The Power BI service can also handle scheduled data refreshes.


Data visualization

There are plenty of options for visualizing data. An obvious choice here is Power BI, since we already use it for the cube. It is a great visualization tool that is rapidly becoming more advanced and more capable in an enterprise environment. Excel is also a great tool, but it should be used to complement a reporting platform with deeper analytics, not to replace it.

For developers who want to build something custom, there are other good libraries such as D3, ChartIO, etc.

Reporting Services is excluded in this case, since there is no PaaS version and it would therefore require a license. Interestingly, SSRS was offered as a service much earlier, but it was never a success and was taken down, not to return as of yet.

Summary

In summary, going strictly cloud-based for all services requires the organization to adopt a new set of skills. Technologies like Azure Data Factory are very new, and there is a good chance that issues will arise along the way. Others, however, like Azure SQL DB, are starting to look more seasoned, and the price point should make it impossible for any IT organization to look the other way.

It's appealing to get rid of the old servers and not have to invest in new infrastructure or licenses, or listen to the constant requests for more data, more memory or bigger CPUs. It's not perfect yet, but it's rapidly moving in that direction. There is a lot of money to be saved in terms of TCO, not to mention that closing down a failed project will cost only hours of compute time rather than millions in servers and licenses.

It is now finally feasible, with maturing platform support that does not require building it all from scratch. It's an exciting new prospect to consider when renewing or starting construction of an IM solution, as well as when looking at the ongoing BI strategy and investments in that area.

Update: As I am writing this, availability of Analysis Services as a cloud service has just been announced (see this link for further information). This is exciting news that brings even more opportunities for building a fully cloud-service-based BI architecture.