Introduction
Data stored in SaaS applications is often inaccessible to BI tools. This is a major headache for early adopters of SaaS applications. With on-premise applications, IT departments can bypass the application and access data directly from the underlying database. With multi-tenant SaaS applications, such direct database access is not available because the database is shared with other customers.
Understanding the Problem
Ideally, all data access should go through the application. There are several compelling reasons for this:
- The application manages data-level access rights, for example allowing a user to see only the data for their own region.
- The application manages data at a business-object level. Such data objects are often assembled via object-relational mapping of application objects to relational database tables.
- Multi-tenant SaaS applications prevent users from seeing data that belongs to other tenants.
For these reasons, bypassing the application to access data directly from the underlying database is not a good idea in general, and is not possible with SaaS applications.
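To make the first and third points concrete, here is a minimal Python sketch. It assumes a hypothetical shared `orders` table with a `tenant_id` column, with an in-memory SQLite database standing in for the SaaS back end. The application-layer function `fetch_orders` always restricts the query to the caller's tenant and permitted regions, whereas a raw query against the same database would see every tenant's rows. The table, its columns, and the `User` type are illustrative assumptions, not any vendor's actual schema.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical caller identity as the application sees it."""
    tenant_id: str               # which customer (tenant) the user belongs to
    regions: tuple[str, ...]     # regions the user is allowed to see


def make_shared_db() -> sqlite3.Connection:
    """In-memory stand-in for a database shared by several tenants (sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (tenant_id TEXT, region TEXT, product TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [
            ("acme", "EMEA", "widget", 100.0),
            ("acme", "APAC", "widget", 250.0),
            ("globex", "EMEA", "gadget", 999.0),  # belongs to a different tenant
        ],
    )
    return conn


def fetch_orders(conn: sqlite3.Connection, user: User) -> list[tuple]:
    """Application-layer read: every query is scoped to the caller's tenant and regions."""
    if not user.regions:
        return []  # no region rights, so nothing is visible
    # Only "?" placeholders are interpolated into the SQL; values are bound as parameters.
    placeholders = ",".join("?" for _ in user.regions)
    sql = (
        "SELECT region, product, amount FROM orders "
        f"WHERE tenant_id = ? AND region IN ({placeholders}) "
        "ORDER BY region, product"
    )
    return conn.execute(sql, (user.tenant_id, *user.regions)).fetchall()
```

Calling `fetch_orders(conn, User("acme", ("EMEA",)))` returns only Acme's EMEA order, while `SELECT * FROM orders` on the same connection would also expose the Globex row. That raw path is exactly what a multi-tenant SaaS vendor cannot hand out.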
Current Strategies
Let’s review the strategies that applications currently provide for data access.
Data Export
Most if not all applications allow users to export data into a file, typically Excel or CSV, that can be loaded into a spreadsheet or imported into a BI tool. This approach is easy to use and works with most tools; however, it suffers from several serious drawbacks:
- Data is outdated as soon as it is exported
- Works well for small data sets, but takes too long to move large amounts of data
- Works well for single tables, but not so well when the analysis requires data from multiple related tables
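The third drawback can be illustrated with a short Python sketch that assumes two hypothetical exports, one file per table. Because each export holds a single table, the analyst has to join the files by hand on `customer_id`, and both files are frozen snapshots that may have been taken at different times. The file contents, column names, and helper functions are made up for illustration.

```python
import csv
import io

# Two hypothetical CSV exports, one per table, as an application might produce them.
CUSTOMERS_CSV = """customer_id,name,sales_rep
1,Acme Corp,Dana
2,Globex,Lee
"""

ORDERS_CSV = """order_id,customer_id,amount
10,1,500
11,2,75
12,1,1200
"""


def load(text: str) -> list[dict[str, str]]:
    """Parse one exported CSV file into a list of row dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def orders_with_customer(customers_csv: str, orders_csv: str) -> list[dict[str, str]]:
    """Join the two exports on customer_id by hand, since each file holds one table."""
    by_id = {row["customer_id"]: row for row in load(customers_csv)}
    joined = []
    for order in load(orders_csv):
        customer = by_id.get(order["customer_id"])
        if customer is None:
            continue  # orphaned order, e.g. the files were exported at different times
        joined.append({**order, "name": customer["name"], "sales_rep": customer["sales_rep"]})
    return joined
```

Every further table the analysis needs adds another export and another hand-written join, and none of it reflects changes made after the files were produced.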
Web Services
SaaS applications typically provide a Web Service API for data access. Access is direct and is managed by the application. In principle, this is the desired solution. However, due to a lack of standards, most SaaS applications provide limited APIs that are useful for retrieving specific records or exporting data, but are not suited to query and reporting because they lack an expressive query language such as SQL.
Specifically, the missing pieces are:
- Lack of support for aggregate queries. For example, requesting sales totals grouped by product and region. Without such an API, BI tools have to request potentially very large data sets and aggregate them on the client. This quickly becomes prohibitive for real-time reporting.
- Lack of support for table joins and data filtering (beyond the most basic). For example, requesting all orders placed by the customers of a given salesperson within a certain range of order sizes.
- Lack of a standard API similar to SQL and ODBC/JDBC. Without such a standard, BI vendors need to develop a connector for every application they support, and every application vendor has to implement its own API.
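The following Python sketch contrasts the two situations for the sales-totals example. `fetch_page` is a hypothetical stand-in for a record-level web service that can only page through raw rows, so the client must download every record and group them itself. The SQL version sends a single `GROUP BY` to the data source and receives only the totals. The data and function names are assumptions for illustration, and SQLite simply stands in for whatever engine might sit behind a query-capable API.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sales records: (product, region, amount).
RECORDS = [
    ("widget", "EMEA", 100.0),
    ("widget", "EMEA", 50.0),
    ("widget", "APAC", 70.0),
    ("gadget", "EMEA", 30.0),
]


def fetch_page(offset: int, limit: int) -> list[tuple[str, str, float]]:
    """Stand-in for a record-level web service call: raw rows only, no grouping."""
    return RECORDS[offset:offset + limit]


def totals_via_record_api(page_size: int = 2) -> tuple[dict, int]:
    """No server-side aggregation: download every record, then group on the client."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    transferred = 0
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            break
        transferred += len(page)
        for product, region, amount in page:
            totals[(product, region)] += amount
        offset += len(page)
    return dict(totals), transferred


def totals_via_sql() -> tuple[dict, int]:
    """Expressive query language: one GROUP BY, and only the totals are returned."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE sales (product TEXT, region TEXT, amount REAL)")
    cur.executemany("INSERT INTO sales VALUES (?, ?, ?)", RECORDS)
    cur.execute(
        "SELECT product, region, SUM(amount) FROM sales GROUP BY product, region"
    )
    rows = cur.fetchall()
    conn.close()
    return {(product, region): total for product, region, total in rows}, len(rows)
```

With four sample records the difference is a single row, but the record-level path always transfers every row no matter how few groups the report needs, which is what makes it prohibitive at scale. The SQL path uses Python's `sqlite3` module, which implements the DB-API 2.0 specification (PEP 249); that specification plays a role similar to ODBC/JDBC, so largely the same cursor-based code works with other compliant drivers (the parameter placeholder style can differ between drivers).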
Data Warehousing
Given that SaaS applications do not provide an API for real-time data access, the typical, yet rather expensive, solution is to export data from the application into a relational database and then run reports against this database.
In addition to being expensive to set up and maintain, this solution also suffers from the fact that the data is accurate only as of the last time it was exported. Frequent data synchronization makes the solution even more expensive, and yet it is never real-time. Users today expect to see up-to-the-minute data, not yesterday’s data.
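A minimal Python sketch of the warehousing approach follows. It assumes a hypothetical full-refresh `sync` job that copies the application's orders into a local SQLite database and logs when it ran. Any order created in the application after the last sync stays invisible to reports until the next run, which is the staleness problem described above. The table names, the full-refresh strategy, and the sample data are illustrative assumptions.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def create_warehouse() -> sqlite3.Connection:
    """Local relational copy of the SaaS data, plus a log of sync times."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL)")
    conn.execute("CREATE TABLE sync_log (synced_at TEXT)")
    return conn


def sync(conn: sqlite3.Connection, source_rows, now: Optional[datetime] = None) -> None:
    """Full refresh: replace the warehouse copy with a fresh export of the source."""
    now = now or datetime.now(timezone.utc)
    with conn:  # commits the delete, reload, and log entry together (rolls back on error)
        conn.execute("DELETE FROM orders")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", source_rows)
        conn.execute("INSERT INTO sync_log VALUES (?)", (now.isoformat(),))


def report(conn: sqlite3.Connection) -> tuple[float, str]:
    """Run a report against the warehouse; it is only as fresh as the last sync."""
    total = conn.execute("SELECT COALESCE(SUM(amount), 0) FROM orders").fetchone()[0]
    as_of = conn.execute("SELECT MAX(synced_at) FROM sync_log").fetchone()[0]
    return total, as_of
```

For example, after `sync(wh, [(1, 100.0), (2, 50.0)], datetime(2024, 1, 1, tzinfo=timezone.utc))`, a third order added in the application does not show up in `report(wh)` until `sync` runs again. Running `sync` more often narrows the gap but never closes it, and each run repeats the full copy.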
Standard Data Access API for SaaS Applications
The BI and SaaS vendor communities need to collaborate on defining an API for real-time data access. Technologically, this is not very hard; it was done for relational databases back in the early nineties. I believe that the leadership must come from the SaaS vendor community because this is the community that stands to gain the most by solving this problem. If you belong to that community, then consider this a call to action. Please contact me if you’d like to develop this idea further.