
Data warehousing, as the name suggests, collects data from many sources so that it can be queried, analyzed, and mined for intelligence. It can be difficult to decide whether to keep a data warehouse on-premises or in a public cloud. Rohit Amarnath (CTO at Vertica) explains the options and tradeoffs offered by the DWaaS model.
Data is being generated at an unprecedented rate and scale thanks to mobile apps, the Internet of Things, and digital consumer interactions. Businesses want to collect and use this data so that they can identify consumption patterns, provide greater personalization and engagement, and offer tailored products and services.
However, it is not easy to store, maintain, and analyze such large volumes of data. Over the past few years, the general trend has been to store this data in the cloud because of the convenience and cost savings associated with elastic cloud pricing models. The data is stored in cloud object stores (also known as data lakes) and then analyzed with cloud data warehouses.
These cloud data warehouses, and “data lakehouses”, have grown more sophisticated through innovations that leverage cloud strengths such as the separation of compute and storage. Most cloud providers and data warehouse vendors now offer public cloud-based data warehouse services (a.k.a. data warehouse-as-a-service or DWaaS) with consumption-based pricing. The vendor handles the heavy lifting, such as setting up, maintaining, and upgrading the data warehouse and all associated software.
There are many DWaaS options available, and the business use case will determine which one is best. As part of their overall decision-making process, organizations should carefully weigh the following considerations.
Software Limitations
Software delivered as a service promises self-service and ease of use, but these must be balanced against the stability and security of the platform. Other factors, such as which capabilities the underlying cloud offers, can also affect the functionality of the service.
Take, for example, cloud object stores, which can differ in performance and features. This can lead to a DWaaS performing differently in each cloud, or supporting only one cloud. Some cloud-only providers may also limit an organization's ability to run hybrid or on-premises workloads, even when these are required for security or compliance reasons.
Concurrent Data Systems Are Hard
DWaaS is a multi-node, cluster-based platform. Concurrent systems are intrinsically complex and come with specific rules about how the cloud infrastructure can be set up. These rules cover the number of nodes, how nodes can be scaled up or down, and how inter-node communication works.
Although this can be restrictive, the rules are often there for valid reasons. They provide stability for the communication and data transfer of a complex concurrent system, while also handling any failures that may occur. Anything can fail in the cloud, and how a system handles failures shows how well it has been designed for the cloud. Vendors have to maximize flexibility at the cloud infrastructure level to make data warehousing in the cloud more efficient.
Hybrid Limitations
Research shows that hybrid cloud adoption is becoming increasingly popular among businesses. While security, privacy, and regulation are clearly important drivers, other factors such as cost, flexibility, and maintenance also play a part: a cloud-only model can still be more expensive than running on-premises.
Some DWaaS services do not offer the same capabilities as on-premises. This can lead to businesses needing multiple data warehouses. These complexities can significantly increase the cost of orchestration and operation for major analytics programs.
Security Considerations
The DWaaS model is a convenient option. It is easy to let someone else handle everything while you focus on reporting, analytics, and dashboards. However, choosing DWaaS can mean that the provider has more access to your data. Tier-one DWaaS vendors use encryption and cryptographic controls that separate warehouse operations from data access. They also maintain security controls that allow them to be certified against standards such as ISO 27001 or SOC 2. These certifications give you a basis for trusting their security processes and policies to protect your data.
Hidden Charges
Cloud-related budget overruns can be a problem. Cloud fees can spiral quickly out of control due to the inability to predict usage, growing complexity, and the inherent elasticity of the cloud. Be aware of hidden fees that may be tied to performance and consumption.
DWaaS business models are affordable to start with, but you should be prepared to monitor costs for unexpected growth in data and extra computing power. Higher tiers may include additional security features, but they will likely come at a higher price.
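As a rough illustration of how consumption-based costs can grow, the following Python sketch projects a monthly bill under compounding data and compute growth and reports the first month that exceeds a budget. All prices, growth rates, and function names here are hypothetical assumptions for illustration; actual DWaaS pricing varies by vendor and tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices, not taken from any real vendor."""
    compute_per_hour: float = 4.00       # assumed price per compute-hour
    storage_per_tb_month: float = 25.00  # assumed price per TB stored per month


def project_monthly_costs(start_tb, data_growth, compute_hours,
                          compute_growth, months, rates=Rates()):
    """Return a list of projected monthly costs with compounding monthly growth."""
    costs = []
    tb, hours = start_tb, compute_hours
    for _ in range(months):
        costs.append(tb * rates.storage_per_tb_month
                     + hours * rates.compute_per_hour)
        tb *= 1 + data_growth        # stored data grows each month
        hours *= 1 + compute_growth  # compute usage grows each month
    return costs


def first_month_over_budget(costs, budget):
    """Return the 1-based month in which cost first exceeds budget, or None."""
    for month, cost in enumerate(costs, start=1):
        if cost > budget:
            return month
    return None


# Assumed scenario: 10 TB growing 10% a month, 200 compute-hours growing 5% a month.
costs = project_monthly_costs(start_tb=10, data_growth=0.10,
                              compute_hours=200, compute_growth=0.05, months=12)
print(first_month_over_budget(costs, budget=1500.0))
```

Even modest monthly growth compounds quickly, so a projection like this can help decide where to set spending alerts.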
Vendor Lock-ins
Many people believe that it is easy to switch between DWaaS platforms. In practice, each platform is different and switching requires significant migration effort, especially if customization is involved. DWaaS vendors offer a variety of add-on services that can lock customers into complex deployments that are difficult to reproduce on other platforms. Cloud providers also often charge egress fees if businesses wish to move to another cloud or bring their workloads back on-premises. Sticking with standard SQL-based relational data warehouses can make it easier to move workloads around.
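To get a sense of how egress fees affect a migration, the one-time cost can be estimated from the data volume and a per-gigabyte fee. The sketch below is a simplified assumption: the fee, the free allowance, and the function name are hypothetical, and real providers typically use tiered pricing.

```python
def estimate_egress_cost(data_tb, fee_per_gb, free_gb=0.0):
    """Estimate the one-time cost of moving data out of a cloud.

    fee_per_gb and free_gb are hypothetical inputs; check your provider's
    actual pricing. Uses 1 TB = 1000 GB for simplicity.
    """
    billable_gb = max(data_tb * 1000 - free_gb, 0.0)
    return billable_gb * fee_per_gb


# Assumed example: 50 TB at a flat $0.09 per GB with the first 100 GB free.
print(round(estimate_egress_cost(data_tb=50, fee_per_gb=0.09, free_gb=100), 2))
```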
DWaaS is likely to make sense for many companies. It offers many benefits over traditional on-premises data centers, including lower staffing requirements, ease of scaling, and lower IT expenses. Consider your data workloads, and look for data warehouse providers that leverage cloud strengths such as separation of compute from storage and consumption-based pricing, and that offer security certifications and controls, flexible deployment models, and transparent pricing.