Even the most competent Hadoop users will readily admit that the platform is complex. This complexity comes in multiple forms. The classic big data challenges of volume, velocity, and variety still exist, but complexity now extends across the infrastructure, platform services, processing, and data layers of the modern data stack. Moreover, challenges arise not just within each of these four pillars, but also in the interplay between them. That’s why Acceldata built the first and only enterprise data observability solution that addresses the entire Hadoop ecosystem, spanning infrastructure, platform services, processing, and data.
The image below provides a high-level overview of capabilities within these pillars and how they support critical Hadoop ecosystem needs.
Acceldata’s data observability platform empowers data teams with deep insights into compute, spend, data reliability, pipelines, and users. With these insights, data teams have the information they need to optimize the performance of their data stack, so they can:
Proactively manage cost and usage to maximize business value.
Deliver high-quality data on time, every time.
Gain real-time insights for optimal decision-making.
Acceldata customers have measured significant improvements in their data operations by leveraging the data observability platform. Common results for customers include:
Achieving these results comes not just from the breadth of data telemetry that’s monitored, but also from the analytics and automation applied to that telemetry. This is what distinguishes observability from monitoring. Here are some examples:
True reliability does not come from identifying and fixing issues; it comes from avoiding incidents in the first place. Acceldata takes a Predictive Maintenance approach, common in manufacturing and other industries, and applies it to data. Three key capabilities are involved:
Acceldata automatically calculates a variance score and other metrics that measure trends in runtime, resource consumption, data volume and other telemetry. This allows customers to identify which areas are likely to exceed SLAs or fail in the future, even when everything shows green at present. Some customers have transitioned from weekly, reactive incident response to 12 months and counting without a single severity 1 incident.
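The trend-based prediction described above can be pictured as a simple check over recent run telemetry: measure how unstable a metric has been, extrapolate its trend, and flag jobs that are "green" today but on track to breach their SLA. The function names, the coefficient-of-variation score, and the linear-trend heuristic below are illustrative assumptions, not Acceldata's actual algorithm.

```python
from statistics import mean, pstdev

def variance_score(history: list[float]) -> float:
    """Coefficient of variation: how unstable a metric has been."""
    avg = mean(history)
    return pstdev(history) / avg if avg else 0.0

def projected_breach(history: list[float], sla: float, horizon: int = 5) -> bool:
    """Fit a least-squares trend over recent runs and flag whether the
    metric (e.g. runtime in minutes) is on track to exceed the SLA
    within `horizon` future runs."""
    n = len(history)
    if n < 2:
        return False
    xs = range(n)
    x_avg, y_avg = mean(xs), mean(history)
    slope = sum((x - x_avg) * (y - y_avg) for x, y in zip(xs, history)) \
        / sum((x - x_avg) ** 2 for x in xs)
    projection = history[-1] + slope * horizon
    return projection > sla

# A job that is green today (every run under a 60-minute SLA)
# is still flagged because its runtimes trend steadily upward.
runs = [38, 41, 45, 48, 52, 55]
print(variance_score(runs))
print(projected_breach(runs, sla=60))  # True: projected past 60 minutes
```

The point of the sketch is the shift from threshold alerts to trajectory: no single run violates the SLA, yet the trend makes the future breach visible now.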
Poor data quality is the number one obstacle to generating actionable business insights, according to 42 percent of executives surveyed by the Harvard Business Review. To improve quality coverage, Acceldata scans data and automatically generates data quality rules that quickly and easily cover the majority of potential data quality issues. Acceldata also provides an easy way to address a wide range of data quality concerns that most data quality solutions on the market lack. Schema drift, data reconciliation, data drift, and anomaly detection are capabilities that customers cite as blind spots that Acceldata addresses well.
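One common way such rule auto-generation works is to profile a sample of the data and derive candidate constraints from what is observed. The sketch below illustrates that general technique with hypothetical function names and rule syntax; it is not Acceldata's API.

```python
def profile_column(values):
    """Summarize a column sample: null rate, range, cardinality."""
    non_null = [v for v in values if v is not None]
    return {
        "null_rate": 1 - len(non_null) / len(values),
        "min": min(non_null),
        "max": max(non_null),
        "distinct": len(set(non_null)),
    }

def generate_rules(name, values):
    """Derive candidate data-quality rules from the observed profile."""
    p = profile_column(values)
    rules = []
    if p["null_rate"] == 0:
        rules.append(f"{name} IS NOT NULL")
    rules.append(f"{name} BETWEEN {p['min']} AND {p['max']}")
    if p["distinct"] == len(values):
        rules.append(f"{name} IS UNIQUE")
    return rules

print(generate_rules("order_amount", [12.5, 80.0, 45.2, 19.9]))
```

Generated rules like these become a starting point that data stewards then review and tune, which is how profiling-based tools cover the bulk of issues with little manual effort.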
Acceldata provides an auto-action framework based on Ansible. The solution includes over 20 runbooks out-of-the-box with the ability to develop new custom runbooks. This not only eliminates manual effort but also provides near instantaneous self-tuning and self-healing. A rich set of APIs plus alerts, notifications and triggers allow flexible orchestration between Acceldata, the Hadoop platform, ticketing systems and other external systems and processes.
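The alert-to-runbook flow can be pictured as a small dispatcher that maps alert types to Ansible playbooks and builds the corresponding `ansible-playbook` invocation. The alert shape, runbook names, and mapping below are hypothetical illustrations, not Acceldata's actual framework.

```python
import shlex

# Hypothetical mapping of alert types to remediation runbooks.
RUNBOOKS = {
    "hdfs_disk_full": "runbooks/expand_hdfs_quota.yml",
    "yarn_queue_stuck": "runbooks/restart_yarn_queue.yml",
    "region_server_down": "runbooks/restart_region_server.yml",
}

def build_remediation(alert: dict):
    """Translate an incoming alert into an ansible-playbook command.
    Returns None when no runbook is registered for the alert type."""
    playbook = RUNBOOKS.get(alert["type"])
    if playbook is None:
        return None
    cmd = ["ansible-playbook", playbook]
    extra_vars = " ".join(f"{k}={v}" for k, v in alert.get("vars", {}).items())
    if extra_vars:
        cmd += ["--extra-vars", extra_vars]
    return cmd

cmd = build_remediation({"type": "yarn_queue_stuck", "vars": {"queue": "etl"}})
print(shlex.join(cmd))
```

In a real framework the command would be executed (or queued for approval) by an orchestrator, and unmapped alerts would fall through to notification channels instead of automation.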
A chain is only as strong as its weakest link. Similarly, a single bottleneck will weaken the performance of a query, job, or other workload, and workloads can be quite complex. It’s one thing to monitor performance; it’s another to identify the root cause and determine how to improve performance. Acceldata provides performance analytics across three broad categories:
Acceldata automatically analyzes workloads and provides recommendations for query optimization and job configuration.
Acceldata simplifies the process of right-sizing job configuration to meet a specific SLO. For example, in the chart below, an engineer can see on a curve the runtime for a Spark job with minimal executors (A), the recommended executor count to get below a specific runtime (B), the executor count recommended for high performance (C), and where price/performance drops off with high resource consumption for small performance gains (D). This takes the guesswork and cumbersome trial and error out of right-sizing job configurations.
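The curve above can be reasoned about numerically: given measured (executor count, runtime) pairs, pick the smallest count that meets an SLO (point B) and flag where doubling resources stops paying off (point D). The measurements and thresholds below are invented for illustration.

```python
# Hypothetical (executor_count, runtime_minutes) measurements for one Spark job.
curve = [(2, 90), (4, 50), (8, 30), (16, 22), (32, 19), (64, 18)]

def smallest_meeting_slo(curve, slo_minutes):
    """Point B: fewest executors whose measured runtime is under the SLO."""
    for executors, runtime in sorted(curve):
        if runtime <= slo_minutes:
            return executors
    return None

def diminishing_returns(curve, min_gain=0.15):
    """Point D: first step where doubling executors buys <15% speedup."""
    pts = sorted(curve)
    for (e0, r0), (e1, r1) in zip(pts, pts[1:]):
        if (r0 - r1) / r0 < min_gain:
            return e1
    return None

print(smallest_meeting_slo(curve, slo_minutes=35))  # -> 8
print(diminishing_returns(curve))                    # -> 32
```

With the curve in hand, the choice becomes explicit: 8 executors satisfy a 35-minute SLO, while anything beyond 32 burns resources for marginal gains.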
Acceldata provides a rich suite of analytic tools to identify performance bottlenecks, correlate events, and optimize jobs, queries, and configuration. For example, engineers can identify which stages within a Spark job are single-threaded, which parts of a Hive query perform large scans, and where overhead is high due to many sub-tasks. Event correlation can show where resources are constrained, which metrics have changed from one execution to another, and how they relate to each other (e.g., data volume vs. runtime vs. memory).
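Correlating metrics across executions (data volume vs. runtime vs. memory) can be sketched with a plain Pearson correlation over per-run telemetry. The telemetry values below are invented; a strong correlation between input size and runtime suggests growth is input-driven rather than a configuration problem.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two metric series."""
    x_avg, y_avg = mean(xs), mean(ys)
    cov = sum((x - x_avg) * (y - y_avg) for x, y in zip(xs, ys))
    sd_x = sum((x - x_avg) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - y_avg) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

# Hypothetical telemetry from six executions of the same job.
data_gb = [10, 12, 15, 22, 30, 41]
runtime = [18, 21, 26, 37, 52, 70]
memory  = [64, 64, 64, 64, 64, 64]  # fixed allocation: memory is not the driver

# Runtime tracks data volume almost perfectly, so the slowdown is
# explained by growing input rather than resource contention.
print(round(pearson(data_gb, runtime), 3))
```

Real event correlation layers this kind of statistic over many metrics at once and surfaces only the pairs whose relationship changed between executions.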
Unlike the cloud, adding capacity to on-premises infrastructure takes more than the click of a button. Purchasing excess capacity insures against unexpected surges and against having to go through procurement ahead of schedule to meet increased demand. Even with excess capacity, organizations often find themselves running out. Improving resource efficiency not only helps avoid capacity issues, it can also save a lot of money and enable new use cases to be onboarded with a greater return on investment. Here are three areas where Acceldata helps improve efficiency:
Utilization analytics can help organizations optimize the scheduling of workloads to even out resource utilization over time. Chargeback reports align resource cost to business benefits to ensure the highest-priority workloads are served. Identifying large, short-lived workloads that can be burst out to the cloud supports an optimal hybrid cloud strategy.
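A chargeback report of the kind mentioned above amounts to attributing metered resource usage to tenants at an agreed rate. The usage records, rate card, and team names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical metered usage: (team, vcore_hours, memory_gb_hours)
usage = [
    ("marketing", 1200, 4800),
    ("fraud",     5400, 9600),
    ("marketing",  800, 3100),
    ("fraud",     2100, 5200),
]

VCORE_RATE = 0.05   # assumed $ per vcore-hour
MEM_RATE   = 0.01   # assumed $ per GB-hour

def chargeback(records):
    """Roll metered usage up into a cost-per-team report."""
    totals = defaultdict(float)
    for team, vcores, mem in records:
        totals[team] += vcores * VCORE_RATE + mem * MEM_RATE
    return dict(totals)

print(chargeback(usage))
```

Once costs are attributed this way, scheduling and prioritization decisions can be argued in business terms rather than raw cluster metrics.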
Acceldata automatically profiles and flags jobs that could be run more efficiently. Data engineers can then drill down into those jobs and leverage the performance analytics tools to achieve the same or better performance with fewer resources.
HDFS analytics identifies “cold data” that is infrequently updated or accessed for potential archiving. “Small files” reports identify opportunities to improve data processing efficiency. Hot spot visualizations identify how data can be reorganized to balance load across a cluster.
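A small-files report like the one described can be approximated by walking a file listing and flagging directories dominated by files well under the HDFS block size. The paths, sizes, and thresholds below are illustrative assumptions.

```python
from collections import defaultdict

BLOCK_SIZE = 128 * 1024 * 1024   # typical HDFS block size
SMALL = BLOCK_SIZE // 8          # flag files under 16 MB

# Hypothetical (path, size_bytes) listing, e.g. from a recursive HDFS scan.
listing = [
    ("/data/events/part-0001", 2 * 1024 * 1024),
    ("/data/events/part-0002", 3 * 1024 * 1024),
    ("/data/events/part-0003", 1 * 1024 * 1024),
    ("/data/archive/2020.parquet", 900 * 1024 * 1024),
]

def small_file_report(listing):
    """Count small files per directory; high counts suggest compaction."""
    report = defaultdict(int)
    for path, size in listing:
        if size < SMALL:
            directory = path.rsplit("/", 1)[0]
            report[directory] += 1
    return dict(report)

print(small_file_report(listing))  # directories worth compacting
```

Compacting the flagged directories reduces NameNode memory pressure and per-file task overhead, which is where the data-processing efficiency gain comes from.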
Acceldata provides a suite of tools to support each of the use cases above and many others. This has enabled customers to predict and prevent incidents, scale performance by orders of magnitude, and reduce their infrastructure costs by 20%–50%. Acceldata’s data observability solution is the foundation on which Acceldata provides extended support offerings for Hadoop with the best SLAs in the industry.
The Acceldata Data Observability platform supports the following distributions:
Acceldata experts may be available to advise and assist with migrations to supported distributions and technologies.
Over time, all systems get updated and eventually replaced. Data Observability can significantly reduce the time, risk and cost associated with upgrades and migrations by automating many aspects of testing.
Acceldata’s platform makes it easy to reconcile data between two data stores. Both the structure and the data can be compared across diverse technologies. Data can be reconciled in complex scenarios where data has been integrated, aggregated or otherwise transformed.
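Row-level reconciliation between two stores is commonly done by hashing normalized rows on both sides and diffing the key sets. The sketch below assumes both sides have already been extracted as rows keyed by an id; the table names and data are invented.

```python
import hashlib

def row_digest(row: dict) -> str:
    """Order-independent hash of a row's key/value pairs."""
    canonical = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(canonical.encode()).hexdigest()

def reconcile(source_rows, target_rows, key="id"):
    """Compare two extracts: rows missing on either side, and rows
    present on both sides whose contents differ."""
    src = {r[key]: row_digest(r) for r in source_rows}
    tgt = {r[key]: row_digest(r) for r in target_rows}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "missing_in_source": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }

# Hypothetical extracts from a legacy Hive table and its migrated copy.
hive  = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}, {"id": 3, "amt": 30}]
delta = [{"id": 1, "amt": 10}, {"id": 2, "amt": 25}, {"id": 4, "amt": 40}]
print(reconcile(hive, delta))
```

Hashing keeps the comparison cheap across diverse technologies, since only digests and keys, not full rows, need to be moved to a common place for the diff.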
Acceldata allows complex data pipelines to be compared and analyzed at a high level, with the ability to drill down and compare the individual processing steps.
Acceldata allows detailed analysis of resource consumption between different executions in different environments. This assists with comparing and assessing cost estimates before and after migration.
These same capabilities serve post-migration to ensure performance, data quality and cost remain consistent with requirements and expectations. Data Observability’s value throughout the lifecycle is one of the benefits of engaging Acceldata early, even at the strategy and design phase.
The Acceldata platform is deployed alongside the Hadoop environment as a set of Docker containers, connectors, repositories, and lightweight agents that work in concert to collect, store, and analyze telemetry data. The diagram below provides a high-level overview of the architecture.
Maintaining your Hadoop platform can require significant resources and can be costly. As Hadoop ages, organizations may be forced to take difficult migration paths:
Acceldata for Hadoop provides technology and support offerings that enable organizations to continue to receive the benefits of their legacy investments without having to “go it alone”. Moreover, many Acceldata customers made the switch long ago and attest that the support, service, expertise, and technology they receive from Acceldata surpass their previous experience with Cloudera, at a much lower cost. The Acceldata platform allows for options that align with almost any organization’s needs, including:
Learn more about how Acceldata supports Hadoop, and schedule an HDP/CDH assessment of your data environment.