Very first data cloud: Evolution of the data lake to hybrid cloud
Today, in this digitized world, data is being generated at a very high rate and from many different sources, and in many use cases it is growing exponentially. As a result, several petabytes of data are becoming the norm. Data also arrives in various forms (structured, semi-structured, or unstructured), and its variety is expanding beyond simple text to images, video, voice, and more, all of which can be mined to extract intelligence.
This massive amount of data is being ingested into on-premises or public cloud data lakes. IT leaders are clearly challenged to find ways to maximize the ROI of their data lakes and unlock the full potential of their data, in full alignment with business outcomes. They must also modernize apps, adopt cloud-native architectures, build microservices with Kubernetes, and apply advanced analytics using AI/ML frameworks. Amid these challenges, siloed monolithic apps and data further slow the pace of innovation and limit the transformation journey toward modern digitization.
Given all of this, the industry is looking for hybrid data lake solutions that provide a unified user experience with common identity; a single API framework that stretches from private cloud to public cloud; auto-scaling as app demand grows; tighter control over sensitive data through data governance and compliance across the board; and a common data-serving layer for data analytics, business intelligence, AI inferencing, and so on.
Traditionally, hybrid cloud was viewed as connecting two data centers (public and private). Modern hybrid cloud, however, is all about the digital transformation of apps and data. Apps have given rise to a whole new way of thinking about IT: previously apps supported business functions, but now “Apps are the business!” Hybrid cloud is about gaining agility, flexibility, and reliability by utilizing advanced analytics, AI, Kubernetes, and microservices. It is about massive scale, flexible consumption, optimized security, performance, availability, and granular control.
Cisco UCS X-Series is designed with the hybrid data center and cloud in mind. Its primary objective is to help customers address the needs of modern apps and hybrid cloud operational models while reducing complexity. The modular design of X-Series provides the flexibility to address the rapidly changing requirements of modern apps and their data.
Cisco UCS X-Series is fully managed by Cisco Intersight. With Intersight, you get all the benefits of SaaS delivery and full lifecycle management of distributed infrastructure and workloads across data centers, remote sites, branch offices, and edge environments. This empowers you to analyze, update, fix, and automate your environment in ways that were not previously possible. As a result, your organization can achieve significant TCO savings and deliver applications faster in support of new business initiatives.
The Hadoop ecosystem has evolved over the years from batch processing (Hadoop 1.0), to streaming and near-real-time analytics (Hadoop 2.0), to big data meeting AI (Hadoop 3.0). Today these technologies have evolved to enable the data lake as a private cloud, with storage separated from compute, and going forward they will support hybrid cloud (and multi-cloud).
In the second half of 2020, Cloudera released two software products that together enable the data lake as a private cloud:
1- Cloudera Data Platform Private Cloud Base: provides storage, supports traditional data lake environments, and introduced Apache Ozone, the next-generation file system for the data lake
2- Cloudera Data Platform Private Cloud Experiences: provides persona-based experiences (data analyst, data scientist, data engineer) for processing workloads on data stored in CDP Private Cloud Base
Cisco Data Intelligence Platform (CDIP) is powered by Cloudera Data Platform (CDP), and all of these innovations are fully integrated and validated within CDIP.
A similar experience is also available in the public cloud through Cloudera's marketplace offerings. This opens up a unified experience across hybrid apps and data. With a hybrid Hadoop ecosystem, the boundaries of the data lake can stretch from private cloud to public cloud, with the Cloudera SDX layer offering tighter data security, governance, and compliance across both.
Cisco Data Intelligence Platform is a cloud-scale architecture that brings the data lake and AI/compute farm tiers together to work as a single entity, while each tier can still scale independently, addressing the IT challenges of the modern data center.
CDIP is designed with the cloud advantage in mind: it decouples compute and storage. Separating the two lets each scale up or out as needed, and with hybrid cloud this separation pushes the scale boundaries further. These benefits are now driving the momentum of the overall data landscape. The separation does bring challenges, however. A high-speed fabric plays a vital role, and to further improve performance, transient data is kept in a local disk cache that can be rebuilt and repurposed based on the apps. That is how solutions are architected in CDIP with CDP. The separation also allows investment to be directed where it is needed most (storage or compute), and the cost benefits are clear: in many cases compute demand is purely elastic while storage remains constant, or vice versa.
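To make the cost argument concrete, here is a minimal Python sketch comparing a coupled design (every node adds both storage and compute) with a decoupled one (storage and compute tiers sized separately). The node capacities and the function names `coupled_nodes` and `decoupled_nodes` are hypothetical, chosen only for illustration; they are not CDIP sizing guidance.

```python
import math

# Hypothetical node capacities, for illustration only (not CDIP sizing guidance).
STORAGE_PER_DATA_NODE_TB = 100  # usable TB per storage node
CORES_PER_DATA_NODE = 32        # cores per node in a coupled design
CORES_PER_COMPUTE_NODE = 64     # cores per compute-only node


def coupled_nodes(storage_tb, cores):
    """Coupled design: every node brings both storage and compute, so the
    cluster must be sized for whichever demand needs more nodes."""
    return max(math.ceil(storage_tb / STORAGE_PER_DATA_NODE_TB),
               math.ceil(cores / CORES_PER_DATA_NODE))


def decoupled_nodes(storage_tb, cores):
    """Decoupled design: size the storage tier and the compute tier
    independently. Returns (storage_nodes, compute_nodes)."""
    return (math.ceil(storage_tb / STORAGE_PER_DATA_NODE_TB),
            math.ceil(cores / CORES_PER_COMPUTE_NODE))


if __name__ == "__main__":
    # Storage stays constant while compute demand bursts 4x.
    for cores in (320, 1280):
        print(cores, coupled_nodes(1000, cores), decoupled_nodes(1000, cores))
```

With storage fixed at 1,000 TB and compute demand bursting from 320 to 1,280 cores, the coupled design grows from 10 to 40 nodes, adding 3,000 TB of storage that nobody asked for, while the decoupled design keeps its 10 storage nodes and grows only the compute tier from 5 to 20 nodes.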
The private cloud creates virtual data-crunching compute in the form of containers with CPU, memory, and GPU resources, whether for a Spark cluster, standalone AI workloads using TensorFlow or PyTorch, notebooks (such as Jupyter or Cloudera Workbench) for data scientists, or virtual warehouses. All of it is provisioned and de-provisioned on demand in a Kubernetes-backed container cloud.
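For readers less familiar with Kubernetes, the sketch below shows what "a container with CPU, memory, and GPU resources" looks like at the plain Kubernetes level: a Pod manifest with resource requests and limits. It is a generic illustration, not how CDP Private Cloud Experiences provisions workloads internally. The function `workload_pod_manifest` and the image `registry.example.com/notebook:latest` are hypothetical, and the `nvidia.com/gpu` resource assumes the NVIDIA device plugin is installed on the cluster.

```python
import json


def workload_pod_manifest(name, cpu, memory_gib, gpus=0,
                          image="registry.example.com/notebook:latest"):
    """Build a minimal Kubernetes Pod manifest (as a dict) that requests CPU,
    memory and, optionally, NVIDIA GPUs for a single workload container.
    Hypothetical helper for illustration only."""
    resources = {"cpu": str(cpu), "memory": f"{memory_gib}Gi"}
    if gpus:
        # GPUs are an extended resource exposed by the NVIDIA device plugin;
        # when both are given, Kubernetes requires requests == limits for them.
        resources["nvidia.com/gpu"] = str(gpus)
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "workload",
                "image": image,  # hypothetical image name
                # Equal requests and limits reserve the resources exclusively.
                "resources": {"requests": dict(resources),
                              "limits": dict(resources)},
            }],
        },
    }


if __name__ == "__main__":
    manifest = workload_pod_manifest("ds-notebook", cpu=4, memory_gib=16, gpus=1)
    print(json.dumps(manifest, indent=2))
```

Saved to a file, the printed JSON could be submitted with `kubectl apply -f pod.json` and released again with `kubectl delete pod ds-notebook`, which is the provision/de-provision cycle the paragraph describes, reduced to a single container.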
CDIP offers pre-validated designs for both the data lake and the private cloud. In these reference designs, Cisco has achieved architectural innovation with its partners. In addition, Cisco has published several world-record performance benchmarks with TPC that demonstrated linear scaling, with top performance numbers for both traditional MapReduce and Spark, the next generation of compute for crunching big data. Furthermore, CDIP offers centralized management with Cisco Intersight. Intersight is adding new features and capabilities at a rapid pace, which will bring a lot of exciting innovation in the context of hybrid cloud, such as solution automation with an orchestrator, observability, and monitoring; all of it is fully aligned with UCS X-Series and CDIP.
In CDIP, UCS X-Series offers an excellent platform for the container cloud, serving as the compute engine for modern apps in the hybrid world. In the coming years the velocity of app modernization will be tremendous, and UCS X-Series is fully aligned with it: a wave of new technologies is coming, such as new compute modules, networking fabric, PCIe fabric, pooled NVMe drives, persistent memory, GPU accelerators, custom ASICs, and so on.