# Scaling your Data Pipelines with Apache Spark on Kubernetes

There is no doubt Kubernetes has emerged as the next generation of cloud native infrastructure to support a wide variety of distributed workloads. Apache Spark has evolved to run both machine learning and large-scale analytics workloads, and there is growing interest in running Apache Spark natively on Kubernetes. By combining the flexibility of Kubernetes with the scalable data processing of Apache Spark, you can run data and machine learning pipelines on this infrastructure while effectively utilizing the resources at your disposal.

In this talk, Rajesh Thallam and Sougata Biswas share how to effectively run your Apache Spark applications on Google Kubernetes Engine (GKE) and Google Cloud Dataproc, and how to orchestrate data and machine learning pipelines with managed Apache Airflow on GKE (Google Cloud Composer). The following topics will be covered:

- Understanding key traits of Apache Spark on Kubernetes
- Things to know when running Apache Spark on Kubernetes, such as autoscaling
- Demonstrating analytics pipelines on Apache Spark orchestrated with Apache Airflow on a Kubernetes cluster

Combining the best of open source and cloud, and simplifying Hadoop & Spark workloads:

- **Apache Hadoop YARN vs Kubernetes for Apache Spark** — unique benefits of orchestrating Spark jobs on Kubernetes compared to other cluster managers
- **Google Kubernetes Engine** — secured and fully managed Kubernetes service
  - Inspired and informed by Google's experiences
  - Finding the right host to fit your workload
  - Scales to an industry-leading 15k worker nodes
- **Dataproc on GKE** — run Spark jobs on GKE clusters with the Dataproc Jobs API
  - $0.10 per cluster/hour + infrastructure cost
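As a rough sketch of what the two submission paths above look like in practice, the commands below are illustrative only: the API server host, cluster name, region, container image, and JAR paths are placeholder values, not details from the talk.

```shell
# Native Spark on Kubernetes: spark-submit talks directly to the
# Kubernetes API server, which schedules the driver and executor pods.
# <k8s-apiserver-host>, the image tag, and the JAR path are placeholders.
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image=apache/spark:3.5.0 \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar

# Dataproc on GKE: the Dataproc Jobs API fronts the same kind of
# workload, so you submit with gcloud instead of managing pods yourself.
# my-gke-backed-cluster and us-central1 are example values.
gcloud dataproc jobs submit spark \
  --cluster=my-gke-backed-cluster \
  --region=us-central1 \
  --class=org.apache.spark.examples.SparkPi \
  --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar
```

Both commands require a live cluster and valid credentials; they are shown only to contrast direct Kubernetes scheduling with submission through the managed Dataproc Jobs API.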