Jean Yves in Towards Data Science · Delight: The New & Improved Spark UI & Spark History Server is now Generally Available · Delight is a free, hosted, cross-platform monitoring dashboard for Apache Spark with memory and CPU metrics that will hopefully delight… · 9 min read · May 5, 2021
Jean Yves in Data Mechanics · Spark on Kubernetes Made Easy: How Data Mechanics Improves On The Open-Source Version · If you’re looking for a high-level introduction about Spark on Kubernetes, check out The Pros And Cons of Running Spark on Kubernetes, and… · 5 min read · Apr 28, 2021
Jean Yves in Towards Data Science · Run your R (SparklyR) workloads at scale with Spark-on-Kubernetes · Tutorial: How to build the right Docker image, start your Spark session, and run at scale! · 4 min read · Jan 25, 2022
Jean Yves in Towards Data Science · Improve Apache Spark performance with the S3 magic committer · Achieve up to 65% performance gain using the latest S3 magic committer from Spark 3.2 and Hadoop 3.3! · 7 min read · Jan 20, 2022
Jean Yves in Towards Data Science · Apache Spark 3.2 Release — What’s New For Spark-on-Kubernetes · Apache Spark 3.2 was released in October 2021 (see release notes) and it is now available for Data Mechanics customers, and for anyone who… · 8 min read · Nov 4, 2021
Jean Yves in Towards Data Science · Tutorial: Running PySpark inside Docker containers · In this article we’re going to show you how to start running PySpark applications inside of Docker containers, by going through a… · 4 min read · Oct 28, 2021
Jean Yves in Towards Data Science · Optimized Docker Images for Apache Spark — Now Public on DockerHub · They include Spark, Python, Scala, Java, Hadoop, and fast connectors to S3, GCS, Azure Data Lake, Delta Lake, Snowflake, and other sources… · 4 min read · May 12, 2021
Jean Yves in Towards Data Science · The Story of a Migration from EMR to Spark on Kubernetes · The goals of our migration, the architecture we targeted, the technical challenges we encountered, and the results we achieved. · 5 min read · Apr 27, 2021
Jean Yves in Towards Data Science · Apache Spark 3.1 Release: Spark on Kubernetes is now Generally Available · With the Apache Spark 3.1 release in March 2021, the Spark on Kubernetes project is now officially declared as production-ready and… · 8 min read · Mar 30, 2021
Jean Yves in Towards Data Science · Highlights of Data + AI Summit 2020 (formerly Spark Summit) · Recent developments with Spark 3.0, Spark-on-Kubernetes going GA, PySpark usability improvements, and more. · 7 min read · Dec 15, 2020