Delight: The New & Improved Spark UI & Spark History Server is now Generally Available
Jean Yves in Towards Data Science, May 5, 2021
Delight is a free, hosted, cross-platform monitoring dashboard for Apache Spark with memory and CPU metrics that will hopefully delight…

Spark on Kubernetes Made Easy: How Data Mechanics Improves On The Open-Source Version
Jean Yves in Data Mechanics, Apr 28, 2021
If you’re looking for a high-level introduction about Spark on Kubernetes, check out The Pros And Cons of Running Spark on Kubernetes, and…

Run your R (SparklyR) workloads at scale with Spark-on-Kubernetes
Jean Yves in Towards Data Science, Jan 25, 2022
Tutorial: How to build the right Docker image, start your Spark session, and run at scale!

Improve Apache Spark performance with the S3 magic committer
Jean Yves in Towards Data Science, Jan 20, 2022
Achieve up to 65% performance gain using the latest S3 magic committer from Spark 3.2 and Hadoop 3.3!

Apache Spark 3.2 Release — What’s New For Spark-on-Kubernetes
Jean Yves in Towards Data Science, Nov 4, 2021
Apache Spark 3.2 was released in October 2021 (see release notes) and it is now available for Data Mechanics customers, and for anyone who…

Tutorial: Running PySpark inside Docker containers
Jean Yves in Towards Data Science, Oct 28, 2021
In this article we’re going to show you how to start running PySpark applications inside of Docker containers, by going through a…

Optimized Docker Images for Apache Spark — Now Public on DockerHub
Jean Yves in Towards Data Science, May 12, 2021
They include Spark, Python, Scala, Java, Hadoop, and fast connectors to S3, GCS, Azure Data Lake, Delta Lake, Snowflake, and other sources…

The Story of a Migration from EMR to Spark on Kubernetes
Jean Yves in Towards Data Science, Apr 27, 2021
The goals of our migration, the architecture we targeted, the technical challenges we encountered, and the results we achieved.

Apache Spark 3.1 Release: Spark on Kubernetes is now Generally Available
Jean Yves in Towards Data Science, Mar 30, 2021
With the Apache Spark 3.1 release in March 2021, the Spark on Kubernetes project is now officially declared as production-ready and…

Highlights of Data + AI Summit 2020 (formerly Spark Summit)
Jean Yves in Towards Data Science, Dec 15, 2020
Recent developments with Spark 3.0, Spark-on-Kubernetes going GA, PySpark usability improvements, and more.