
Data Engineering

Backlog

Shared Spark Cluster


Ryoma Nagata on 09 Dec 2023 01:59:51

Users within the same workspace can use the same launched Spark cluster. This eliminates the wait for Spark startup and prevents the excessive CU consumption caused by launching multiple clusters. This differs from the high-concurrency pool, which reuses clusters only within the same user.

Administrator on 30 Jul 2024 17:27:15

Thank you for sharing this idea. Extending the sharing scope across multiple users is on our roadmap as part of High Concurrency mode.

Comments (1)

Vasu Nallasamy on 13 Mar 2024 08:44:26

RE: Shared Spark Cluster

This has been a problem in Synapse, and Fabric has the same problem. We run quite a lot of PySpark ETL notebooks, but most of them are lightweight Python scripts. Each one spawns its own Spark session in a pipeline. It would be great if pipeline notebooks could share Spark sessions, as the high-concurrency mode does for interactive notebooks.