Sail is a computation framework from LakeSail that unifies batch processing, stream processing, and compute-intensive (AI) workloads. It provides a drop-in replacement for Spark SQL and the Spark DataFrame API in both single-host and distributed environments.
Sail is available as a Python package on PyPI. Install it using pip:
pip install "pysail[spark]"
For better performance, you can also build Sail from source. Refer to the Installation Guide for detailed instructions.
You can start the Sail server from the command line, through the Python API, or on Kubernetes:
sail spark server --port 50051
from pysail.spark import SparkConnectServer
server = SparkConnectServer(port=50051)
server.start(background=False)
Deploy Sail on Kubernetes for distributed processing. Follow the Kubernetes Deployment Guide for setup instructions.
kubectl apply -f sail.yaml
kubectl -n sail port-forward service/sail-spark-server 50051:50051
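Whichever method you use, it can help to confirm that the port is accepting connections before pointing a client at it, especially in scripts that start the server and connect right away. The following is a minimal sketch using only the Python standard library. The helper name `wait_for_port` is hypothetical and not part of Sail. A successful check only shows that something is listening on the port, not that it is a Sail server.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout.

    Hypothetical helper for illustration; it is not provided by pysail.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a listener exists on the port.
            # The connection is closed immediately by the context manager.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 50051)` would block for up to 30 seconds until the port used in the commands above becomes reachable.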
Once the server is running, connect to it using PySpark:
from pyspark.sql import SparkSession
spark = SparkSession.builder.remote("sc://localhost:50051").getOrCreate()
spark.sql("SELECT 1 + 1").show()
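Because Sail targets the Spark DataFrame API as well as Spark SQL, the same session should also accept ordinary DataFrame code. The sketch below assumes a server listening on `localhost:50051` and a PySpark installation with Spark Connect support. The function names `spark_connect_url` and `run_example` and the sample data are illustrative only. PySpark is imported inside the function so that the URL helper can be used without it.

```python
def spark_connect_url(host: str = "localhost", port: int = 50051) -> str:
    """Build a Spark Connect URL of the form used above (sc://host:port)."""
    return f"sc://{host}:{port}"


def run_example(url: str) -> None:
    """Run a small aggregation through the DataFrame API against a running server."""
    # Imported here so this module loads even when PySpark is not installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.remote(url).getOrCreate()
    try:
        # Illustrative sample data, not taken from the Sail documentation.
        df = spark.createDataFrame(
            [("a", 1), ("b", 2), ("a", 3)],
            ["key", "value"],
        )
        # Group by key and sum the values, then print the result.
        df.groupBy("key").agg(F.sum("value").alias("total")).orderBy("key").show()
    finally:
        spark.stop()
```

With the server running, calling `run_example(spark_connect_url())` would execute the aggregation remotely and print the grouped totals.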
For more details, refer to the Getting Started Guide.
The latest documentation for Sail can be found here.
Contributions are welcome! Submit GitHub issues for bug reports and feature requests. Join GitHub discussions to ask questions or share ideas. For code changes, refer to the Development Guide.
LakeSail offers enterprise support for Sail. Contact us for more information.
LakeSail's mission is to unify batch processing, stream processing, and compute-intensive (AI) workloads. Learn more at lakesail.com.
Sail is licensed under the Apache-2.0 license.