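Project introduction
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and APIs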
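Apache Spark - A unified analytics engine for large-scale data processing
Scala 2 compiler and standard library. Scala 2 bugs at https://github.com/scala/bug; Scala 3 at https://github.com/scala/scala3
Removes large or troublesome blobs like git-filter-branch does, but faster. And written in Scala
CMAK is a tool for managing Apache Kafka clusters
A fault-tolerant, protocol-agnostic RPC system
Open-source high-performance RISC-V processor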
Delta Lake is an open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs for Scala, Java, Rust, Ruby, and Python.
Delta Lake integrates with many engines and tools; refer to delta.io/integrations for the complete list of integrations.
See the online documentation for the latest release.
Delta Standalone library is a single-node Java library that can be used to read from and write to Delta tables. Specifically, this library provides APIs to interact with a table’s metadata in the transaction log, implementing the Delta Transaction Log Protocol to achieve the transactional guarantees of the Delta Lake format.
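The Standalone library itself is a Java API; the sketch below is not that API. It is a minimal Python model of the idea the paragraph describes: a table's current state is obtained by replaying the commit files in its transaction log in version order. It assumes a simplified log in which each commit file under `_delta_log/` is named by a 20-digit zero-padded version and holds one JSON action per line, and only `add` and `remove` actions with a `path` field are considered. Real logs also contain `metaData`, `protocol`, and `commitInfo` actions, more fields per action, and Parquet checkpoints, all ignored here. `write_commit` and `replay_log` are hypothetical helpers.

```python
import json
import os
import tempfile


def commit_file_name(version):
    # Commit files in _delta_log are named by a 20-digit, zero-padded version number.
    return f"{version:020d}.json"


def write_commit(log_dir, version, actions):
    # Hypothetical helper: each commit file holds one JSON-encoded action per line.
    with open(os.path.join(log_dir, commit_file_name(version)), "w") as fh:
        for action in actions:
            fh.write(json.dumps(action) + "\n")


def replay_log(log_dir):
    """Return the set of data-file paths that are live after replaying all commits."""
    live = set()
    # Zero padding makes lexicographic filename order match version order.
    for name in sorted(f for f in os.listdir(log_dir) if f.endswith(".json")):
        with open(os.path.join(log_dir, name)) as fh:
            for line in fh:
                if not line.strip():
                    continue
                action = json.loads(line)
                if "add" in action:
                    live.add(action["add"]["path"])
                elif "remove" in action:
                    live.discard(action["remove"]["path"])
    return live


with tempfile.TemporaryDirectory() as table_dir:
    log_dir = os.path.join(table_dir, "_delta_log")
    os.makedirs(log_dir)
    write_commit(log_dir, 0, [{"add": {"path": "part-0.parquet"}},
                              {"add": {"path": "part-1.parquet"}}])
    write_commit(log_dir, 1, [{"remove": {"path": "part-0.parquet"}},
                              {"add": {"path": "part-2.parquet"}}])
    live_files = replay_log(log_dir)

print(sorted(live_files))  # ['part-1.parquet', 'part-2.parquet']
```

Version 1 removes `part-0.parquet` and adds `part-2.parquet`, so replaying both commits leaves the two files shown.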
There are two types of APIs provided by the Delta Lake project.
You can read and write Delta tables through Spark's DataFrameReader/Writer (i.e., spark.read, df.write, spark.readStream and df.writeStream). Options to these APIs will remain stable within a major release of Delta Lake (e.g., 1.x.x).
Delta Lake guarantees backward compatibility for all Delta Lake tables (i.e., newer versions of Delta Lake will always be able to read tables written by older versions of Delta Lake). However, we reserve the right to break forward compatibility as new features are introduced to the transaction protocol (i.e., an older version of Delta Lake may not be able to read a table produced by a newer version).
Breaking changes in the protocol are indicated by incrementing the minimum reader/writer version in the Protocol action.
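To make the reader-version rule concrete, here is a minimal Python sketch, assuming a client that implements a single hypothetical reader version (`SUPPORTED_READER_VERSION`) and a log modeled as a list of commits, each a list of action dicts. The most recent `protocol` action wins, and the client may read the table only if `minReaderVersion` does not exceed what it supports. This ignores the table-features mechanism of newer protocol versions and is not Delta Lake's actual implementation.

```python
# Hypothetical client capability: the highest reader version this client implements.
SUPPORTED_READER_VERSION = 1


def current_protocol(commits):
    """Return the most recent protocol action found when replaying commits in order."""
    protocol = None
    for actions in commits:
        for action in actions:
            if "protocol" in action:
                protocol = action["protocol"]
    return protocol


def can_read(protocol, supported_reader_version=SUPPORTED_READER_VERSION):
    # Reading is safe only if the table does not require a newer reader than we implement.
    return protocol["minReaderVersion"] <= supported_reader_version


old_table = [[{"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}}]]
upgraded_table = old_table + [[{"protocol": {"minReaderVersion": 2, "minWriterVersion": 5}}]]

print(can_read(current_protocol(old_table)))       # True
print(can_read(current_protocol(upgraded_table)))  # False
```

The second table had its minimum reader version incremented, which is exactly the breaking-change signal described above, so the older client refuses to read it.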
The Delta Transaction Log Protocol document provides a specification of the transaction protocol.
Delta Lake's ACID guarantees are predicated on the atomicity and durability guarantees of the storage system. The specific requirements on the storage system are described in the online documentation on Storage Configuration.
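One storage guarantee of this kind is mutual exclusion: only one writer may create a given log file, so two writers cannot both claim the same table version. The sketch below models this on a local POSIX filesystem, assuming `os.open` with `O_CREAT | O_EXCL` as a create-if-absent primitive; `try_create_commit` is a hypothetical helper. It does not provide atomic visibility (a reader could see a partially written file), and it is not how Delta Lake talks to object stores, which is delegated to storage-specific LogStore implementations.

```python
import os
import tempfile


def try_create_commit(log_dir, version, payload):
    """Create the commit file for `version` only if no other writer has; return success."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # O_CREAT | O_EXCL makes creation fail if the file already exists,
        # so at most one writer can claim a given version on a local filesystem.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as fh:
        fh.write(payload)
    return True


with tempfile.TemporaryDirectory() as log_dir:
    first = try_create_commit(log_dir, 0, '{"commitInfo": {}}\n')
    second = try_create_commit(log_dir, 0, '{"commitInfo": {}}\n')

print(first, second)  # True False
```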
Delta Lake ensures serializability for concurrent reads and writes. Please see Delta Lake Concurrency Control for more details.
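Delta Lake uses optimistic concurrency control: a writer reads the latest table version, prepares its changes, and tries to commit them as the next version; if another writer committed that version first, it checks for conflicts and retries. The following is a minimal, single-process Python sketch of that loop, assuming a hypothetical in-memory log (`InMemoryLog`) whose `put_if_absent` stands in for the storage system's mutual exclusion. It assumes every transaction is a blind append, which cannot conflict, so the real conflict check is reduced to a comment.

```python
import threading


class InMemoryLog:
    """Hypothetical stand-in for a transaction log whose storage offers put-if-absent."""

    def __init__(self):
        self._commits = {}
        self._lock = threading.Lock()

    def latest_version(self):
        with self._lock:
            return max(self._commits, default=-1)

    def put_if_absent(self, version, actions):
        # Models storage-level mutual exclusion: exactly one writer wins each version.
        with self._lock:
            if version in self._commits:
                return False
            self._commits[version] = actions
            return True

    def committed_versions(self):
        with self._lock:
            return sorted(self._commits)


def commit_with_retry(log, actions, max_attempts=10):
    """Optimistically commit `actions` as the next version, retrying if another writer won."""
    for _ in range(max_attempts):
        read_version = log.latest_version()
        # A real implementation would check whether commits made after read_version
        # conflict with what this transaction read; this sketch assumes blind appends,
        # which never conflict, so it simply retries at the new latest version.
        if log.put_if_absent(read_version + 1, actions):
            return read_version + 1
    raise RuntimeError("gave up after too many concurrent commits")


log = InMemoryLog()
writers = [
    threading.Thread(target=commit_with_retry,
                     args=(log, [{"add": {"path": f"part-{i}.parquet"}}]))
    for i in range(8)
]
for w in writers:
    w.start()
for w in writers:
    w.join()

print(log.committed_versions())  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Each failed attempt means some other writer took that version, so with eight writers no thread needs more than eight attempts, and the versions end up contiguous with no lost commits.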
We use GitHub Issues to track community-reported issues. You can also contact the community for answers.
We welcome contributions to Delta Lake. See our CONTRIBUTING.md for more details.
We also adhere to the Delta Lake Code of Conduct.
Delta Lake is compiled using SBT.
To compile, run
build/sbt compile
To generate artifacts, run
build/sbt package
To execute tests, run
build/sbt test
To execute a single test suite, run
build/sbt spark/'testOnly org.apache.spark.sql.delta.optimize.OptimizeCompactionSQLSuite'
To execute a single test within a single test suite, run
build/sbt spark/'testOnly *.OptimizeCompactionSQLSuite -- -z "optimize command: on partitioned table - all partitions"'
Refer to SBT docs for more commands.
Follow Conda Download to install Anaconda.
Follow Create Environment From Environment file to create a Conda environment from <repo-root>/python/environment.yml and activate the newly created delta_python_tests environment.
# Note: the `--file` argument should be a fully qualified path. Using `~` in the file
# path doesn't work. Example valid path: `/Users/macuser/delta/python/environment.yml`
conda env create --name delta_python_tests --file=<absolute_path_to_delta_repo>/python/environment.yml
The build needs JDK 11. Make sure JAVA_HOME points to a JDK 11 installation. Then activate the environment and run the Python tests:
conda activate delta_python_tests
python3 <delta-root>/python/run-tests.py
IntelliJ is the recommended IDE to use when developing Delta Lake. To import Delta Lake as a new project:
1. Clone Delta Lake into ~/delta.
2. Using IntelliJ, select File > New Project > Project from Existing Sources... and select ~/delta.
3. Under Import project from external model, select sbt. Click Next.
4. Under Project JDK, specify a valid Java 11 JDK and opt to use the SBT shell for project reload and builds.
5. Click Finish.

Run build/sbt clean package. Make sure you use Java 11. The build will generate files

After waiting for IntelliJ to index, verify your setup by running a test suite in IntelliJ.
For example, open DeltaLogSuite and select Run 'DeltaLogSuite'.

If you see errors of the form
Error:(46, 28) object DeltaSqlBaseParser is not a member of package io.delta.sql.parser
import io.delta.sql.parser.DeltaSqlBaseParser._
...
Error:(91, 22) not found: type DeltaSqlBaseParser
val parser = new DeltaSqlBaseParser(tokenStream)
then follow these steps:
1. Make sure you are using Java 11. You can set this using export JAVA_HOME=`/usr/libexec/java_home -v 11`
2. Run build/sbt clean compile.
3. Go to File > Project Structure... > Modules > delta-spark.
4. Under Source Folders, remove any target folders, e.g. target/scala-2.12/src_managed/main [generated].
5. Click Apply and then re-run your test.

Delta Lake is licensed under the Apache License 2.0; see LICENSE.
There are two channels of communication within the Delta Lake community.