Pipeline
- Data Acquisition
- Data Transformation
- Model Training
- Model Inference
- Model Scoring
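The five phases above can be sketched as a minimal Python pipeline. This is purely illustrative: every function name and the toy "model" are assumptions, not the kit's actual API.

```python
# Illustrative sketch of the five benchmark phases; all names are made up here.

def acquire_data():
    # stand-in for the data acquisition / generation phase
    return [1.0, 2.0, 3.0, 4.0]

def transform(raw):
    # data transformation: normalize samples to [0, 1]
    hi = max(raw)
    return [x / hi for x in raw]

def train(features):
    # model training: a trivial "model" that is just the feature mean
    return sum(features) / len(features)

def infer(model, features):
    # model inference: label each sample as above/below the learned mean
    return [x >= model for x in features]

def score(predictions, labels):
    # model scoring: fraction of predictions matching the expected labels
    hits = sum(p == l for p, l in zip(predictions, labels))
    return hits / len(labels)

raw = acquire_data()
features = transform(raw)
model = train(features)
preds = infer(model, features)
accuracy = score(preds, [False, False, True, True])
```

Each phase consumes the previous phase's output, which is the essential shape of the benchmark regardless of engine (single-node python or multi-node spark).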
Project folder structure
- workload: main use case script
  - python: workload scripts for each use case for the single-node system
  - spark: workload scripts for each use case for the multi-node system
  - spark3: workload scripts for each use case for the multi-node system
- data-gen
  - config
    - tpcxai-generation.xml: dataset download specification
    - tpcxai-schema.xml: dataset type schema specification
- tools
  - python: python setup scripts for the single-node system
    - python.yaml: default python conda environment file for the spark engine
    - python-ks.yaml: conda environment file with additional packages ("ks" presumably stands for "kitchen sink")
  - spark: spark setup scripts for the multi-node system
- lib: java jar libraries
- driver: project config & source code (TensorFlow & Keras based)
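Given the layout above, the single-node python environment can be created from the kit's YAML file with conda. A sketch, assuming conda is installed and the `tools/python` paths shown above; the block skips gracefully when either is missing.

```shell
# Sketch: create the single-node conda environment from the kit's YAML file.
# Assumes conda is installed and run from the kit's root directory.
if command -v conda >/dev/null 2>&1 && [ -f tools/python/python.yaml ]; then
    conda env create -f tools/python/python.yaml
    # conda env create -f tools/python/python-ks.yaml  # variant with extra packages
    status="created"
else
    status="skipped (conda or tools/python/python.yaml not available)"
fi
echo "environment setup: $status"
```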
Project scripts
- setenv.sh: set the necessary environment variables and the scale factor
- setup-python.sh: create the virtual python environment for the single-node system
- setup-spark.sh: create the virtual python environment for the multi-node system
- TPCx-AI_Benchmarkrun.sh: run the benchmark
- TPCx-AI_Validation.sh: run the validation
- Full_TPCx-AI_Benchmarkrun.sh: run validation and benchmark
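A typical single-node run would invoke these scripts in order. The sketch below only uses the script names listed above; the ordering, the absence of flags, and the helper function are assumptions, and each step is skipped when the script is not present.

```shell
# Sketch of a full single-node run, from the kit's root directory.
# Run a kit script if it exists; otherwise report the skip.
run_step() {
    if [ -x "./$1" ]; then
        "./$1"
    else
        echo "skipping $1 (not found)"
    fi
}

# setenv.sh is sourced (not executed) so its variables reach the later steps
[ -f ./setenv.sh ] && . ./setenv.sh || echo "skipping setenv.sh (not found)"
run_step setup-python.sh           # one-time environment setup (single-node)
run_step TPCx-AI_Validation.sh     # validate the installation first
run_step TPCx-AI_Benchmarkrun.sh   # then run the measured benchmark
```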