Pipeline
- Data Acquisition
- Data Transformation
- Model Training
- Model Inference
- Model Scoring
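This sketch assumes the five phases above run one after another in the listed order. It shows a sequential driver that times each phase. The step callables are hypothetical placeholders, not the TPCx-AI driver's actual API.

```python
import time
from typing import Callable, Dict, List, Tuple

# Phase names as listed above; the order is assumed to be strictly sequential.
PHASES: List[str] = [
    "Data Acquisition",
    "Data Transformation",
    "Model Training",
    "Model Inference",
    "Model Scoring",
]


def run_pipeline(steps: Dict[str, Callable[[], None]]) -> List[Tuple[str, float]]:
    """Run each phase in the fixed order and record its wall-clock time in seconds."""
    timings: List[Tuple[str, float]] = []
    for phase in PHASES:
        start = time.perf_counter()
        steps[phase]()  # hypothetical step; raises KeyError if a phase is missing
        timings.append((phase, time.perf_counter() - start))
    return timings


if __name__ == "__main__":
    # Placeholder steps that do nothing, just to exercise the driver.
    noop_steps = {phase: (lambda: None) for phase in PHASES}
    for phase, seconds in run_pipeline(noop_steps):
        print(f"{phase}: {seconds:.6f}s")
```

Because the steps are passed in as a dictionary, the ordering and timing logic stays separate from whatever each phase actually does.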
Project folder structure
- workload: main use case scripts
  - python: workload scripts for each use case, for a single-node system
  - spark: workload scripts for each use case, for a multi-node system
  - spark3: workload scripts for each use case, for a multi-node system
- data-gen
  - config
    - tpcxai-generation.xml: dataset generation specification
    - tpcxai-schema.xml: dataset type schema specification
- tools
  - python: Python setup scripts for a single-node system
  - python.yaml: default Python conda environment file for the Spark engine
  - python-ks.yaml: environment file with additional packages ("ks" presumably stands for "kitchen sink")
  - spark: Spark setup scripts for a multi-node system
- lib: Java JAR libraries
- driver: project configuration and source code (TensorFlow and Keras based)
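The single-node and multi-node split in the workload folder can be summarized as a small lookup. This is only an illustrative sketch. The helper `workload_dir` is hypothetical. Mapping `spark3` to Spark 3.x and `spark` to older Spark versions is an assumption based on the folder names alone.

```python
from pathlib import Path

# Hypothetical mapping from engine to the workload sub-folder described above.
WORKLOAD_DIRS = {
    "python": "workload/python",  # single-node
    "spark": "workload/spark",    # multi-node (assumed: pre-3.x Spark)
    "spark3": "workload/spark3",  # multi-node (assumed: Spark 3.x)
}


def workload_dir(kit_root: str, multi_node: bool, spark_major: int = 3) -> Path:
    """Pick the workload script folder for a deployment (illustrative only)."""
    if not multi_node:
        engine = "python"
    elif spark_major >= 3:
        engine = "spark3"
    else:
        engine = "spark"
    return Path(kit_root) / WORKLOAD_DIRS[engine]


if __name__ == "__main__":
    print(workload_dir("/path/to/kit", multi_node=True))
```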
Project scripts
- setenv.sh: sets the necessary environment variables and the scale factor
- setup-python.sh: creates the virtual Python environment for a single-node system
- setup-spark.sh: creates the virtual Python environment for a multi-node system
- TPCx-AI_Benchmarkrun.sh: runs the benchmark
- TPCx-AI_Validation.sh: runs validation
- Full_TPCx-AI_Benchmarkrun: runs validation and the benchmark
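The full run chains the validation and benchmark scripts. This hypothetical sketch of that chaining makes three assumptions. First, `setenv.sh` must be sourced in the same shell as each script so its variables and scale factor are visible. Second, validation runs first, as the description's wording suggests. Third, the helpers `build_commands`, `full_run`, and `subprocess_runner` are illustrative and not part of the kit.

```python
import shlex
import subprocess
from typing import Callable, List

# Assumed order: validation first, then the benchmark.
SCRIPTS: List[str] = ["TPCx-AI_Validation.sh", "TPCx-AI_Benchmarkrun.sh"]


def build_commands(kit_root: str) -> List[List[str]]:
    """Return one bash invocation per script, each sourcing setenv.sh first (assumption)."""
    root = shlex.quote(kit_root)
    return [
        ["bash", "-c", f"cd {root} && source ./setenv.sh && ./{script}"]
        for script in SCRIPTS
    ]


def full_run(kit_root: str, runner: Callable[[List[str]], int]) -> List[str]:
    """Run the scripts in order; stop at the first non-zero exit code."""
    completed: List[str] = []
    for script, cmd in zip(SCRIPTS, build_commands(kit_root)):
        if runner(cmd) != 0:
            break
        completed.append(script)
    return completed


def subprocess_runner(cmd: List[str]) -> int:
    """Real runner: execute the command and return its exit status."""
    return subprocess.run(cmd, check=False).returncode
```

For example, `full_run("/path/to/kit", subprocess_runner)` would execute both scripts and return the names of those that exited with status 0, stopping early on a failure. The kit path is a placeholder. Passing the runner in as a parameter keeps the ordering logic testable without executing anything.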

PPT
https://www.tpc.org/tpcx-ai/TPCx-AI_An_Introduction_v1.3.0.pdf
Report
https://www.vldb.org/pvldb/vol16/p3649-rabl.pdf
Korean
https://www.tta.or.kr/tta/preportNewsNDownload.do?sfn=20230507121847947_Z2JK.pdf

Seonglae Cho