- encode/pre-process the features
- deal with potential missing data
- identify potential models to solve the machine learning task
- apply them appropriately and interpret the results
Tasks
- Task 1: Dataset description (15%)
- Task 2: Data assembling and initial pre-processing (10%)
- Task 3: Design and build a machine learning pipeline (40%)
- Task 4: Model interpretation (10%)
- Task 5: Alternative machine learning pipeline (25%)
Submission
- A short written report (max 3 pages). The report must be in 11-point Arial font and portrait format.
- A second PDF document containing figures presenting your results. All figures must be numbered and referred to from the report. Each figure must have a short caption explaining what it shows.
Todo
Pre-processing steps need to be embedded in the cross-validation framework to avoid data leakage.
Discuss which strategy could be used to better encode the diagnosis features.
normalize input
split the hundreds digit from the tens and ones digits
adding race and age → empirically bad
one hot vector
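The Todo items above (pre-processing inside cross-validation, input normalization, splitting diagnosis codes by digit, one-hot encoding) can be sketched together with scikit-learn. This is only a minimal illustration under stated assumptions: the column names (`lab_count`, `days_in_hospital`, `diag_1`), the synthetic data, the logistic regression model, and the helper `split_diag_code` are all hypothetical, and diagnosis codes are assumed to be numeric-looking strings such as "250.83", with anything non-numeric treated as missing.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler


def _label(values):
    # Turn numeric parts into string categories; NaN becomes "missing".
    return values.map(lambda v: "missing" if pd.isna(v) else str(int(v)))


def split_diag_code(frame):
    """Hypothetical helper: split each code into its hundreds digit
    and its tens-and-ones part, both returned as categorical strings."""
    frame = pd.DataFrame(frame)
    out = {}
    for col in frame.columns:
        codes = pd.to_numeric(frame[col], errors="coerce")  # non-numeric -> NaN
        whole = np.floor(codes)  # drop the decimal part of the code
        out[f"{col}_hundreds"] = _label(whole // 100)
        out[f"{col}_rest"] = _label(whole % 100)
    return pd.DataFrame(out, index=frame.index)


# Synthetic stand-in data (not the real dataset).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "lab_count": rng.normal(40, 10, n),
    "days_in_hospital": rng.integers(1, 14, n).astype(float),
    "diag_1": rng.choice(["250.83", "428", "786.5", "V57"], n),
})
X.loc[rng.random(n) < 0.1, "lab_count"] = np.nan  # simulate missing values
y = (X["days_in_hospital"] + rng.normal(0, 2, n) > 7).astype(int)

numeric_cols = ["lab_count", "days_in_hospital"]
diag_cols = ["diag_1"]

preprocess = ColumnTransformer([
    # Imputation and scaling statistics are learned on the training folds only.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    # Digit split, then one-hot; unseen categories at test time are ignored.
    ("diag", Pipeline([
        ("split", FunctionTransformer(split_diag_code)),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), diag_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Passing the whole pipeline to cross_val_score refits the pre-processing
# in every fold, so no statistics leak from the held-out fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(scores.mean())
```

Fitting the scaler or the encoder on the full dataset before splitting would let test-fold statistics influence training, which is the leakage the first Todo item warns about; wrapping every step in one `Pipeline` avoids that by construction.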
Seonglae Cho