Create dataset splits for the IU dataset. Preprocess reports for normalization, such as lower-casing and replacing nonsense tokens (e.g., 'xxxx-a-xxxx'). Build the vocabulary file. Xian Wu, Shuxin Yang, Zhaopeng Qiu, ...
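The three steps (split, normalize, build vocabulary) can be sketched roughly as follows. This is a minimal illustration, assuming each report is a plain string keyed by a study ID. The nonsense-token regular expression, the 0.7/0.1 train/val ratios (test takes the remainder), the `min_count` threshold, the special tokens, and every function name here are hypothetical choices, not taken from the original code.

```python
import random
import re
from collections import Counter

# Hypothetical rule: any whitespace-delimited chunk containing "xxxx"
# (a de-identification placeholder such as 'xxxx-a-xxxx') is treated as noise.
NONSENSE = re.compile(r"\S*xxxx\S*")


def normalize_report(text: str) -> str:
    """Lower-case a report, drop nonsense tokens, and space out punctuation."""
    text = text.lower()
    text = NONSENSE.sub(" ", text)
    # Keep letters, digits, and basic punctuation; everything else becomes a space.
    text = re.sub(r"[^a-z0-9.,\s]", " ", text)
    # Pad periods and commas so each becomes a separate token.
    text = re.sub(r"([.,])", r" \1 ", text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def split_dataset(ids, train_ratio=0.7, val_ratio=0.1, seed=42):
    """Shuffle study IDs deterministically and cut them into train/val/test."""
    ids = sorted(ids)  # sorted copy: same input order every run, caller's list untouched
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_ratio)
    n_val = int(len(ids) * val_ratio)
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # remainder goes to test
    }


def build_vocab(reports, min_count=3):
    """Count tokens in normalized reports; keep words seen at least min_count times."""
    counts = Counter(tok for report in reports for tok in report.split())
    words = sorted(w for w, c in counts.items() if c >= min_count)
    # Hypothetical special tokens prepended ahead of the sorted word list.
    return ["<pad>", "<unk>", "<bos>", "<eos>"] + words


def write_vocab(vocab, path):
    """Write the vocabulary file, one token per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(vocab) + "\n")
```

In this sketch the vocabulary would be built only from the normalized training-split reports, so that words appearing only in validation or test reports do not leak into it; tokens below `min_count` would then map to `<unk>` at training time.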