Group 9 Expo Webpage
Identifying Authorship of Source Code with Neural Networks
Designed by Group 9:
Rui Li, Jidong Xu, Hao Tang
Team Members and Advisor
Advisor:
Boyang Wang - Assistant Professor, CEAS - Electrical Engineering & Computer Science
Team Members:
Rui Li (EE) – C++, Python, RF Learning, Microsoft, Group Communication
Jidong Xu (EE) – C++, Matlab, RF Learning, Microsoft, Group Communication
Hao Tang (EE) – C++, C, Matlab, Microsoft, Group Communication
Abstract
Identifying the authorship of source code has many applications, such as identifying the authors of malware, detecting plagiarism, and tracing software vulnerabilities.
The problem can be formulated as a classification problem in machine learning: given a piece of source code, predict its author from a known set of candidates. We therefore leverage token-level text features (e.g., TF-IDF) and Long Short-Term Memory (LSTM) networks to encode source code and perform classification.
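To make the encoding step concrete, here is a minimal sketch of TF-IDF over source-code tokens, written with the Python standard library only. It is not the project's implementation: the tokenizer regex, the toy corpus, the author labels "alice" and "bob", and all function names are assumptions made for illustration. The LSTM classifier is not reproduced here; a cosine-similarity nearest-neighbour lookup stands in for it so the example stays self-contained.

```python
import math
import re
from collections import Counter


def tokenize(code):
    """Split code into identifiers, integer literals, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)


def fit_idf(token_lists):
    """Smoothed inverse document frequency for each token in the corpus."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each token once per document
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """L2-normalised sparse TF-IDF vector; unseen tokens are ignored."""
    counts = Counter(tok for tok in tokens if tok in idf)
    vec = {tok: cnt * idf[tok] for tok, cnt in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {tok: w / norm for tok, w in vec.items()} if norm else vec


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def predict_author(sample, train_vectors, train_labels, idf):
    """Stand-in classifier (not an LSTM): label of the most similar sample."""
    query = tfidf_vector(tokenize(sample), idf)
    scores = [cosine(query, vec) for vec in train_vectors]
    return train_labels[scores.index(max(scores))]


# Hypothetical toy corpus: two authors with different loop habits.
TRAIN = [
    ("alice", "for (int i = 0; i < n; ++i) { total += values[i]; }"),
    ("alice", "for (int j = 0; j < m; ++j) { count += items[j]; }"),
    ("bob", "int idx = 0; while (idx != len) { sum = sum + arr[idx]; idx++; }"),
    ("bob", "int pos = 0; while (pos != size) { acc = acc + buf[pos]; pos++; }"),
]

labels = [author for author, _ in TRAIN]
token_lists = [tokenize(code) for _, code in TRAIN]
idf = fit_idf(token_lists)
vectors = [tfidf_vector(tokens, idf) for tokens in token_lists]

unknown = "for (int k = 0; k < q; ++k) { s += data[k]; }"
print(predict_author(unknown, vectors, labels, idf))
```

In this sketch, tokens that are rare across the corpus (such as `for` versus `while`) receive a higher weight than punctuation shared by every sample, which is what lets stylistic habits separate the authors. A sequence model such as an LSTM would additionally take token order into account, which this bag-of-tokens stand-in ignores.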
Presentation Video
GitHub code link for our project: