This fall, I took a student-led class called CS196: Computer Science Honors here at the University of Illinois at Urbana-Champaign. The class is unique because it teaches skills that developers use in real life, such as bash and git, as well as specialized programming languages like Rust or, in past years, Haskell. Its most distinctive component, though, is the project groups. Each group mixes students of different experience levels, backgrounds, and skills. Students like us lead these project groups and guide them toward producing a Minimum Viable Product (MVP) and creating a pitch for that product by the end of the semester. My group and I created a lecture-summarizing model called LCTRS (GitHub URL: https://github.com/CS196Illinois/Group6).
The MVP for this project was a website that summarizes lecture transcripts into coherent text using Natural Language Processing (NLP). We also wanted to avoid the main pitfalls of summarization: missing key points and producing overly verbose summaries.
For pre-training, we used a variety of datasets, including gigaword, reddit_tifu, aeslc, newsroom, arxiv, and cnn_dailymail. However, most of these datasets neither use academic language nor match the conversational tone that lecture transcripts tend to have. Their reference summaries are also very short relative to the source text, and you typically learn more than a couple of sentences' worth of information from an hour of lecture. So, we created a custom dataset of lectures from all the classes we were taking, with introductory Math, Science, English, and Social Sciences classes represented in the sample.
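To make the pre-training setup concrete, here is a minimal sketch of pulling one of these public datasets with the Hugging Face `datasets` library; I'm assuming the standard hub version of cnn_dailymail and its field names here:

```python
from datasets import load_dataset

# CNN/DailyMail pairs news articles with short, multi-sentence highlights.
cnn = load_dataset("cnn_dailymail", "3.0.0", split="train")

example = cnn[0]
print(example["article"][:300])  # source text
print(example["highlights"])     # reference summary
```

Each example is an (article, highlights) pair, which is exactly the (transcript, summary) shape a summarization model trains on.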
My partner and I used ROUGE metrics to compare the language model's automatically generated summaries against the reference summaries from our dataset. ROUGE measures how much of the generated summary's wording overlaps with a human-written reference, quantifying the model's precision and recall. We compared the ROUGE scores of DistilBERT, BERT, and T5, and settled on T5 with its ROUGE score of about 25. We chose this model because it was compact and took only a small amount of fine-tuning to perform well. After fine-tuning the model on our custom dataset, we improved this score to ~30.
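As an illustration of the scoring step, here is a minimal sketch using the `rouge_score` package; the reference and generated strings below are placeholders, not examples from our dataset:

```python
from rouge_score import rouge_scorer

# ROUGE-1 counts unigram overlap; ROUGE-L uses the longest common subsequence.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

reference = "The lecture covered gradient descent and how learning rates affect convergence."
generated = "The lecture discussed gradient descent and learning rates."

scores = scorer.score(reference, generated)  # score(target, prediction)
for name, s in scores.items():
    print(f"{name}: precision={s.precision:.2f} recall={s.recall:.2f} f1={s.fmeasure:.2f}")
```

Precision here is the fraction of the generated summary that appears in the reference, and recall is the fraction of the reference that the generated summary recovers, which maps directly onto the two failure modes we wanted to avoid: verbosity and missed key points.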
We used Flask to serve our web application. You can log in using Google OAuth and save summaries to your account for future viewing. In the dialog, you enter your transcript and the percentage you want it condensed to, and the model outputs the summary onto the page.
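To give a rough idea of how the pieces fit together, here is a minimal Flask sketch of that flow. The route name, form fields, and `t5-small` checkpoint are stand-ins (our fine-tuned weights would replace it), and the OAuth login and saved-summary features are omitted:

```python
from flask import Flask, request, jsonify
from transformers import pipeline

app = Flask(__name__)

# Placeholder checkpoint; a fine-tuned T5 model would be loaded here instead.
summarizer = pipeline("summarization", model="t5-small")

@app.route("/summarize", methods=["POST"])
def summarize():
    transcript = request.form["transcript"]
    percent = int(request.form.get("percent", 20))
    # Approximate the requested length as a fraction of the input word count
    # (the pipeline's max_length is counted in tokens, so this is rough).
    target_len = max(32, len(transcript.split()) * percent // 100)
    result = summarizer(transcript, max_length=target_len, min_length=16, truncation=True)
    return jsonify(summary=result[0]["summary_text"])

if __name__ == "__main__":
    app.run(debug=True)
```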
Prior to this project, I had absolutely no experience with NLP or language models. I'm really thankful for this experience and for my PM, who helped guide our team toward creating an MVP that we can all be proud of.
Link to our final project presentation: https://drive.google.com/file/d/13qRtjcR-cRBddweiGvVDA5glAL3bco3r/view