Ge Gao

Software Engineer
(he/him/his)
Boston, MA

ABOUT ME

Hello! I'm Ge, a first-year PhD student in computer science at Boston University, advised by Dr. Margrit Betke and Dr. Derry Wijaya. My research is at the intersection of emerging media and artificial intelligence, with the goal of using machine learning tools to enhance the ways we access and process information in the news world. Alongside my research, I am a self-motivated software engineer with substantial industry experience in full-stack and mobile development, proficient in developing end-to-end solutions with Java, C++, JavaScript, and Python.

Outside of work, I enjoy skiing and cooking!

Recent:
  • Our paper "Enhancing Emotion Prediction in News Headlines: Insights from ChatGPT and Seq2Seq Models for Free-Text Generation" has been accepted at the LREC-COLING 2024 conference.

PUBLICATIONS

WORK EXPERIENCE

Software Engineer
Homesite Insurance | 2020 - present

As a backend software engineer, I maintained and improved the backend sales platform with the Spring Boot framework.

  • Designed and implemented an internal tracking ID system that has gone live with AWS Lambda functions.
  • Optimized the architecture of microservice interactions and sped up the insurance quoting flow by 70-75%.
Software Engineer
TripAdvisor | 2019 - 2020

As a full stack software engineer, I built and optimized the TripAdvisor flights, cruises, and cars pages with Java microservices and React.

  • Optimized page performance of the TripAdvisor cars pages, with an increase of up to 250% in Lighthouse score.
  • Designed and implemented a Python tool that verifies user interactions and tracking events on the page.
  • Implemented a geo image next to the search form for every flight search with GraphQL and React.
Software Engineering Intern
Shell Techworks | 2018
  • Designed and implemented a caching mechanism to parse and store interaction rules used for a system of deepwater wells into compiled code blocks, significantly speeding up the loading process of the rules.
  • Compressed the output file size by 75% by redesigning an auditing system in the code base.
  • Built RESTful API endpoints for processing well data with LoopBack and MySQL and implemented unit tests for each endpoint with SuperTest.
Mobile Software Engineering Intern
Education First | 2017
  • Developed and maintained a mobile platform for a travel app, released on both Google Play and the App Store with over 1,000 downloads.
  • Implemented a smart tour checklist that summarizes the required items based on data for the destination countries retrieved from MongoDB; the feature has gone live on Google Play.
  • Implemented a syncing mechanism between the local Android database and MongoDB, saving offline user activity so it can be synchronized to MongoDB once the device is connected to the Internet.

Skills

Programming Languages

  • C/C++
  • Python (TensorFlow, PyTorch, Keras, NumPy)
  • Java (Spring)
  • JavaScript (Node.js, React, Bootstrap)
  • C#
  • Scala
  • Erlang

Databases

  • MySQL
  • PostgreSQL
  • MongoDB

Projects

I wholeheartedly take delight in building software, and I am excited by the sheer joy of creating something fun!




Enhancing Emotion Prediction in News Headlines: Insights from ChatGPT and Seq2Seq Models for Free-Text Generation

  • Predicting emotions elicited by news headlines can be challenging as the task is largely influenced by the varying nature of people’s interpretations and backgrounds. Previous works have explored classifying discrete emotions directly from news headlines. We provide a different approach to tackling this problem by utilizing people’s free-text explanations of how they feel after reading a news headline. Using the dataset BU-NEmo+, we found that for emotion classification, the free-text explanations have a strong correlation with the dominant emotion elicited by the headlines. The free-text explanations also contain more sentimental context than the news headlines alone and can serve as a better input to emotion classification models. Therefore, in this work we explored generating emotion explanations from headlines by training a sequence-to-sequence transformer model and by using a pretrained large language model, ChatGPT (GPT-4). We then used the generated emotion explanations for emotion classification. In addition, we experimented with training the pretrained T5 model on the intermediate task of explanation generation before fine-tuning it for emotion classification. Under McNemar’s significance test, methods that incorporate GPT-generated free-text emotion explanations demonstrated significant improvement (p-value < 0.05) in emotion classification from headlines, compared to methods that only use headlines. This underscores the value of using intermediate free-text explanations for emotion prediction tasks with headlines. I presented our paper at the LREC-COLING 2024 conference.

Multi-Modal Emotion Prediction Towards Gun Violence News

  • We aim to develop methods for understanding how multimedia news exposure can affect people’s emotional responses, and we especially focus on news content related to gun violence, a very important yet polarizing issue in the U.S. We created the dataset NEmo+ by significantly extending the U.S. gun violence news-to-emotions dataset, BU-NEmo, from 320 to 1,297 news headline and lead image pairings and collecting 38,910 annotations in a large crowdsourcing experiment. In curating the NEmo+ dataset, we developed methods to identify news items that will trigger similar versus divergent emotional responses. For news items that trigger similar emotional responses, we compiled them into the NEmo+-Consensus dataset. We benchmark models on this dataset that predict a person’s dominant emotional response toward the target news item (single-label prediction). On the full NEmo+ dataset, containing news items that would lead to both differing and similar emotional responses, we also benchmark models for the novel task of predicting the distribution of evoked emotional responses in humans when presented with multi-modal news content. Our single-label and multi-label prediction models outperform baselines by large margins across several metrics. I presented our paper at the AACL 2022 conference.

CherryStems Music

  • Won 3rd place in MIT Creative Arts with a $1,000 award and was selected by Real Industry Accelerator.
  • In a team of 5, designed and implemented a creative process in which users choose a music template, pre-trained with our algorithm, to create their own music.
  • Built the demo page with Node.js and Bootstrap and deployed it on AWS EC2 with music data stored in S3.
  • Developed the iOS app that allows users to select from the loaded sounds and make custom music.

PalPack Travel Mobile App

  • Designed and built a mobile platform that matches users with travel plans, and was in charge of development on a team of 6.
  • Built the entire backend code base with LoopBack and MySQL, including an authentication token system and real-time friend requests, and deployed it on AWS EC2.
  • Implemented the iOS frontend for the account system and fuzzy username matching in search.

DoItNow Fitness Website

  • With React and Firebase, designed and built a mobile web social platform where people show off workout progress by posting pictures and the training they did; the platform automatically calculates calories burned. Deployed on Heroku.
  • Extended with a feature that allows people to send each other encouragement notes before their next training in the form of stories.

Facial Features Recognition

  • Detects and highlights key facial features given an image.
  • Designed and trained a CNN model with AlexNet as the basic structure on the YouTube Faces Dataset using PyTorch.

Image Captioning

  • Describes (auto-captions) a given image.
  • Trained an end-to-end CNN-RNN (LSTM) model using PyTorch with the Microsoft COCO dataset as described in the paper Neural Image Caption Generator.

Education

Boston University

PhD in Computer Science (GPA: 3.88)
2023 - present


Master's in Artificial Intelligence (GPA: 3.97)
2021 - 2023



Tufts University

Bachelor of Science
Computer Science & Mathematics
2015 - 2019

Teaching

Boston University

CS 542 - Machine Learning
Teaching Fellow
Spring 2024



Tufts University

COMP 15 - Data Structures
Head Teaching Assistant (leads lab)
Spring 2017 and Fall 2017