Eugenie Y. Lai





Another Annotated Example: CS PhD Statement of Purpose

Date: 2021-04-22

This post is inspired by the Statement of Objective examples provided by the MIT EECS Communication Lab. Some programs (e.g., Berkeley EECS) require a Statement of Purpose (SoP) and a Personal Statement (PS). In this post, we will dissect and annotate my SoP submitted to MIT EECS, which is a hybrid of both, in my case.

I also shamelessly include a copy of my first draft for a before-after comparison, to show how far I have come (and maybe anyone could) by applying the learnings discussed in this post. It would be difficult to measure the impact of something without showing the starting point, which is often missing from existing SoP resources.

Also, if you are an anxious applicant, let’s not compare ourselves. I know it’s easier said than done, and I still fall into that trap too. But it is unfair to compare the ins and outs of ourselves to only the best side of others (e.g., their SoP). Dr.* Maria De-Arteaga first brought this up to me back in 2019, when I had just started to pivot my profile towards grad school, and it has been helping me get off the overthinking treadmill ever since.

*Note: I heard Prof. and Dr. are distinct in the states, but we use Dr. for both in Canada. I didn’t know until the visit days and only used Dr. in my SoP. So let’s use Dr. in this post to keep everything consistent.

Before We Start

Intended audience: Future and current CS PhD applicants.

The role of SoP in grad admissions: Touched on by this Twitter thread, which could be specific to MIT EECS.

My result: I applied to 8 programs and was fortunate to get in almost everywhere, with 5 offers (i.e., Berkeley, MIT, UBC, UMichigan, and UWashington) and 3 withdrawals (i.e., Columbia, Maryland, and NYU).

A non-exhaustive list of caveats that may make this post less applicable, so reader discretion is advised:

Motivation to (uncomfortably) put myself out there:

Speaking of support, a list of direct help I received for my SoP:

So the first takeaway is clear: Be resourceful. Ask around. Keep an eye out for opportunities and resources, which shouldn’t take up much energy. Just have that running in the background.

Overall Thought Process

The grad application as a whole is supposed to show a 3D us to let the committee make a sensible decision. If we think backwards, the SoP is included for a reason. Like any member of a K-Pop group, it has a unique proposition in the package.

I wanted to show a 3D me by leveraging the application package with little overlap between materials. But so far, the transcript and CV only reduce me to numbers and project names.

So something is missing – without showing my thought process and personality, I’m boring, cold, and flat. This gap is where SoP comes in, and it is the only* opportunity to add that third dimension. We will use sketching as an analogy and go through the things that I constantly reminded myself of when thinking about my SoP at a high level.

*Note: Letters of recommendation (LoRs) help too, but they are observations of us. SoP is the one thing in our full control.

Convey the why’s. Like all drawings, SoP needs a purpose, a main message that both utilizes the space in the application and fills the gap. From most of the resources above, the purpose should be conveying the why’s (e.g., why research, why grad school, why this subfield).

Find a common thread and tell a story about professional development. Now we know what to include, but how could we organize the content in a way that shows the reader how we think? We need a skeleton first. I could tell my why’s in a plain list. But wouldn’t some layered structure to show how my research journey evolved add more character? Inevitably, we have to repeat what’s in the CV, but the added value comes from the personality and thought process illustrated through those experiences. The experiences are just a tool at this point, instead of the main focus, so don’t worry about repeating the content.

Bring in personality. Let’s colour the black-and-white skeleton with a personal palette. I tried to make every sentence read like something only I would write. Admittedly, bringing in a personal voice while staying professional is a fine line to walk, but it’s possible. The annotated SoP is (trying to be) an example. So is this post. We will talk about a few ways to do that in the detailed comments.

Help the reader focus. We have limited paint. Be concise and precise. Every sentence is an opportunity to draw a line, and together they should portray a clean image. We don’t want to waste any bits or distract the reader with random, extra lines, so every word should have its place. I also thought hard about what to leave in and what to leave out. Although I was involved in many things throughout my undergraduate time, I only included experiences that are pertinent to my story.

Detailed Comments by Section

There is a lot to unpack. We will walk through my reasoning for each section at a detailed level, which can also be seen as a concrete embodiment of the high-level takeaways discussed above.

Statement of Objectives


We don’t want to be a boring person. Opening with questions grabs the reader’s attention better than the laundry list of who I am and what I do in my first draft. The questions plant seeds too, as we will see later. Opening with research interests directly is also interesting.

How can we propagate breakthroughs in the scientific community to the real world? With the explosion of big data, how can we help fields outside of computer science (CS) extract and leverage its value? Inspired by these questions, my current research focuses on facilitating user interaction with databases.

Elaborate on my current research interest with the techniques (in the method space) and a use case (in the problem space). The use case also hints at my why’s, coming soon.

Specifically, I apply visualization and machine learning techniques to alleviate the barriers between users and databases to help users access and make sense of data. By helping users better explore and understand the data they have collected, I hope to enable data-driven decision-making in a wide range of fields. It is with these broad goals in mind that I am applying to pursue a PhD.

Finding My Research Interests

This section shows two things: my technical competency and my why’s. I described 3 research projects and used my reflections on those experiences to answer the why’s.

+1 to the example provided by the MIT EECS Communication Lab. The formula I used to describe a research project, one line each: summary + clarification of terms if necessary + need of the work (e.g., gaps) + our contributions + outcomes + my specific input. We will see this formula twice more later in this section.

With a focus on data provenance summarization, my research journey began under Dr. Rachel Pottinger at the UBC Data Management and Mining Lab. The provenance of a query over a database is a subset of the data in the database that contributed to the query answer. While comprehensive, query provenance consists of large volumes of data and hence is overwhelming for users to explore. We presented an approach to provenance exploration that builds on data summarization techniques and provides an interface to visualize the summary. This work led to the first two papers I co-authored, Summarizing Provenance of Aggregation Query Results in Relational Databases (ICDE’21) and Pastwatch: On the Usability of Provenance Data in Relational Databases (ICDE’20). My main contributions include identifying the limitations in the existing methods, implementing the existing and our summarization methods, and running the experiments.

We love the dark times. Dr. Brené Brown said vulnerability and hardships help people connect and build trust. Being rejected is my true experience, and I intentionally included that to make myself relatable to the reader. In this case, it also shows resilience and segues into my first why, why research but not industry. As you may have also noticed, this concept is used everywhere in this post too.

Our work experienced a few submissions. Although I felt discouraged at first, I learned to reflect and was encouraged by how much our work had improved after each round. I also enjoyed my experience in research more than the industry for the autonomy and ownership over my work.

But I didn’t want to just tell my why’s like a list. I envisioned a story structure inspired by The Secret Structure of Great Talks by Nancy Duarte. She introduced a shape at around 6:00. Applying that concept, I first established what is, what could be, and the gap here. Like the shape, we will see me traversing between what is and what could be in the rest of this section.

However, I had some burning questions regarding my research interests going forward. Although I was engaged by the technical aspects of solving open-ended problems, I wanted to find something that would really excite me – what is the thing that would get me out of bed every morning? And how could I find it?

Transition to my next project to show more technical competency while keeping the flow of the story.

My next project, Developing a Data-Driven Electric Vehicle (EV) Strategy in Surrey, BC, Canada, helped me answer those questions.

Another example of the formula above but in a slightly different order to make things flow better.

Working with another undergraduate student under the supervision of Dr. Raymond Ng, we set out to address the challenge of how the city of Surrey should place EV charging stations. Prior to our work, the approach to determine where to install an EV charging site was solely based on expert opinions, despite a large volume of data collected by the city of Surrey. To help city planners make strategic decisions informed by evidence, I developed a web application to give them a user-friendly way to explore and make sense of the data. I used interactive maps and graphs to visualize the spatial distribution and time trends of Surrey’s vehicle stock, traffic flows, and land use. In September 2019, the city used my tool to choose 20 charger locations for a Canadian federal funding proposal, and I was proud to co-present this work at the SIGKDD’20 Social Impact Session this summer.

Talking about our values is another good way to bring in our personality while staying professional, which also helps answer some why’s. For example, what kind of research keeps us excited? I’m excited about real-world users (in the problem space), but everyone is motivated differently. Maybe you are excited about system design? Cool! Or applying new ML models? Also cool! Note that this part also ties back to the opening questions.

Through zooming in and out on a pressing, real-world issue, I realized what I should be looking for in the research I pursue: the possibility of helping others and the insight into real-world issues that would spark that possibility. I started to envision making an impact on the real world through my research. The value of our work in the scientific community can only be actualized when our tools are adopted by downstream users such as domain experts and decision-makers. Hence alleviating user-database barriers is a vital step in advancing data-driven decision-making in a wide range of fields.

Transition to the 3rd and final project. Another piece of advice I got (for almost everything grad application related) is: don’t tell, show. Earlier I said that I’m motivated by real-world issues, and here I showed that I followed through on my words with actions.

With that overarching goal in mind, I initiated a project to facilitate user interaction with databases by identifying the major stakeholders and their challenges when interacting with databases, and then mapped that to their needs.

Apply the formula again to describe the project.

Database users often interact with databases via SQL query sessions. From our analysis, users pose a variety of SQL queries in sequence with changes in SQL keywords and query fragments such as tables and attributes. However, the existing approaches only consider queries individually and make recommendations based on query similarity and popularity. We presented a new approach to recommend query information by learning from the sequential knowledge exploration patterns of historical users. We modelled our query recommendation problem as a query prediction task and used sequence-to-sequence models to predict the next query. Supervised by Dr. Pottinger, this work led to Sequence-Aware Query Recommendation Using Deep Learning, submitted to VLDB’21. As the lead researcher, I identified knowledge gaps in the existing work, defined and scoped the research problem, analyzed the workload data, implemented the deep learning models, ran the experiments, discussed the results, and wrote the paper.

Tie back to the motivation and answer why grad school to wrap up the story.

Seeing a connection between my work and the quantifiable impact gives me a rush of excitement that I am contributing to help those real-world users in need. Through this project, I found myself enjoying both scoping and solving open-ended problems and hope to further improve with additional formal training in graduate studies.

Equal Access in STEM

I added this section following the same MIT EECS Communication Lab example and used the previous formula to explain the project as well.

It may seem odd to risk the flow of a research-focused SoP, and we may question if this section is even relevant. But MIT EECS doesn’t require a PS, and I wanted to show what I care about and where I come from. This section is also intended to help the SoP stay professional when I touch on my personal background in the last section. Again, a fine line to walk. Lastly, grad school to me is more than research. This section adds another dimension to my professional development and connects to my career pursuit in academia mentioned later.

My other goal in graduate school is to further my pursuit of advancing equal access to educational resources for students in marginalized groups. Besides mentoring young women in STEM throughout my undergraduate time, for the past year, I worked on the UBC CS Undergraduate Program Evaluation and Renewal project. In the process, I realized how my experience with data visualization and user interface design could help to improve equity in education. Degree planning is challenging and time-consuming since students have to envision their career path and go to individual course pages to ensure they meet prerequisites accordingly. First-generation college students are especially vulnerable as they lack adequate guidance from their immediate support system. To solve this problem, I designed an interactive directed graph to show the dependencies between courses, provide a holistic view of the CS program, and visualize potential academic trajectories at UBC CS. I was thrilled to present my work at the UBC Board of Governors Meeting in Spring 2020. I deployed the graphs to the UBC CS website this summer and am currently helping UBC Centre for Teaching, Learning and Technology adapt the graphs campus-wide. Participating in this project allowed me to advance equal access in a higher level of education and help as many students thrive as possible.

Future Work

This section aims to convince the reader that I know the strengths of the program, our interests align, and I’m valuable specifically to them. The first part outlines my overall research interests, while I gave specific examples about the program and PIs in the second part.

I chose to put my research statement here, not anywhere else. Up to this point, I’ve been signalling pieces about my motivation and research interests using the opening questions, projects, and my why story. The reader now has enough context and is ready for a punch.

All my experiences collectively shaped my research interests and motivated me to pursue graduate studies. Today, database systems provide a vital infrastructure to access high volumes of data in a variety of applications. Seeing the user-database barriers and the potential of data-driven decision-making in areas outside of CS (e.g., city planning and sustainability) incites my urge to build my work around the theme of facilitating user interaction with databases. With a deep understanding of the problem space and skills gained through solving problems in this space, I hope to continue this line of work by applying visualization and ML techniques to help database users access and make sense of data.

I find this part becomes more candid and compelling when I write it as if the PIs would actually read it (and mine really did). Also, it only becomes attractive when the interest goes both ways. I wanted to show how they could help me but also what unique skills I could offer.

MIT CSAIL’s past and current work indicates its members’ unique strengths on this topic. Specifically, I would be excited to work with Dr. Tim Kraska and Dr. Sam Madden. Dr. Kraska has made outstanding contributions to enabling data analytics for individuals outside of CS using ML-inspired techniques. The sequential features of query sessions discussed in his recent work, IDEBench (SIGMOD’20), are fundamental to my work on sequence-aware query recommendation, where we empirically analyzed the query sequences in two real-life workloads. Extending my work under his supervision would give me strong support in leveraging query session information using ML techniques. My research interests also greatly overlap with Dr. Madden’s work, such as Data Civilizer, on building end-to-end systems to facilitate domain experts with data exploration. I would be excited to work with Dr. Madden by bringing my skills and experience in applying ML techniques to SQL queries.

Where I See Myself

I wanted to address why I spent 6 years at UBC, which is relatively uncommon and often raises questions (e.g., if I can handle a rigorous course load). However, it was difficult to word my reason in a professional way at first. So I only briefly mentioned the personal aspect while elaborating on my work experience. Xuan pointed out the key is to relate personal struggles to professional development and helped me further emphasize the value of the experience and how it contributed to my goals in graduate studies.

As a first-generation college student from a low-income, single-parent family, working puts additional constraints on my course load yet is the most effective way to support myself. Although I spent six years on my undergraduate degree, I did two years of co-op at three different places in industry, non-profit, and academia. While studying full-time, I have also worked part-time in retail, administration, and teaching. Through these valuable experiences, I not only learned about the many real-world challenges that people face on the job, but also discovered research interests that would allow me to address some of those challenges.

Let’s not leave any loose ends and tie the two goals together to wrap up.

After graduate studies, I aim to pursue a career in academia, so that I can develop the research and tools to address these challenges and more. Furthering my education at MIT would bring me one step closer to my goal of advancing data-driven decision-making in a wide range of fields and improving equal access to educational resources for students like me in marginalized groups.

Other Takeaways

I also learned and applied these general/minor things.

Just start writing. It is an iterative process. The first draft is the hardest and almost guaranteed to suck, but it gets our brain going. It gets a lot easier once we gain the momentum and just have to make incremental changes.

Start early, which goes hand-in-hand with the last point. I wanted to leave ample time for that iterative process, so I finished my first draft in late August and finalized it in the first week of December 2020. I feel grateful that I took the time to reflect on my why’s, which also came in handy later in the (quite intense) interview process in January 2021.

Don’t stress too much about tailoring the SoP to each program. Partly thanks to SIGMOD ’20*, I had a general research direction when applying. The programs and labs I applied to may have nuances in their research interests and strengths, but my motivation, research interests, and skillsets didn’t need to change much. I only swapped out the second half of the future work section for each program. However, someone with a broader interest and a more diverse set of programs may want to customize the SoP more and have different answers for each why depending on the program.

*Note: More on my experience at SIGMOD ’20.

Read each program’s prompts and formatting requirements carefully. The point above is about the content, while this one is about the format and separation of the content. Programs like Berkeley EECS require an SoP and a PS, so the separation depends on the prompts. I include my final copies of the SoP and PS submitted to Berkeley to show how I did it with minimal additional effort, which also helps illustrate the point above. The formatting requirements all have slight differences (e.g., word limits, header, title), so just be aware.

Choose what feedback and advice to take in. Going back to the point of being resourceful, we may later find ourselves getting varied or even conflicting advice from different sources, which can be confusing and overwhelming. My apologies if this post is making it worse. But I always ask two questions whenever I get advice from people:

Although some advice is generalizable, this sanity check is a reminder to further verify if the information is credible and applicable to me, especially when I get negative (but not necessarily constructive) feedback.

An extreme example is the words from my relatives and family friends when they laughed at my school list. It still hurt in the moment, but the rational me didn’t take their comments to heart because they’re not in CS, and they don’t know my profile. More than a filter to let in helpful advice, the questions are also shields to protect us, much needed in such a sensitive time.

Last Words

Throughout the application process, I had countless breakdown moments where I felt that I had already tried everything, but my SoP just read shallow, and my writing would never be good enough.

But that’s because the SoP is hard to write!! It demands not only writing technique but also deep reflection on the why’s from our experiences. Although writing the SoP challenged me hard on both fronts, I’m glad that I took the time and saw it as an opportunity to grow: It reminded me that improving my writing is a never-ending process, and the reflection indeed made me question my life but also reaffirmed my decision to pursue graduate studies.

However, I do want to acknowledge that not everyone has the privilege to afford the time and energy. Further, if we consider our individual profile (e.g., GPA, LoRs) as a whole, pouring our limited resources into the SoP alone may not be a strategic move. Nothing is perfect, nor does it need to be. So knowing when to say good enough is an important skill too (which is something I still need to work on).

Lastly, taking one step further, I find some of the takeaways transferable to other written pieces (e.g., papers), other forms of communication (e.g., presentations), or professional development in general.

Although I had a lot of fun reflecting on my learning, I genuinely hope this post will be somewhat helpful to at least one other person on the planet, and very best of luck if you are applying soon!! <3
