Time flew by really quick: the first phase of GSoC is already over! At the end of every phase, all participants and mentors need to complete a short evaluation. This short evaluation asks about the experience and the chlalenges the participant faced during the first phase of the evaluation. To wrap up the first phase, I decided to write a blog post to summarize my experience with more details.
What the Evaluation Asks
First, let me list the questions that were asked in the first evaluation, and go into detail what my responses was.
The questions in the evaluation could be partitioned into two sections: 1) Relationship with the Organization and 2) Relationship with the Mentors.
Relationship with the Organization (TensorFlow)
- Did you contribute to TensorFlow before February 2019?
- When did you first communicate with TensorFlow this year?
- Additional feedback
February was when the list of mentoring organizations was announced. I was interested in contributing to TensorFlow but did not make any pull requests (PR) because contributing to such a big project felt like a big task. Thus, after getting accepted to GSoC, I challenged myself to create pull requests before the coding period to familiarize myself with the process. As I have written about in my previous blog post, my first contribution was improving the documentation, and afterwards I became more involved in TensorFlow.
When the GSoC coding period began, I started having biweekly meetings via Google Meet with Paige Bailey, who is working as a product manager for TensorFlow and also the GSoC organizer for the TensorFlow organization. For the past month, she has been incredibly helpful, informing me new opportunities and resources.
Relationship with the Mentors (Oscar and Mandar)
- How often do you interact with your mentor(s)?
- How do you communicate with your mentor(s)?
- Rate the quality of interactions with your mentor(s).
- How responsive are your mentor(s)?
- Additional feedback
When I was accepted to Google Summer of Code, I got an email from Oscar and Mandar, the mentors for my project. Oscar has been a core developer of TF-Agents for almost two years, and Mandar has been both a participant and a mentor for Google Summer of Code. Since then, we have communicated through emails, GitHub pull requests, and meetings. The emails were for sudden questions that could be answered quickly, and the GitHub pull requests were for lengthy conversations about the structure of the code. To complement this, we also met through Google Meet every Friday. These meetings were concise: each one lasted at most 30 minutes. However, it was nice to have them every week to review my own progress and check in with them.
Throughout this past month, I have been amazed by the amount of support they gave to me. My progress has been fairly inconsistent last month: during the first half, I made some good progress, but during the second half, I had a few blockers that required assistance, and also some sudden personal matters beyond my control. In both situations, Oscar and Mandar were understanding and willing to help. They made it easy for me to ask any questions, which allowed me to remove any blockers for my project.
What I Accomplished
As you can infer from my evaluation above, my relationship with TensorFlow and my mentors have been amazing. Unfortunately, the progress of my project has been less than amazing, since I have fallen behind my proposed timeline. However, I still poured in a good amount of work, which I will summarize below.
I am probably least knowledgeable of all the contributors of TF-Agents. This means that I know very well when certain information is missing in the documentation. I put this into my advantage and started my contributions to TF-Agents by improving its documentation.
I still have quite a few improvements to make around the documentations! I hope that by the end of the summer, TF-Agents would be even easier to use.
Exploration by Random Network Distillation
In my initial proposal “Adding Curiosity to TF-Agents,” my goal for the summer was to implement Random Network Distillation (RND), Intrinsic Curiosity Module (ICM), and CTS (Context Tree Switching). All three algorithms are similar in that they use “intrinsic motivation” to guide exploration. This “intrinsic motivation” is represented by an additional reward known as “intrinsic reward” that is used to augment the environment’s “extrinsic reward.” These three algorithms all present a unique way of computing this “intrinsic reward,” each with their own rationale.
Among them, I decided to implement RND first because 1) the algorithm was proposed most recently, 2) the central idea is intuitive, and 3) it achieves state-of-the-art performance on Montezuma’s Revenge. One thing I did not pay too much attention about was the amount of “tricks” that was involved to make RND have such superior performance, which took more time than I had imagined. I am currently implementing RND on a fork of TF-Agents with the help of Oscar and Mandar.
I will soon write a new post analyzing the patterns of intrinsic rewards with TensorBoard. Stay tuned!
Since I have started watching the TF-Agents repository, I get notifications when a new issue or pull request is created. Most of them are beyond my spectrum of knowledge, but I have found a few places where I can contribute.
- PR #143: Wrap envs with multidiscrete action space into discrete action space
- PR #144: Allow passing gym_kwargs in suite_gym.load()
Although this is not a part of my Google Summer of Code project, it helps me understand the TF-Agents codebase more thoroughly, and most importantly, I enjoy it!
The end of first phase is equivalent to the start of the second phase! I will continue to post my progress during the second phase also. I expect my next post to have more pictures and plots :)