I am an applied research scientist interested in challenging problems that can have an impact. My primary research areas include applied natural language processing and artificial intelligence.
The following sections should help you understand my work and my experiences.
Welcome to my world.
Most of the code and documentation is available on my GitHub.
- Convert LaBSE from TensorFlow to PyTorch: I migrated the LaBSE model’s TensorFlow checkpoints to PyTorch (which I use more often and prefer) and uploaded them to the Hugging Face Model Hub.
- The Seen and The Unseen - Bookshelf: One of my favorite podcasts has a bunch of book recommendations in the show notes. I built a tiny aggregator that displays all of those recommendations in a single place. (Source.)
- Apple’s COVID-19 Mobility Data for India: I spent a couple of days exploring the mobility data Apple started publishing. It showed some interesting trends (specifically for different cities), and I pondered over what might be causing some of the specific changes. (Source.)
- Hugo Theme - Overflow Identity: I built the theme for my personal website (this!), which was a mashup of 2 HTML5UP themes from a few years ago. (Another example.)
I like to take inspiration from FOSS projects and contribute when it makes sense. When I run into issues and debug them, I also like to fix them.
All of my PRs and issues (on public repos) are listed here, but some of the contributions are highlighted below.
- I fixed an error with the language models' loss calculation. This affected generative NLP models like GPT-2.
- I added support for dropping the last incomplete batch from the dataloader when training any transformer-based model.
- I then added the same support for TensorFlow datasets ("Use dataloader_drop_last in TF dataset").
- I added Okta OAuth support.
- I added the ability to export timestamps from annotations.
- And I fixed breaking installs.
- I added support for using poetry as a packaging tool in Lambda functions.
- I consolidated (and refactored) previously written tests for Python Lambda functions.
- I added options to retry and to extend the timeout for SageMaker Batch Transform jobs.
- I added support for providing a custom bundling Docker image for Python Lambda functions.
I added feedstocks (conda-forge packages for the source Python packages) for:
On top of the ones I added, I also help maintain the feedstocks for:
- Integrating preventive care guidelines & EHR to provide better healthcare: My MS thesis is about improving preventive healthcare recommendations by using natural language processing. Through this project, I have succeeded in providing personalized preventive care recommendations to patients by analyzing patient EHR data and USPSTF Preventive Care guidelines.
- Prediction model for water demand in Central Indiana for Citizens Energy Group: I designed a parallel RNN algorithm to predict daily and monthly average water demand. My model achieved an average error rate of 1.69% for daily predictions and 2.29% for monthly predictions.
- Disease-based biomedical document search and retrieval using Word2Vec: I developed an algorithm that uses disease ontology for biomedical document search and retrieval. With an innovative concept weighting scheme for biomedical documents, I overcame the problem of semantically equivalent biomedical concepts being represented using heterogeneous lexicons.
- Navigation tool to compute the best route based on road safety: This project is based on artificial neural networks and uses data on past fatal accidents in the USA to predict future accidents and compare various route options from location A to location B.
- A Home Automation and Internet of Things Solution for Indian Homes: In this project, I created a home automation system that addresses problems specific to Indian homes (automated passageway and room lights, a keyless door lock, and an LPG cooking gas leakage detection and ordering system).
- Android app to log GPS and Accelerometer data to local storage and server: An Android app that periodically collects data from the GPS and accelerometer sensors and stores it in a local buffer and, if enabled, on a web server.
- Android navigation app that computes the safest route for travel: An extension of the previous navigation tool app that computes the safest route for travel based on time of journey, fatality rate prediction and weather conditions.
- Simulation of various Ad-Hoc Routing Protocols using NS-3: Simulated various Mobile Ad-Hoc Networks for routing protocols like AODV, DSDV, DSR, OLSR, GPSR and Bird Flocking Routing Algorithm (BFA) using Network Simulator-3.
- Smart Socket: A 3-pin socket that does not supply electricity to the connected appliance until the plug is fully inserted, which helps prevent short circuits, excessive current draw, and electric shocks. It also provides protection against surge voltage, under-voltage, and ground leakage.
- Sanjay Shah Seminar website: When my dad asked me whether I would develop his website, I took up the challenge. I created it in early 2011 and maintained it through the first half of 2016, though I do not maintain it anymore.
- Bhavin Shah’s website: I also help a dear friend and mentor (and now a published author!) with creating and maintaining his website.
- Not Just The Talks: I realized I wasn’t fine with the way things were in my country and in the society around me. This was my attempt at making a difference through my writing. It has been quite some time since I last wrote there.
The following includes my research that has been published as my thesis and in peer-reviewed conferences and journals.
- S. Shah, “Biomedical concept association and clustering using word embeddings,” Master’s thesis, Purdue School of Engineering and Technology, IUPUI, 2018. Available through Purdue Hammer and IUPUI ScholarWorks.
- S. Shah, Z. Ben Miled, R. Schaefer and S. Berube, “Differential Learning for Outliers: A Case Study of Water Demand Prediction,” in Applied Sciences vol. 8, no. 11, 2018. Available through MDPI.
- X. Luo and S. Shah, “Concept embedding-based weighting scheme for biomedical text clustering and visualization,” in Applied Informatics vol. 5, no. 1, 2018. Available through Springer.
- S. Shah, X. Luo, S. Kanakasabai, R. Tuason, and G. Klopper, “Neural Networks for Mining the Associations between Diseases and Symptoms in Clinical Notes,” in Health Information Science and Systems vol. 7, no. 1, 2018. Available through Springer.
- S. Shah, M. Hosseini, Z. Ben Miled, R. Schafer and S. Berube, “A water demand prediction model for Central Indiana,” in Proceedings of the Thirtieth Conference on Innovative Applications of Artificial Intelligence (IAAI ’18), New Orleans, USA, 2018. Available through AAAI Publications.
- S. Shah and X. Luo, “Comparison of Deep Learning based Concept Representations for Biomedical Document Clustering,” in Proceedings of 2018 IEEE International Conference on Biomedical and Health Informatics (BHI ’18), Las Vegas, USA, 2018. Available on IEEEXplore.
- S. Shah and X. Luo, “Exploring diseases based biomedical document clustering and visualization using self-organizing maps,” in Proceedings of the 2017 IEEE 19th International Conference on e-Health Networking, Applications and Services (Healthcom), Dalian, China, 2017. Available on IEEEXplore.
- S. Shah and X. Luo, “Extracting Modifiable Risk Factors from Narrative Preventive Healthcare Guidelines for EHR Integration,” in Proceedings of the 2017 IEEE 17th International Conference on Bioinformatics and Bioengineering (BIBE ’17), Washington DC, USA, 2017. Available on IEEEXplore.
- X. Luo, G. Zimet and S. Shah, “A Natural Language Processing Framework to analyse the opinions on HPV Vaccination Reflected in Twitter over 10 Years (2008 - 2017),” in Human Vaccines & Immunotherapeutics vol. 15, no. 8, 2019. Available through Taylor & Francis.
- I. Terziyska, S. Shah and X. Luo, “Are Recent Terrorism Trends Reflected in Social Media?” in Proceedings of the 2017 IEEE 14th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), 4th National Workshop for REU Research in Networking and Systems, Orlando, USA, 2017. Available on IEEEXplore.
Besides having the honor of being published, I have also been featured in a few university briefs.
While my real-life handwriting is often described as a scribble, I like to believe these are more legible. Over the years, I have written quite a lot. Read it. I hope you find something you like.
- (micro)blog: I recently started posting short snippets of things I want to say here. This fills the space between Twitter and my blog. It also serves as a place for me to post things I would’ve earlier posted on Tumblr or on my I Talk Tech or Unwind blogs. More often than not, this is where you’ll find my most recent posts.
- Blog: The blog where I talk about everything from current affairs to politics to human behavior.
- Not Just The Talks: My posts on a (now defunct) initiative close to my heart.