Built & Shipped
Open-source libraries, research tools, and datasets, all publicly available.
Flagship Project
emot
A Python library for detecting and extracting emojis and emoticons from text. Built to solve one problem well, it has grown organically to more than 3 million downloads worldwide. Engineers and researchers use it for NLP preprocessing, sentiment analysis, and social media analytics.
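To give a feel for the kind of task involved, here is a minimal standard-library sketch of emoticon extraction. It is not emot's API or implementation: the `EMOTICONS` table, the `find_emoticons` function, and the returned field names are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical, deliberately tiny emoticon table (an assumption for this
# sketch); a real library would ship a far larger mapping.
EMOTICONS = {
    ":)": "happy face",
    ":-)": "happy face",
    ":(": "sad face",
    ":-(": "sad face",
    ";)": "wink",
    ":D": "laughing",
}

# Try longer emoticons first so that ":-)" is matched as a whole token.
_PATTERN = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True))
)


def find_emoticons(text):
    """Return each emoticon found, with its character span and meaning."""
    return [
        {"value": m.group(), "span": m.span(), "meaning": EMOTICONS[m.group()]}
        for m in _PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    for hit in find_emoticons("Nice work :) but late :-("):
        print(hit)
```

Returning the span alongside the value lets a preprocessing step strip or replace the emoticon in place, which is typically what sentiment pipelines need.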
Data & AI Projects
Arxiv Data Analysis
Comprehensive analysis of 24,000+ research papers from arXiv, mapping the emergence and evolution of machine learning, deep learning, and AI topics over time. Full dataset and code included.
Scopus India Analysis
Analysis of publications by Indian researchers in Scopus-indexed journals (2000–2016). Explores publication trends, collaboration networks, and research output patterns. Full dataset publicly available.
arxivData
A curated dataset of arXiv papers spanning AI, ML, deep learning, computer vision, and neural networks. Designed as a searchable offline research corpus.
Optimize Python
A practical guide and code collection for writing high-performance Python — covering profiling, multiprocessing, memory management, and efficient data structures.
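As an illustration of the profiling topic, the sketch below uses the standard library's `cProfile` and `pstats` to time a single call. It is not taken from the guide itself; `slow_sum` and `profile_call` are hypothetical names introduced here.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical workload: sum of squares computed in a plain loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    # Sort by cumulative time and keep only the top five entries.
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    value, report = profile_call(slow_sum, 1_000_000)
    print(value)
    print(report)
```

Measuring before optimizing keeps the effort focused on the functions that actually dominate run time.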
In Planning
Aggregating curated open datasets (health, NLP, social media) for researchers and engineers.
Publishing clean, labelled datasets specifically designed for LLM fine-tuning and RAG applications.