Neel Shah
Open Source & Projects

Built & Shipped

Open source libraries, research tools, and datasets — all publicly available.

Open Source Library

Flagship Project

😊

emot

Python · MIT License · 2018

A Python library for detecting and extracting emojis and emoticons from text. It was built to solve one problem well and has grown organically to over 3 million downloads worldwide. Engineers and researchers use it for NLP preprocessing, sentiment analysis, and social media analytics.

3M+ downloads
196 stars
365+ dependents
79 forks
$ pip install emot
import emot
obj = emot.core.emot()
text = "I love python ☮ 🙂 ❤"
result = obj.emoji(text)
# detects, locates & names emojis
# 3M+ downloads worldwide
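
To make "detects, locates & names" concrete, here is a minimal standard-library sketch of the same idea. It is not emot's implementation: the helper name find_symbols is hypothetical, and treating the Unicode category "So" (Symbol, other) as "emoji" is a rough assumption that misses emoticons such as :-) and multi-codepoint emoji.

```python
import unicodedata


def find_symbols(text):
    """Return (character, index, Unicode name) for each 'So' character in text.

    Hypothetical helper for illustration; category "So" is only a rough
    stand-in for "is an emoji" and is not how emot decides.
    """
    found = []
    for index, char in enumerate(text):
        if unicodedata.category(char) == "So":
            # Fall back to "UNKNOWN" if this Python build has no name for it.
            found.append((char, index, unicodedata.name(char, "UNKNOWN")))
    return found


text = "I love python ☮ 🙂 ❤"
for char, index, name in find_symbols(text):
    print(index, char, name)
```

Each tuple pairs the symbol with its position in the string and its official Unicode name, which is the detect/locate/name triple in miniature.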
Research Tools & Datasets

Data & AI Projects

📊

Arxiv Data Analysis

Python · Jupyter · 2017

Comprehensive analysis of 24,000+ research papers from arXiv, mapping the emergence and evolution of machine learning, deep learning, and AI topics over time. Full dataset and code included.

Python Data Analysis AI Research NLP
View on GitHub ↗ · ⭐ 4 · 2 forks
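
As a rough illustration of what mapping topics over time can look like, the sketch below counts keyword mentions in paper titles per year. Everything in it is an assumption: the record fields (year, title), the keyword list, and the sample rows are placeholders, not taken from the actual dataset or notebooks.

```python
from collections import Counter, defaultdict

# Hypothetical placeholder records, NOT rows from the real dataset.
papers = [
    {"year": 2015, "title": "A note on deep learning for vision"},
    {"year": 2016, "title": "Deep learning with residual connections"},
    {"year": 2016, "title": "Kernel methods revisited"},
]
topics = ["deep learning", "kernel"]  # assumed keywords of interest


def topic_counts_by_year(records, keywords):
    """Count, per year, how many titles mention each keyword (case-insensitive)."""
    counts = defaultdict(Counter)
    for record in records:
        title = record["title"].lower()
        for keyword in keywords:
            if keyword in title:
                counts[record["year"]][keyword] += 1
    return counts


for year, counter in sorted(topic_counts_by_year(papers, topics).items()):
    print(year, dict(counter))
```

Plotting these per-year counts as lines, one per keyword, gives a simple picture of when a topic starts to appear and how quickly it grows.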
🔬

Scopus India Analysis

Dataset · Python · Pinned

Analysis of publications by Indian researchers in Scopus-indexed journals (2000–2016). Explores publication trends, collaboration networks, and research output patterns. The full dataset is publicly available.

Research Data Scopus Open Dataset
View on GitHub ↗
📁

arxivData

Dataset · Python

A curated dataset of arXiv papers spanning AI, ML, deep learning, computer vision, and neural networks. Designed as a searchable offline research corpus.

AI ML Deep Learning Computer Vision Open Dataset
View on GitHub ↗

Optimize Python

Python · Guide

A practical guide and code collection for writing high-performance Python — covering profiling, multiprocessing, memory management, and efficient data structures.

Python Performance Best Practices
View on GitHub ↗
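
A small example in the spirit of the guide's "efficient data structures" theme, not taken from the repository itself: timing a membership test on a list against the same test on a set with the standard timeit module. The sizes and repeat counts are arbitrary choices for illustration.

```python
import timeit

items = list(range(100_000))
as_list = items
as_set = set(items)
target = 99_999  # last element: worst case for a linear scan of the list

# Run each membership test 200 times and compare total wall-clock time.
list_time = timeit.timeit(lambda: target in as_list, number=200)
set_time = timeit.timeit(lambda: target in as_set, number=200)

print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

The list check scans elements one by one, while the set check is a hash lookup, so the gap widens as the collection grows. Measuring first, as here, is usually a safer guide than guessing where the time goes.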
Coming Soon

In Planning

🗄️
Open Data Platform

Aggregating curated open datasets — health, NLP, social media — for researchers and engineers.

In planning · Data Platform
🤖
AI-Ready Datasets

Publishing clean, labelled datasets specifically designed for LLM fine-tuning and RAG applications.

In planning · LLM Data