Dataset Sources for Data Science Lovers

Nishant Kumar
2 min read · Apr 1, 2020

Hi friends! Data is the new oil, flowing freely and continuously. We need to build a dam to store it and use it for modelling with ML algorithms. Here are some sources for datasets:-

Kaggle Dataset:-
https://www.kaggle.com/
https://www.kaggle.com/mylesoneill/game-of-thrones

UCI Machine Learning Repository:-
https://archive.ics.uci.edu/ml/index.php

UNICEF:-
https://data.unicef.org/

UNICEF’s open datasets published on the IATI Registry (http://www.iatiregistry.org/publisher/unicef) have been extracted directly from UNICEF’s operating system (VISION) and other data systems, and they reflect inputs made by individual UNICEF offices.

FiveThirtyEight:-
https://data.fivethirtyeight.com/

WHO (World Health Organization) — Open data repository:-
https://www.who.int/gho/database/en/

Amazon dataset:-
https://aws.amazon.com/s3/

Google’s Dataset Search Engine:-
Google Dataset Search

NZ Dataset:-
https://catalogue.data.govt.nz/dataset

Indian Government Dataset:-
https://data.gov.in/

US Govt. Dataset:-
https://www.data.gov/

Europe Dataset:-
https://data.europa.eu/euodp/data/dataset

UK (Northern Ireland) Dataset:-
https://www.opendatani.gov.uk/

Awesome Public Datasets from GitHub:-
https://github.com/awesomedata/awesome-public-datasets

Makeover Monday:-
https://www.makeovermonday.co.uk/data/

Reddit/r/datasets/:-
https://www.reddit.com/r/datasets/

Data is Plural:-
https://tinyletter.com/data-is-plural

Numerous Dataset List:-
https://paperswithcode.com/datasets

Some more links:-

1. Google Dataset Search — https://lnkd.in/eGR9BAey
2. IBM Data Asset eXchange — https://lnkd.in/eKaWvF_K
3. Nasdaq Data Link — https://lnkd.in/eaXmdhvi
4. Data.gov US — https://data.gov/
5. Earth Data (NASA) — https://lnkd.in/em3KyCRw
6. AWS Open Data — https://lnkd.in/eDy45QFD
7. FBI Crime Data Explorer — https://lnkd.in/egnvsxwb
8. Data.gov UK — https://www.data.gov.uk/
9. CERN Open Data Portal — https://opendata.cern.ch/
10. Antarctic Datasets — https://lnkd.in/eDbVjQBv
11. BFI film industry statistics — https://lnkd.in/e3NTRd2S
12. NYC Taxi Trip Data — https://lnkd.in/ez9-VKhB
13. The Official Portal for European Data — https://lnkd.in/ei2AxvyJ
14. Health Data — https://healthdata.gov/
15. Centers for Disease Control and Prevention — https://lnkd.in/eUSFqhkq
16. FiveThirtyEight — https://lnkd.in/ePptSfu8
17. Datahub.io — https://lnkd.in/efeRzvp4
18. Global Health Observatory Data Repository — https://lnkd.in/e_BNrthm
19. Latin American Data Bank — https://lnkd.in/eJXcXSP2
20. IMDb Non-Commercial Datasets — https://lnkd.in/epS8jgUi

Play around with this data. To get started with ML concepts, you can use the Titanic dataset for EDA and linear regression. All the best!
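
If you want a starting point, here is a minimal sketch of that first step using only the Python standard library. It is not a definitive workflow. The helper names (load_rows, paired_columns, summarize, fit_line) are hypothetical ones made up for illustration, and the file name train.csv with its Age and Fare columns assumes you have downloaded the Titanic training file from Kaggle. Which two columns to regress on each other is also my assumption, chosen only because both are numeric.

```python
import csv
from statistics import mean


def load_rows(source):
    """Read CSV rows from an open text file (or any iterable of lines) into dicts."""
    return list(csv.DictReader(source))


def paired_columns(rows, x_name, y_name):
    """Return (x, y) float pairs, skipping rows where either cell is blank."""
    pairs = []
    for row in rows:
        x = (row.get(x_name) or "").strip()
        y = (row.get(y_name) or "").strip()
        if x and y:
            pairs.append((float(x), float(y)))
    return pairs


def summarize(values):
    """Very small EDA summary for one numeric column."""
    return {
        "count": len(values),
        "min": min(values),
        "mean": mean(values),
        "max": max(values),
    }


def fit_line(pairs):
    """Ordinary least squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Example usage (assumes a local train.csv with Age and Fare columns):
# with open("train.csv", newline="") as f:
#     rows = load_rows(f)
# pairs = paired_columns(rows, "Age", "Fare")
# print(summarize([x for x, _ in pairs]))
# print(fit_line(pairs))
```

Once this feels comfortable, libraries such as pandas and scikit-learn do the same jobs with far less code.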
