The best data science and machine learning articles. Written by data scientists for data scientists (and business people).
Yet another Python library for Data Analysis that You Should Know About, and no, I am not talking about Spark or Dask. Big Data Analysis in Python is having its renaissance. It all started with NumPy, which is also one of the building blocks behind...
In-depth exploration of data collection processes. Some of my most popular repositories on GitHub have been about data collection, either through web scraping or using an Application Programming Interface (API). My approach had always been to find a...
I work at a YC company that has evolved an interesting internal Slack group of data scientists. It’s a private group, but recently it got some attention on Twitter and we figured it might help aspiring data scientists if we published a few of th...
What is SQLite? Learn about the SQLite database engine and how to install it on your computer. In this article we will be exploring the extremely prevalent database engine called SQLite. We will describe what it does, its main uses, and then explain...
NYSE and NYSE Technologies, its technology subsidiary, found that the continuing growth of stock market data, the demand for more analytics, and, thanks to regulators, lots more reporting, were too much for its existing database. NYSE Technologies ...
Apache Spark vs. Hadoop MapReduce: pros, cons, and when to use which. What is Apache Spark? The company founded by the creators of Spark, Databricks, summarizes its functionality best in their Gentle Intro to Apache Spark eBook (highly recommended...
By Monte Zweben & Syed Mahmood of Splice Machine. Apache Hadoop emerged on the IT scene in 2006 with the promise to provide organizations with the capability to store an unprecedented volume of data using commodity hardware. This promise not onl...
A critical step in starting any database project: relational vs. non-relational, and the CAP Theorem. When you start a new enterprise database project, one of the most critical steps is choosing the right database. With th...
For those struggling to understand big data, there are three key concepts that can help: volume, velocity, and variety. These three vectors describe how big data is so very different from old school data management. Editor's note: This article was ...
By Phoebe Wong & Robert Bennett. To be a real “full-stack” data scientist, or what many bloggers and employers call a “unicorn”, you have to master every step of the data science process, all the way from storing your data to putting your finishe...