Leave Neuroscience to become a data scientist?
posted on January 5, 2016


A bunch of helpful links

Some very helpful thoughts by Jeong-Yoon Lee who very successfully made the transition

First, I’d like to say that data science is a relatively new field (like computational neuroscience), and you don’t need to feel bad about making the transition after your Ph.D.

When I went on the job market in May 2011, I didn’t have any analytics background at all either. I started my industry career at an analytics consulting company, Opera Solutions in San Diego, where one of Nicolas’ friends, Jacob, runs the company’s R&D team. Jacob also did his Ph.D. in computational neuroscience, under the supervision of Prof. Michael Arbib at USC. During the interview, I was tested on my thought process, my basic knowledge of statistics and machine learning, and my programming, all of which I’d practiced throughout my Ph.D.

So, if he/she has a good machine learning background and programming skills (I’m sure he/she does, given that he/she’s your student), he/she is well placed to pursue a career in data science.

Useful tools in data science

Back in graduate school, I used MATLAB a lot, some SPSS, and very rarely C. In the data science field, Python and R are the most popular languages, and SQL is a kind of necessary evil.

R is similar to MATLAB except that it’s free. It is not a hardcore programming language and doesn’t take much time to learn. It comes with the latest statistical libraries and provides powerful plotting functions.

There are many IDEs that make R easy to use, but my favorite is RStudio. If you run R on a server with RStudio Server, you can access it from anywhere via your web browser, which is really cool.

Although native R plotting functions are excellent by themselves, the ggplot2 library provides more eye-catching visualization.

In Python, the NumPy and SciPy packages provide vector-matrix computation functionality similar to MATLAB’s. For machine learning algorithms, you need scikit-learn, and for data handling, pandas will make your life easy. For debugging and prototyping, IPython Notebook is really handy.
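
For readers coming from MATLAB, a minimal sketch of what that looks like in NumPy may help. The matrix and vector here are made-up illustrative values, and the MATLAB equivalents in the comments are only rough correspondences, not a full translation guide.

```python
import numpy as np

# A small 2x2 linear system A x = b (illustrative values only).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)    # roughly MATLAB's  x = A \ b
residual = A @ x - b         # matrix product, like MATLAB's  A * x - b
elementwise = A * A          # element-wise product, like MATLAB's  A .* A
col_means = A.mean(axis=0)   # column means, like MATLAB's  mean(A, 1)

print(x, residual, col_means)
```

The main habit to unlearn is that `*` is element-wise in NumPy, while `@` is the matrix product.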

SQL is an old technology but still widely used. Most data are stored in data warehouses, which can be accessed only via SQL or SQL equivalents (Oracle, Teradata, Netezza, etc.). Postgres and MySQL are
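
To give a flavor of the kind of SQL a data scientist writes day to day, here is a small sketch using Python's built-in `sqlite3` module. SQLite only stands in for a real warehouse here, and the `visits` table, its columns, and its rows are all hypothetical, invented for this example.

```python
import sqlite3

# In-memory SQLite database standing in for a data warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_id INTEGER, page TEXT, seconds REAL)")

# Hypothetical rows, made up for illustration.
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        (1, "home", 12.0),
        (1, "pricing", 30.0),
        (2, "home", 8.0),
        (3, "home", 20.0),
        (3, "docs", 45.0),
    ],
)

# A typical aggregation: visit count and average time per page.
rows = conn.execute(
    """
    SELECT page, COUNT(*) AS n_visits, AVG(seconds) AS avg_seconds
    FROM visits
    GROUP BY page
    ORDER BY n_visits DESC, page
    """
).fetchall()
conn.close()

for page, n_visits, avg_seconds in rows:
    print(page, n_visits, round(avg_seconds, 2))
```

The same `GROUP BY` query should carry over to most SQL databases with little or no change, although each system has its own dialect quirks.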

Some hints for ML competitions

Fortunately, I had a chance to work with many top competitors, such as the 1st and 2nd place teams in the Netflix competition, and to learn how they approach competitions. Here are some tips I found helpful.

see more at jeongyoonlee