Students new to data science may wonder whether they should use R or Python for their data analysis tasks. Both are popular languages for statistics, but there are key differences.
R was developed specifically for statisticians and is well suited to data analysis. Python is an easy-to-learn, easy-to-read general-purpose language that spans disciplines, which makes it useful for integrating data analysis tasks with web apps or incorporating statistics code into a production database. Python is used to develop video players like YouTube, power apps like Instagram, test microchips at Intel, run a search engine at Google, and power transactions on the New York Stock Exchange (NYSE). Its syntax closely resembles English, making it intuitive to use and accessible to just about anyone.
Extensive Code Libraries
Python was created by Guido van Rossum and launched in 1991. (Fun fact: van Rossum, who was reading the published scripts from “Monty Python's Flying Circus” while working on the language, called it Python and the name stuck.)
More than 25 years later, a lot of code has been created, and because it’s open source, a massive amount has been collected in libraries for other developers to find and use. You can download and install various libraries, import them into your scripts, and get to coding with the benefit of decades of established, tested code knowledge at your fingertips.
Most code is collected on PyPI, the Python Package Index, which is pronounced “pie-pee-eye,” though insiders will commonly call it “the Cheese Shop,” an homage to a famous Monty Python skit. Extensive repositories are available on GitHub, too. For those interested in data science, pandas, the Python Data Analysis Library, will help with everything from importing data to data cleanup to processing time-series data, and more. There are other libraries for scientific data (NumPy/SciPy), statistical analysis (Statsmodels), and machine learning (Scikit-learn and PyBrain), to name a few.
Imagine reading a program script aloud, and having it sound like you’re speaking English. With Python, which uses words like ‘not’ and ‘in’ and has very strict punctuation rules, the language of programming no longer feels so foreign.
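As a small illustration of how ‘not’ and ‘in’ read in practice, here is a minimal sketch. The list contents and variable names are made up for this example and do not come from any real program.

```python
# Hypothetical values, invented purely for illustration.
approved_languages = ["Python", "R"]
choice = "Python"

# Reads almost like a sentence: "if choice in approved languages".
if choice in approved_languages:
    message = f"{choice} is on the list"
else:
    message = f"{choice} is not on the list"

# "not in" expresses the opposite check just as plainly.
is_missing = "Julia" not in approved_languages

print(message)
print(is_missing)
```

Read aloud, each condition is close to the plain-English question it answers.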
In addition, Python’s style guide (PEP 8) sets out formatting conventions, so you know how to lay out your lines of code, and code written by another developer who follows the guide will look pretty similar and be easy for you to read. That holds whether the author is a beginner or an advanced Python developer.
Python has an active community of users, too, so you can always find other people to learn from and connect with. Worldwide, there are 1,637 Python user groups, usually called PUGs, with membership of 860,333 people. Python User Group Baltimore boasts more than 1,100 members and holds monthly meetups plus a bimonthly social meetup where members work on independent projects alongside one another or collaborate in groups.
The Python community also holds major conferences on nearly every continent, including North America, and other industry conferences such as the Open Data Science Conference (ODSC) offer Python training as well.
Loyola’s Data Science Master’s students can begin an introduction to programming and Python in the online course Computer Science 1 (CS 151), which teaches students how to problem-solve using the programming language Python. Students learn how to read data into a program, use logic to control which lines of code are executed, divide complex problems into small, manageable pieces, and use this knowledge to answer interesting questions about the data.
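The skills listed above can be pictured with a short sketch. This is not taken from the course materials; the sample data, function names, and passing cutoff are all assumptions chosen for illustration, and the data is read from a string rather than a real file so the example is self-contained.

```python
import csv
import io

# Hypothetical sample data standing in for a file on disk.
SAMPLE_CSV = """name,score
Ana,91
Ben,58
Cara,77
"""


def read_scores(text):
    """Read CSV text into a list of (name, score) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["name"], int(row["score"])) for row in reader]


def is_passing(score, cutoff=60):
    """Logic that decides which records are kept (cutoff is an assumption)."""
    return score >= cutoff


def average(values):
    """Mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Each small function handles one piece of the larger problem.
scores = read_scores(SAMPLE_CSV)
passing = [name for name, score in scores if is_passing(score)]
mean_score = average([score for _, score in scores])

print(passing)
print(round(mean_score, 2))
```

Splitting the work into three small functions mirrors the idea of dividing a complex problem into manageable pieces: each one can be read, tested, and changed on its own.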
The first computer science course in the master’s program, Programming for Data Science (CS 703), teaches students how to use pandas and NumPy in Jupyter Notebooks, a development environment widely used by data scientists. The class teaches how to collect data, clean it up, and visualize it so that it is ready for sophisticated statistical analysis and machine learning.
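A data cleanup step of the kind described here might look like the following sketch. It assumes pandas and NumPy are installed; the table contents and column names are invented for illustration and are not from the course.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with two common problems:
# inconsistent text and a missing measurement.
raw = pd.DataFrame({
    "city": ["Baltimore", "baltimore ", "Towson", "Towson"],
    "temp_f": [71.0, np.nan, 68.0, 70.0],
})

clean = raw.copy()
# Normalize whitespace and capitalization so matching cities group together.
clean["city"] = clean["city"].str.strip().str.title()
# Fill the missing reading with the column mean (one simple choice among many).
clean["temp_f"] = clean["temp_f"].fillna(clean["temp_f"].mean())

# Summarize the cleaned data per city.
summary = clean.groupby("city")["temp_f"].mean()
print(summary)
```

In a notebook with matplotlib installed, calling summary.plot(kind="bar") would chart the result. Filling gaps with the mean is only one option; dropping incomplete rows is another, and the right choice depends on the data.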
Following this, students take Machine Learning (CS 737) and use Scikit-learn to train a machine-learning algorithm to make predictions on new data. Students learn about the pros and cons of the algorithms in an environment where it’s easy to swap one learning algorithm for another, and wrap up the course with a project using a data set of their choice.
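To show why swapping algorithms is easy, here is a minimal sketch, assuming Scikit-learn is installed. The choice of the built-in iris data set and of these two classifiers is ours, made for illustration, and is not a description of the course content.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# A small built-in data set, split so that some rows are held back as "new" data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Two different learning algorithms that share the same fit/score interface.
models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
}

accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)                       # train on the known data
    accuracy[name] = model.score(X_test, y_test)      # accuracy on held-back data

print(accuracy)
```

Because every Scikit-learn estimator exposes the same fit and score methods, trying another algorithm means changing one entry in the dictionary rather than rewriting the loop.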
By the end of the program, students can appreciate how powerful a tool the Python programming language is, and be ready to enter the workforce knowing how to analyze data and make predictions based on it.