Friday, February 3, 2023

R vs Python: What are the main differences? – TechRepublic

More people will find their way to Python for data science workloads, but there’s a case for making R and Python complementary, not competitive.

Image: iStock/monsitj

As data science becomes critical to every organization, it has become just as important to determine the right tools to help master it. The two most popular languages for tackling data science problems are Python and R. Both programming languages are open source with big communities. But, Python and R also bring their own unique strengths to data science, making it harder to decide which to use.
R is an open-source, interactive environment for doing statistical analysis. It’s not really a programming language at all, but it includes a programming language to help with analysis.
As outlined on the R project’s site, “R is an integrated suite of software facilities for data manipulation, calculation and graphical display [which] includes … a large, coherent, integrated collection of intermediate tools for data analysis … .” While not the first such tool, R was early to data science and has been a staple of academia for some time.
Python, by contrast, is an open-source, “interpreted, object-oriented, high-level programming language with dynamic semantics,” according to the project’s website. This doesn’t really do it justice, however. Python is an easy-to-learn, general-purpose language that is often the first language a developer will learn, as it has long been a teaching language.
“It’s easy to use, easy to pick up, kids use it, non-programmers pick it up in a weekend,” Anaconda CEO Peter Wang once related. “This is not accidental [but rather] has been a hardcore part of the design from the very beginning and quite intentional.”
As a close corollary, Python has also always been great as a glue language. As RedMonk analyst Rachel Stephens has stressed, “In that sense, it makes a lot of sense for enterprises to invest in Python as a way of investing in their established code.” Python, in other words, helps enterprises make legacy code part of their more recent aspirations to do data science.
This is perhaps where Python’s primary benefit for data science stands out: Everyone knows it.
“Python is the second best language for everything,” said Van Lindberg, general counsel for the Python Software Foundation. “R may be the best for stats, but Python is the second … and the second best for ML, web services, shell tools, and (insert use case here).”
Lindberg might be understating Python’s strength in some areas; it’s clearly not always second best, but his point is directionally correct: “If you want to do more than just stats, then Python’s breadth is an overwhelming win.”
In other words, Python is good enough that developers and others choose to use it for a wide array of use cases. Python, like Java, is a general-purpose programming language; however, unlike Java, it’s pretty easy to learn and to use. As such, it gets used for all sorts of things, leading to “explosive growth,” as Wang once described it. Small wonder, then, that if we analyze the relative growth and decline between Python and R in data scientist job postings, from 2019 through 2021, as Terence Shin has, then it’s clear that Python is gaining at R’s expense.
Though Python has proved more popular than R, that doesn’t mean it’s always better. As with most things in technology, it depends on what you’re hoping to accomplish. Though Python has a lower bar to learning and becoming productive, and R’s non-standard approach can be cumbersome to learn, for some tasks, it pays to invest in learning R. And, of course, for some things, like data mining and basic data visualization, you’re probably fine choosing either.
What you choose, however, should flow from the problem you’re trying to tackle as well as the long-term investments you and your company plan to make.
For example, R is a better fit for statistical calculation and data visualization because R is purpose-built by statisticians for statistical and numerical analysis of large datasets. You don’t need to write much code in R to drive deep statistical analysis and data visualization.
It’s also the case that, for some areas like life sciences, the R packages might be particularly well-developed, making R a good choice. Much depends on what you’re building and your background. As Align BI partner Ryan Hobson said in an interview, “I think R is an easier language for statisticians who might not have a programming background.”
But it’s precisely that “programming background” that makes Python the clear winner for developers or others interested in big data, artificial intelligence (AI) and deep learning algorithms.
“Python had a broader scope [than R] from the beginning [with engineering and science] DNA baked into the Python core,” said Wang. It’s objectively true that Python is dramatically more popular, across a much wider array of use cases, than R, and becomes more so every day.
Then, there’s the reality that the very nature of data science is changing.
“There has also been an expansion beyond what was traditionally purely a data science team; for example, at Netflix, we have the role of Algorithms Product Manager,” noted Christine Doig, director of innovation for personalized experiences at Netflix. “There’s more integration with the design team, with creative teams.”
That expansion of data science specialization argues for a wider variety of people helping with the data science workload, which in turn favors a language like Python that is more broadly used.
Hence, there’s a very real question as to whether it’s worth investing in R to solve a relatively narrow set of use cases versus Python, which allows an organization to meet a broad array of use cases. The answer might be yes, but the decision deserves careful consideration.
Or perhaps you just need to wait. After all, the R and Python communities are both actively improving their respective capabilities, adding packages and libraries to deepen and extend their utility. In this area, however, the advantage goes to Python, both because of the relative size of its community and because of its glue-code pedigree.
According to Wang, it’s very possible that rather than replace R for some use cases, “maybe someone will build a nice Python wrapper to expose a thin shim to expose some R capabilities.” In other words, it’s not hard to imagine Python embracing those native elements of R, so developers and data scientists don’t have to choose.
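To make the “thin shim” idea concrete, here is a minimal sketch of Python acting as glue around R. It is only an illustration, not the wrapper Wang has in mind: it assumes R’s `Rscript` command-line tool is installed and on the PATH, and the helper names `build_rscript_command` and `call_r` are hypothetical.

```python
import shutil
import subprocess
from typing import List


def build_rscript_command(expression: str) -> List[str]:
    """Return the argument list that asks Rscript to evaluate one R expression."""
    # `Rscript -e <expr>` evaluates the expression and exits; wrapping it in
    # cat() makes R print the bare value without the usual "[1]" prefix.
    return ["Rscript", "-e", f"cat({expression})"]


def call_r(expression: str) -> str:
    """Evaluate an R expression in a child process and return what it printed."""
    # Assumption: R is installed locally. Fail with a clear message if it is not.
    if shutil.which("Rscript") is None:
        raise RuntimeError("Rscript was not found on PATH")
    result = subprocess.run(
        build_rscript_command(expression),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output to str
        check=True,           # raise CalledProcessError if R reports an error
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Only attempt the demo call when R is actually available.
    if shutil.which("Rscript"):
        print(call_r("median(c(3, 1, 4, 1, 5))"))
    else:
        print("Rscript not installed; skipping the demo call")
```

Starting a new R process for every call is slow and only passes text back and forth, so this approach suits occasional calls rather than tight loops. The point is simply that a few lines of Python can put an R capability behind an ordinary Python function.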
Both R and Python serve their respective constituencies well. Yes, the Python community is much bigger and is more likely to pull R packages into the Python ecosystem than the reverse, but which you’ll use may ultimately be a question of “and,” not “or.”
Disclosure: I work for MongoDB, but the views expressed herein are mine.