What is data science and data analysis for beginners?
Data science is a hot field of study in 2021. It lies at the intersection of hacking skills, statistical knowledge, and substantive expertise. All three domains are interesting, especially the third, since substantive expertise takes a lot of detail to describe. Hacking skills are the starting point of data science, and mathematical and statistical knowledge is its core.
Many different tools can be used for hacking, and once you know how to use them you can handle data problems better. Mathematical skills can be improved by learning probability, linear algebra, and statistics.
Data Science from scratch:
There are many tools, libraries, toolkits, modules, and frameworks that efficiently implement data science algorithms. The choice of programming language also matters; common options include Java, R, Scala, and Python. Right now, most data scientists prefer to program in Python: it is easy to use, it is freely available, and its libraries provide ready-made code for data science tasks.
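As a small illustration of that built-in support, here is a minimal sketch that uses only Python's standard `statistics` module. Third-party libraries such as pandas or NumPy are common choices, but they are not needed for basic descriptive statistics. The visit counts are made-up numbers for illustration only.

```python
import statistics

# Hypothetical daily website visits, invented for illustration only.
visits = [120, 135, 128, 150, 142, 138, 131]

# The standard-library statistics module covers common descriptive statistics.
summary = {
    "mean": statistics.mean(visits),
    "median": statistics.median(visits),
    "stdev": statistics.stdev(visits),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```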
As we move into 2022, it is clear that demand for data scientists is increasing day by day. In short, it is one of the hottest jobs right now. A software developer earns between $110,000 and $135,000 per year, while a data scientist can earn between $120,000 and $180,000 per year. From a technical point of view, you should also understand classification problems and know how to perform principal component analysis (PCA) in machine learning.
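Principal component analysis is one of the techniques mentioned above. The sketch below shows one common way to compute it: eigen-decomposition of the covariance matrix. It assumes NumPy is installed. The function name `pca` and the sample data are illustrative and do not come from any particular library. In practice, scikit-learn's `PCA` class is a common ready-made alternative.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top principal components.

    Returns the projected data and the fraction of total variance
    explained by each kept component.
    """
    X = np.asarray(X, dtype=float)
    # 1. Center each feature at zero mean.
    X_centered = X - X.mean(axis=0)
    # 2. Covariance matrix of the features (columns are variables).
    cov = np.cov(X_centered, rowvar=False)
    # 3. Eigen-decomposition; eigh suits symmetric matrices and
    #    returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Reorder components by descending variance.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 5. Project onto the first n_components directions.
    components = eigvecs[:, :n_components]
    projected = X_centered @ components
    explained = eigvals[:n_components] / eigvals.sum()
    return projected, explained

# Hypothetical 2-feature data that varies mostly along one direction.
rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.column_stack([t, 2 * t + rng.normal(scale=0.1, size=100)])

projected, explained = pca(X, n_components=1)
print(projected.shape)  # (100, 1)
print(explained)        # close to 1.0 for this nearly one-dimensional data
```

Keeping only the first component here reduces two features to one while retaining almost all of the variance. That is the usual motivation for PCA before modeling.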
The role of a data scientist is to process and analyze data so that it can be interpreted in meaningful ways. The insights obtained from the analysis can be used for many purposes, especially for making business decisions. Other applications include:
- Data mining
- Business Intelligence
- Machine Learning
- Analysis of social media
- Web analytics
So, data science is basically a branch of computer science whose main focus is data analysis. Software skills are necessary to become a data scientist: the better your software skills, the better you can write programs and test your data.
Difference between data scientist and data engineer
Data scientists analyze and model complex real-world data. Their work is more theoretical, for example writing machine learning code. Data engineers are usually associated with the business world. They work with large volumes of data and help companies build and analyze models. They build databases and help people understand the data. Data engineers do not necessarily need a computer science background; all they need is technical knowledge, such as expertise in physics and mathematics.
Role of a data scientist
Data scientists start working with raw data collected from different sources. These sources can be a company, a hospital, or data about college students. The goal of a data scientist is to use statistical or machine learning tools to analyze the data. They generate a hypothesis and then validate it. They also need to ensure the security and privacy of the data and provide quality solutions to their customers.
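As a concrete example of generating and then validating a hypothesis, the sketch below runs a permutation test for a difference in means. This is one of several possible tests; a t-test is another common choice. The hospital recovery times are invented for illustration. `permutation_test` is a hypothetical helper written for this example, not a library function.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: both groups come from the same distribution, so
    shuffling the group labels should not change the difference much.
    Returns the observed difference and an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed for reproducible results
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical recovery times (days) under two treatments.
group_a = [12, 14, 11, 13, 15, 12, 14]
group_b = [9, 10, 8, 11, 9, 10, 9]

observed, p_value = permutation_test(group_a, group_b)
print(f"observed difference: {observed:.2f} days, p-value: {p_value:.4f}")
```

A small p-value means that shuffled labels rarely produce a difference as large as the observed one. That is evidence against the hypothesis that the two treatments perform the same.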
Technical skills needed to become a data scientist
The following are the aspects you must consider if you want to become a good data scientist:
- Which domain do you want to work in?
- Model your system from the given data
- Build upon existing data sets
- Record your successes
Issues of Ethics, Bias, and Privacy in Data Science
This is a matter of great concern, as it can cause problems later on! It is the ethical responsibility of a data scientist to know where the data was collected from, who collected it, and what the purpose of the collection was. If the data was accessed through social media such as Facebook or Twitter, was the consent of the people involved obtained? Just because something is available online does not mean it can be used freely. These are all questions of ethics. One example of such a privacy breach came to light in April 2018, when a firm collected data from Facebook for political campaigning. People were unaware that their personal views were being used to target them with relevant political ads. The old saying "nothing is free" fits this case well: when you get something for free, they are actually buying you!
There are many other examples in the digital world where information was leaked to third parties, intentionally or unintentionally. These leaks harmed many people.
The following table shows an imaginary data set of auto insurance providers, along with the ratings given by three current customers. If you had to choose an insurance company based on these ratings, which one would you choose?
Based on the ratings, the first choice that comes to mind is GEICO, because the other two companies have lower ratings. But if you look at the average ratings, GEICO averages 7.4, Progressive averages 7.66, and USAA averages 6.06. Judged by average rating, Progressive is the better choice!
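The comparison above comes down to computing a mean rating per provider rather than looking at the single highest score. The individual ratings from the table are not given in the text. The sketch below therefore uses hypothetical providers and made-up scores, chosen only to show how the two criteria can disagree. The helper name `summarize` is also illustrative.

```python
def summarize(ratings):
    """Return (highest single rating, average rating) for each provider."""
    return {
        name: (max(scores), sum(scores) / len(scores))
        for name, scores in ratings.items()
    }

# Hypothetical ratings from three customers (NOT the values from the table).
ratings = {
    "Provider A": [10, 6, 5],
    "Provider B": [8, 8, 7],
    "Provider C": [7, 6, 6],
}

summary = summarize(ratings)
best_single = max(summary, key=lambda name: summary[name][0])
best_average = max(summary, key=lambda name: summary[name][1])
print("Highest single rating:", best_single)    # Provider A
print("Highest average rating:", best_average)  # Provider B
```

One very high score makes Provider A look best at first glance. Provider B's consistently good scores give it the higher average, which mirrors the GEICO and Progressive comparison.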