Added explanation

This commit is contained in:
2025-04-23 08:14:39 -07:00
parent 800712c63c
commit 259e19f2ac

View File

@@ -15,7 +15,7 @@
The motivation for project is satisfying a class assignment. Eventually, an interesting (enough) topic was discovered in the data set:
What is the annual compensation (y) over years of experience (x) of deveopers who use a programming language from a specific country?
>What is the annual compensation (y) over years of experience (x) of deveopers who use a programming language from a specific country?
## Requirements
@@ -23,10 +23,71 @@ What is the annual compensation (y) over years of experience (x) of deveopers wh
## Summary of Analysis
*For the purpose of this analysis, (data) scientists and researchers are also considered developers.*
*(Data) scientists and researchers are also considered developers.*
The models generated by the notebook become less reliable for incomes greater than $200,000 per year and years of experience after 10.
The red regression line is a default linear regression model with no tuning or transformation. The second line given is an attempt to better fit the data.
### C
![graph of c programmers](images/programmers-C-United-States-of-America.png)
+----------------------+
red regression line for C
coefficient = [[1427.57628612]]
intercept= [103659.81688347]
rmse = 26971.435951866046
r2 score = 0.06338789005150514
sample predictions:
[[125073.46117519]
[107942.54574181]
[109370.12202793]]
+----------------------+
+----------------------+
magenta regression line for C
coefficient = [[11973.46915165]]
intercept= [54776.26654469]
rmse = 21198.612029169606
r2 score = 0.5719105246850147
sample predictions:
[[132396.26294684]
[119937.35465744]
[ 64985.1549115 ]]
+----------------------+
For C programmers, a linear model fits well but not great having an r2 score of 0.57. Entry and junior level positions earn roughly $54,776. This income progresses $11,973 as they progress into their careers.
However, there are more outliers among C programmrs (earning >$200,000 per year where the model becomes unreliable, especially early to middle career). Among the game developers, they may have created an indie game.
### Python
![graph of python programmers](images/programmers-Python-United-States-of-America.png)
+----------------------+
red regression line for Python
coefficient = [[2573.62288638]]
intercept= [123479.14829609]
rmse = 39759.451947012705
r2 score = 0.3374996531752672
sample predictions:
[[126052.77118246]
[174951.60602361]
[187819.7204555 ]]
+----------------------+
+----------------------+
cyan regression line for Python
coefficient = [[10378.53148381]]
intercept= [82957.69404785]
rmse = 42374.26457780401
r2 score = 0.3775348676837581
sample predictions:
[[139882.01866593]
[117229.55243376]
[137277.30441955]]
+----------------------+
For data scientists, analysts, or engineers, a linear model is a moderate fit at best as the r2 score is around 0.30. There appears to be divergence at the 10 year mark in their careers. This may be the result of their field (advertising, finance, bio/medical, and so on).
Entry or junior level professionals generally have an income of $82,957 or $123,479. Their annual income increases by $10,378 or $2573 each year.