I'm more familiar with scikit-learn:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
df = pd.DataFrame([(3000, np.nan),
                   (4000, np.nan),
                   (5000, 10),
                   (6000, 20),
                   (6500, 33),
                   (7000, 44),
                   (8300, 60),
                   (9300, np.nan),
                   (9400, np.nan)], columns=['Index', 'Value'])
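def extrapolate(df, X_col, y_col):
    # Fit only on the rows where both columns are present
    df_ = df[[X_col, y_col]].dropna()
    # Predict for every row, including those with a missing target
    return LinearRegression().fit(
        df_[X_col].values.reshape(-1, 1), df_[y_col]).predict(
        df[X_col].values.reshape(-1, 1))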
df['Value_'] = extrapolate(df, 'Index', 'Value')
df
You should get something like this:
   Index  Value     Value_
0   3000    NaN -23.219022
1   4000    NaN  -7.314802
2   5000   10.0   8.589417
3   6000   20.0  24.493637
4   6500   33.0  32.445747
5   7000   44.0  40.397857
6   8300   60.0  61.073342
7   9300    NaN  76.977562
8   9400    NaN  78.567984
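To see where the Value_ numbers come from, here is a minimal sketch of the same one-variable least-squares fit in plain Python. The helper name fit_line is made up for this illustration and is not part of scikit-learn; the sketch only assumes that LinearRegression with default settings fits an ordinary least-squares line with an intercept.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Only the rows where Value is known take part in the fit
xs = [5000, 6000, 6500, 7000, 8300]
ys = [10, 20, 33, 44, 60]
slope, intercept = fit_line(xs, ys)

# Extrapolate to the first and last Index values from the table
for x in (3000, 9400):
    print(x, slope * x + intercept)
```

The printed predictions should agree with the Value_ column for Index 3000 and 9400, up to rounding.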
# I assume you don't want to overwrite the original values, so only fill the gaps
df['Value'] = df['Value'].fillna(df['Value_'])
df
This gives:
   Index      Value     Value_
0   3000 -23.219022 -23.219022
1   4000  -7.314802  -7.314802
2   5000  10.000000   8.589417
3   6000  20.000000  24.493637
4   6500  33.000000  32.445747
5   7000  44.000000  40.397857
6   8300  60.000000  61.073342
7   9300  76.977562  76.977562
8   9400  78.567984  78.567984
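If scikit-learn is not available, the fit-then-fill steps can probably be done with NumPy alone. This is a sketch of a swapped-in technique (np.polyfit with degree 1), not the approach used in the answer itself; the array names index, value, known and filled are chosen only for this example.

```python
import numpy as np

index = np.array([3000, 4000, 5000, 6000, 6500, 7000, 8300, 9300, 9400],
                 dtype=float)
value = np.array([np.nan, np.nan, 10, 20, 33, 44, 60, np.nan, np.nan])

# Boolean mask of the rows that have an observed value
known = ~np.isnan(value)

# Degree-1 polynomial fit on the observed rows: returns [slope, intercept]
slope, intercept = np.polyfit(index[known], value[known], deg=1)

# Keep observed values, use the fitted line only where the value is missing
filled = np.where(known, value, slope * index + intercept)
print(filled)
```

Because np.where keeps the observed entries untouched, the result should correspond to the final Value column rather than to Value_.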
Answered by dokteurwho, 11.08.2020