Error messages with SARIMAX ('Too few observations...') and graphing forecast values

I have the following code, written in PyCharm. I am trying to forecast organic search revenue two years out. When the code runs, I get the following three messages:

  1. ‘Too few observations to estimate starting parameters%s.’
  2. This problem is unconstrained
  3. marker is redundantly defined by the ‘marker’ keyword argument and the fmt string “bo” (-> marker=‘o’). The keyword argument will take precedence.

I haven’t specified any marker type anywhere in my code.
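For the first and third messages, a minimal sketch of one way to silence them, assuming they are ordinary Python warnings (which the wording suggests). It uses only the standard `warnings` module; the helper name `call_quietly` is hypothetical. Hiding the first message does not change what it reports, namely that the optimizer could not estimate starting parameters from the data.

```python
import warnings

# Message fragments copied from the warnings quoted in the list above.
NOISY_MESSAGES = (
    "Too few observations to estimate starting parameters",
    "marker is redundantly defined",
)


def call_quietly(func, *args, **kwargs):
    """Call func with the two known warnings suppressed.

    The filters are only active inside the `with` block, so every other
    warning still shows up as usual.
    """
    with warnings.catch_warnings():
        for fragment in NOISY_MESSAGES:
            # `message` is a regex matched against the start of the warning text.
            warnings.filterwarnings("ignore", message=fragment)
        return func(*args, **kwargs)
```

Hypothetical usage, where `model` is the unfitted SARIMAX object: `results = call_quietly(model.fit, disp=False)`. The second message ("This problem is unconstrained") looks like optimizer output rather than a warning; as far as I know the `fit` keyword that turns it off is `disp`, whereas the code below passes `dis=-1`. The marker message most likely originates in a diagnostics or Q-Q plotting call rather than in the code shown, but that is an assumption.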

I’d like to know if there is a way to clean these up, but my biggest issue is that my actuals + forecast graph looks wrong: the forecasted values are all the way to the right in the 1970s, while the actuals are in 2020 (seemingly correct).

from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.stattools import adfuller
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
from sklearn.metrics import median_absolute_error, mean_squared_log_error

organic_search = pd.read_csv(r"C:\Users\am\Desktop\analysis\Organic_Search_Revenue_201801-202107.csv")

organic_search = organic_search[['Week_Start_Date', 'Amount']]

organic_search['Week_Start_Date'] = pd.to_datetime(organic_search['Week_Start_Date'], format='%Y-%m-%d')
organic_search['Amount'] = organic_search['Amount'].astype(int)

organic_search['year'] = organic_search['Week_Start_Date'].dt.year

organic_search['week-year'] = [f"{Week} {year}" for Week,year in zip(organic_search.Week,organic_search.year)]

organic_search.set_index = organic_search['Week_Start_Date']

# Visualize the total organic search revenue since 2018 by week
fig, ax = plt.subplots(1, 1, figsize=[10, 5])  # Set dimensions for figure
plt.plot(organic_search.groupby('Week_Start_Date').Amount.sum(), color='g')
plt.title('Organic Search Revenue Time Series')
fmt = '${x:,.0f}'
tick = mtick.StrMethodFormatter(fmt)
ax.yaxis.set_major_formatter(tick)  # apply the dollar formatter defined above

x = organic_search['Amount']

# Augmented Dickey-Fuller test

ad_fuller_result = adfuller(x)
print(f'ADF Statistic: {ad_fuller_result[0]}')
print(f'p-value: {ad_fuller_result[1]}')


best_model = SARIMAX(x, order=(2, 0, 2), seasonal_order=(2, 0, 2, 52)).fit(dis=-1)

# Forecast 104 weekly steps (2 years) ahead
forecast_values = best_model.get_forecast(steps = 104)

#Confidence intervals of the forecasted values
forecast_ci = forecast_values.conf_int()

# Plot the data
ax = organic_search.plot(x='Week_Start_Date', y='Amount', figsize=(12, 5), legend=True, color='g')

# Plot the forecasted values
forecast_values.predicted_mean.plot(ax=ax, label='Forecasts', figsize=(12, 5), grid=True)

# Plot the confidence intervals
ax.fill_between(forecast_ci.index,
                forecast_ci.iloc[:, 0],
                forecast_ci.iloc[:, 1], color='#D3D3D3', alpha=.5)
plt.title('Organic Search Revenue Forecast', size=16)
plt.ylabel('Revenue', size=12)
plt.xlabel('Week', size=12)
plt.legend(loc='upper center', prop={'size': 12})
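A possible explanation for the 1970s dates, offered as an assumption rather than a confirmed diagnosis: the line that assigns to `organic_search.set_index` does not call the method, so `x` keeps the default integer index and the forecast comes back indexed by integers (184, 185, ...), which a date axis can read as offsets from the 1970 epoch. The sketch below, using pandas only, shows two hypothetical helpers: `with_weekly_index` builds a date-indexed series with a declared 7-day frequency, and `attach_future_dates` puts dates onto forecast values that were produced without them.

```python
import pandas as pd


def with_weekly_index(df, date_col="Week_Start_Date", value_col="Amount"):
    """Return the value column as a Series indexed by week start date."""
    series = df.set_index(date_col)[value_col].sort_index()
    # asfreq declares a 7-day frequency; any missing week shows up as NaN,
    # which is a useful check that the data really is evenly spaced.
    return series.asfreq("7D")


def attach_future_dates(values, last_date, steps_per_week="7D"):
    """Index forecast values by the weeks that follow last_date."""
    start = pd.Timestamp(last_date) + pd.Timedelta(weeks=1)
    future_index = pd.date_range(start=start, periods=len(values), freq=steps_per_week)
    return pd.Series(list(values), index=future_index, name="Forecasts")
```

If the date-indexed series from `with_weekly_index` is passed to SARIMAX instead of `x`, the forecast should, as far as I know, already carry dates, and `attach_future_dates` is only needed as a fallback. Plotting the actuals from that same series (rather than with `x='Week_Start_Date'`) keeps both lines on one date axis.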

Sample data is simply week and amount over 184 weeks.

2018-01-01, 10000.25
2018-08-01, 5500.41

and so on.

The correlogram looks like the data is lagged, and I can’t remember how to correct for that. The Q-Q plot looks OK until we get out to 2 sigma.

As mentioned, the actuals plus forecast graph looks terrible, and I really need help there as well as with the model fit (I think I am off, based on the Q-Q plot and correlogram).
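One common first check when a correlogram shows strong carry-over from lag to lag, assuming that is what "lagged" means here, is whether differencing removes it. The sketch below uses made-up synthetic data (a drifting series of 184 points), not the revenue file, and the helper name is hypothetical. In SARIMAX terms, first differencing corresponds to `d=1` in `order=(p, d, q)`; the model above uses `d=0`.

```python
import numpy as np
import pandas as pd


def lag1_autocorr_before_after(series):
    """Return lag-1 autocorrelation of the levels and of the first differences."""
    levels_ac = series.autocorr(lag=1)
    diffed = series.diff().dropna()  # first difference: y[t] - y[t-1]
    return levels_ac, diffed.autocorr(lag=1)


# Synthetic example only: a random walk with drift, 184 "weeks" long.
rng = np.random.default_rng(0)
synthetic = pd.Series(np.cumsum(1.0 + rng.normal(size=184)))
levels_ac, diff_ac = lag1_autocorr_before_after(synthetic)
print(f"lag-1 autocorrelation, levels: {levels_ac:.2f}")
print(f"lag-1 autocorrelation, first differences: {diff_ac:.2f}")
```

If the levels show a lag-1 autocorrelation near 1 and the differences do not, that points toward a differenced model; the ADF p-value printed earlier is another input to the same decision.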

I am using SARIMAX because there is a portion of organic search revenue that is highly volatile and has a halo effect on the total revenues. I am contemplating taking the log of these values and adding it as an inflation factor as an exogenous variable (not done yet).
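A rough sketch of how the log-transformed exogenous variable could be prepared, assuming the volatile portion is available as its own date-indexed series. The helper name `build_log_exog`, the column name `log_volatile`, and the choice to carry the last observed value forward are all assumptions for illustration. The main point is that forecasting with an exogenous variable also needs its values for every future step.

```python
import numpy as np
import pandas as pd


def build_log_exog(volatile, steps):
    """Return (in-sample exog, future exog) from a date-indexed volatile series.

    log1p is used instead of log so that zero revenue weeks stay finite.
    """
    exog = np.log1p(volatile).to_frame("log_volatile")
    # ASSUMPTION: the future values are unknown, so the last observed value is
    # simply repeated. A real forecast of this driver would be better.
    start = exog.index[-1] + pd.Timedelta(weeks=1)
    future_index = pd.date_range(start=start, periods=steps, freq="7D")
    future = pd.DataFrame(
        {"log_volatile": exog["log_volatile"].iloc[-1]}, index=future_index
    )
    return exog, future
```

As far as I know, the in-sample frame would go to `SARIMAX(..., exog=exog)` and the future frame to `get_forecast(steps=104, exog=future)`, with the number of future rows matching `steps`.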

Please help!