### Keras Leaderboard from Strata AI

Back in April 2019, I gave a presentation at Strata AI on a prototype API

I tend to focus a lot on how the data is wired, snipped, and prepared for models. I think the preprocessing steps of data lay the foundations by which all models are built and provides me a highly valuable exercise to understand the nooks and crannies of my dataset, especially when the data is new to me. I like focusing on the preprocessing step, also, as it teaches me about the quality of the data. And, that despite the sophistication of machine learning models: "Garbage in, Garbage out".

I've spent considerable time boiling the ocean to explore three specific methods for processing relational data for LSTMs. Each has some advantages and disadvantages. This blog post explores the first two methods. The third method which is using the `keras`

`timedistributed()`

wrapper consists of a very long n-length sequence of only one batch and is the method I demonstrated for the 2018 JupyterCon presentation. The Jupyter Notebook is here on my Github account here.

I normalized the data per below:

```
#for stateful LSTM, make pertussis the 1st feature in the column order
ts_stateful = ts[['pertussis', 'influenza', 'salm',
'Precip_inches','Low_Temp_F', 'High_Temp_F']]
#first convert to a 2d tensor of shape (timesteps, features)
ts_stateful_matrix = ts_stateful.values
#Normalize the whole matrix
mean = ts_stateful_matrix.mean(axis=0)
ts_stateful_matrix -= mean
std = ts_stateful_matrix.std(axis=0)
ts_stateful_matrix /= std
```

Essentially if we want to look back 3 weeks for one variable, we want to create several samples of overlapping sequences like the below.

My comments are in `"""`

quotes documenting what the code he proposes is doing:

```
def series_to_supervised(data, n_in = 1, n_out = 1, dropnan=True):
"""says that if it's just a list of data, then there's one variable; otherwise if it's like mine, a n x m matrix, then the number of variables or features is the shape[1]. Eg: `(544, 6`) is shape so shape[1] in this case is == 6"""
n_vars = 1 if type(data) is list else data.shape[1]
df = pd.DataFrame(data)
cols, names = list(), list() #empty lists
# input sequence (t-n, ... t-1)
"""n_in is your lag that will be called in the get_sequence() function later on and is how many time steps you want your sequence to essentially "lag" or "look back". Eg. My data is in weeks. So I used a 3-week time window and set n_in = 3. The default for his function is 1. When you run the below code subbing in 3, it prints out 3 then 2 then 1. The function below is saying for i starting at 3 counting down to 1, append the column by shifting it out by 3, then 2, then 1. """
for i in range (n_in, 0, -1):
cols.append(df.shift(i))
"""This is creating a new column header 'var1(t-3)', 'var2(t-3)', 'var3(t-3)' ... 'var6(t-3)' and so on until your nth var. That's in my case, 6th var. It may not be clear now but this is the important quality of this function that allows the subsequent records generated to retain sequences. This is very similar to the "bstriner" method, in that it provides overlapping sequences, as we'll see later below."""
names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
# forecast sequence (t, t+1, ... t+n)
"""This is shifting out the next record in your sequence by the n_out timesteps. In my case I wanted to shift it out only 1 timestep which is the default setting, anyway. This results in the last column of the dataframe as 'var1(t)' where `var1` is my target of interest, and i'm only considering the next time step of one. If (the "else" statement) n_out was 2 timesteps, then we would end up getting an additional column `var1(t+1)`. If n_out ==3, then `var1(t)`, `var1(t+1)`, `var1(t+2)`. """
for i in range(0, n_out):
cols.append(df.shift(-i))
if i == 0:
names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
else:
names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
"""The rest is self-explanatory. The function concatenates all the columns created together into one dataframe and drops the NaN's"""
# put it all together
agg = concat(cols, axis=1)
agg.columns = names
# drop rows with NaN values
if dropnan:
agg.dropna(inplace=True)
return agg
```

Now he moves on to generate the sequence maker. it's important to understand how the input variables in the sequence generator function:

`def get_sequence(data, n_past, nth_var=1, n_forecast=1, dropnan=True)`

map to the function

`def series_to_supervised(data, n_in = 1, n_out = 1, dropnan=True)`

, like so:

An additional input into `def get_sequence()`

is `nth_var`

which is handy for the user to define which variable in the data is the target of interest. For me, it was the first variable in my 6-dimensional feature set.

```
def get_sequence(data, n_past, nth_var=1, n_forecast=1, dropnan=True):
seq = series_to_supervised(data, n_past, n_forecast, dropnan)
# drop columns we don't want to predict
"""This is a clean up process, essentially. It drops the current timestep for var2, var3, var4, var5, and var6, keeping only var1 which is my target I want to forecast."""
col_index = range(seq.shape[1]-data.shape[1],seq.shape[1])
del col_index[nth_var-1]
seq.drop(seq.columns[col_index], axis=1, inplace=True)
return seq
```

**Understanding the Sequence Results**

It's important to understand that the sequence generated preserves the patterns in the time series. This is the output csv of the sequence just generated for 6 variables with 3 timesteps of lookback and 1 time step of look forward specifically for just one variable, `var1(t)`

.

You'll notice that the target variable is included in the sequence. While in supervised machine learning use cases, you'd consider this as "target leakage" in this case, we need it as a part of our weekly sequence. The model learns on the past trends of all the variables including our target.

What I've done here is hide the other variables 2 through 6, so that you could see how the sequences overlap to capture three timesteps of lookback.

Explaining it here like so. Let's pretend today is August 29, 2018 and that four weeks ago it was the week of August 1st. You can see that on 8/1, `t`

this week was a `0.01010101`

. The next week on 8/7, you'll see that 8/1 then became the week prior and the sequence reflects that at `t-1`

, and so on. By the time we get to 8/29, `t-3`

is no longer in our current sequence. This week, 8/29's, value is now `t`

== 0.02020202.

You can see how this is similar to the "BStriner" Method proposed above:

I implemented this method in the Jupyter Notebook is here on my Github account here.

```
lookback = 32
step = 4
delay = 1
batch_size = 32
def generator(data, lookback, delay, min_index, max_index,
shuffle=False, batch_size=batch_size, step=step):
if max_index is None:
max_index = len(data) - delay - 1
i = min_index + lookback
while 1:
if shuffle:
rows = np.random.randint(
min_index + lookback, max_index, size=batch_size)
else:
if i + batch_size >= max_index:
i = min_index + lookback
rows = np.arange(i, min(i + batch_size, max_index))
i += len(rows)
samples = np.zeros((len(rows),
lookback // step,
data.shape[-1]))
targets = np.zeros((len(rows),))
for j, row in enumerate(rows):
indices = range(rows[j] - lookback, rows[j], step)
samples[j] = data[indices]
"""Note the [1] which indicates that the target is the second variable in this feature set. We need to change this to [0] since its our first."""
targets[j] = data[rows[j] + delay][1]
yield samples, targets
```

We then fit the generators:

```
from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop
#model.fit_generator means Fits the model on data yielded batch-by-batch by a Python generator.
model = Sequential()
model.add(layers.Flatten(input_shape=(lookback // step,
ts_matrix.shape[-1])))
model.add(layers.Dense(1))
model.compile(optimizer=RMSprop(), loss='mae')
history = model.fit_generator(train_gen,
steps_per_epoch=30,
epochs=30,
validation_data=val_gen,
validation_steps=val_steps)
```

Get the latest posts delivered right to your inbox