On Building a Time Series Database

What's happening at home?

Enough with the purely random time series. Let’s dive into generating time series based on a model simulation (still random, but with some recognizable patterns). As an example, let’s generate a set of time series from fictional sensors in a fictional apartment.

The simulated apartment has five sensors:

  1. Temperature: in degrees Celsius
  2. Humidity: percentage humidity
  3. CO2: parts per million (ppm) of CO2 (rises with the presence of humans)
  4. Light: light strength in lux
  5. Motion: whether motion is detected
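One way to keep the five sensors together is a single record per timestamp. The sketch below is an illustration under that assumption; the SensorReading class and its field names are hypothetical and not part of tsmaker. It only rejects physically impossible values, it does not model anything.

```python
import datetime
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorReading:
    """One simulated measurement of all five sensors (hypothetical layout)."""

    timestamp: datetime.datetime
    temperature_c: float  # degrees Celsius
    humidity_pct: float   # relative humidity, 0-100 %
    co2_ppm: float        # parts per million
    light_lux: float      # illuminance in lux
    motion: bool          # True if motion was detected

    def __post_init__(self):
        # Reject physically impossible values early.
        if not 0.0 <= self.humidity_pct <= 100.0:
            raise ValueError(f"humidity out of range: {self.humidity_pct}")
        if self.co2_ppm < 0 or self.light_lux < 0:
            raise ValueError("CO2 and light must be non-negative")


reading = SensorReading(
    timestamp=datetime.datetime(2025, 9, 1, 8, 0),
    temperature_c=19.5,
    humidity_pct=55.0,
    co2_ppm=650.0,
    light_lux=320.0,
    motion=True,
)
print(reading.co2_ppm)  # 650.0
```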

Modeling

A model captures parameters of the simulated world, such as the hour of the day, the day of the week, or the season.

For example, it should not be bright at night: certainly not at 3 o’clock in the morning, although it may still be bright at 11 o’clock in the evening. Also, the sun rises earlier and sets later in summer.

CO2 is produced by people breathing, so there should be some correlation between the CO2 and motion sensors.

Typically, the inhabitants go to work during the day, but only on workdays, not on weekends.

The following piece of Python code describes a model for the simulation.

import datetime

import numpy as np

#################################################################################
## Modeling the apartment
#################################################################################
class BaseModel:
    def get_context(self, date, existing_context):
        raise NotImplementedError


class SeasonalModel(BaseModel):
    def get_context(self, date, existing_context):
        month = date.month
        if month in [6, 7, 8]:
            base_temp, sunrise_hour, sunset_hour = 21.0, 6, 21
        elif month in [12, 1, 2]:
            base_temp, sunrise_hour, sunset_hour = 16.0, 8, 17
        else:
            base_temp, sunrise_hour, sunset_hour = 18.0, 7, 19
        return {
            "base_temp_c": base_temp,
            "sunrise": datetime.time(hour=sunrise_hour),
            "sunset": datetime.time(hour=sunset_hour),
        }


class DayTypeModel(BaseModel):
    def get_context(self, date, existing_context):
        return {"is_weekend": date.weekday() >= 5}


class WeatherModel(BaseModel):
    def get_context(self, date, existing_context):
        return {"cloudiness": np.random.uniform(0.1, 0.9)}


class OccupancyProfileModel(BaseModel):
    def get_context(self, date, existing_context):
        is_weekend = existing_context.get("is_weekend", False)
        if is_weekend:
            # Hour ranges are half-open [start, end), so the last range ends at 24 to cover 23:00-23:59.
            profile = {(0, 8): 0.95, (8, 24): 0.8}
        else:
            profile = {(0, 7): 0.95, (7, 9): 0.5, (9, 17): 0.1, (17, 24): 0.9}
        return {"occupancy_profile": profile}


class DailyContext:
    def __init__(self, current_date):
        self.date = current_date
        self.context = {}
        self.models = [
            SeasonalModel(),
            DayTypeModel(),
            WeatherModel(),
            OccupancyProfileModel(),
        ]
        self._run_models()

    def _run_models(self):
        for model in self.models:
            new_params = model.get_context(self.date, self.context)
            if any(key in self.context for key in new_params):
                raise ValueError(f"Key collision in {model.__class__.__name__}")
            self.context.update(new_params)


def _calculate_occupancy(ts, context):
    """Randomly decide whether someone is home at ts, using the hourly occupancy profile."""
    hour = ts.hour
    for time_range, prob in context["occupancy_profile"].items():
        if time_range[0] <= hour < time_range[1]:
            return np.random.rand() < prob
    return False


def _calculate_ambient_light(ts, context):
    """Daylight factor: 0.0 outside [sunrise, sunset), otherwise lower on cloudier days."""
    return (
        1.0 - (0.9 * context["cloudiness"])
        if context["sunrise"] <= ts.time() < context["sunset"]
        else 0.0
    )

The simulation generates sensor measurements in chunks of one day and builds a DailyContext for each day. The context is filled by running all models in order, and each model can read the output of the models before it: a SeasonalModel for the season-dependent parameters (base temperature, sunrise and sunset), a DayTypeModel to flag special days (such as weekends), a WeatherModel for the day’s cloudiness, and an OccupancyProfileModel, which uses the output of the DayTypeModel, for the probability that the inhabitants are at home at a given hour.
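Because the models run in list order and each one only sees the keys produced before it, the order of the list matters. The sketch below uses simplified copies of two of the models to show this; the daytime_home_prob key and the build_context helper are hypothetical simplifications, not the code from tsmaker.

```python
import datetime


class DayTypeModel:
    def get_context(self, date, existing_context):
        return {"is_weekend": date.weekday() >= 5}


class OccupancyProfileModel:
    def get_context(self, date, existing_context):
        # Silently falls back to "workday" if is_weekend has not been computed yet.
        is_weekend = existing_context.get("is_weekend", False)
        # Hypothetical simplification: one daytime probability instead of a full profile.
        return {"daytime_home_prob": 0.8 if is_weekend else 0.1}


def build_context(date, models):
    """Run the models in list order; each one only sees keys produced before it."""
    context = {}
    for model in models:
        new_params = model.get_context(date, context)
        if any(key in context for key in new_params):
            raise ValueError(f"Key collision in {type(model).__name__}")
        context.update(new_params)
    return context


saturday = datetime.date(2025, 9, 6)
right = build_context(saturday, [DayTypeModel(), OccupancyProfileModel()])
wrong = build_context(saturday, [OccupancyProfileModel(), DayTypeModel()])
print(right["daytime_home_prob"], wrong["daytime_home_prob"])  # 0.8 0.1
```

With the wrong order, the Saturday is silently simulated as a workday. Declaring explicit dependencies between models would catch this, but for a handful of models a fixed list order is the simpler choice.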

To generate the time series, we split the requested time range into chunks of one day and create a new context for each day.
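Splitting a time range into one-day chunks can be sketched as follows. This is a minimal sketch under two assumptions: the end date is exclusive and the sampling interval is 15 minutes. The iter_day_chunks name and the step parameter are hypothetical; the post does not state how tsmaker handles either.

```python
import datetime


def iter_day_chunks(start, end, step=datetime.timedelta(minutes=15)):
    """Yield (day, timestamps) for every calendar day in [start, end).

    start and end are datetime.date objects; step is a hypothetical sampling interval.
    """
    day = start
    while day < end:
        day_start = datetime.datetime.combine(day, datetime.time())
        day_end = day_start + datetime.timedelta(days=1)
        timestamps = []
        ts = day_start
        while ts < day_end:
            timestamps.append(ts)
            ts += step
        yield day, timestamps
        day += datetime.timedelta(days=1)


for day, timestamps in iter_day_chunks(datetime.date(2025, 9, 1), datetime.date(2025, 9, 3)):
    # In the simulation, the context for `day` would be built here once
    # and reused for all of that day's timestamps.
    print(day, len(timestamps))
# 2025-09-01 96
# 2025-09-02 96
```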

Once the context for a day has been built, we can generate values for the different sensors from it.

The code above contains two examples of deriving properties of the moment under simulation from the context: _calculate_occupancy and _calculate_ambient_light.

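The _calculate_occupancy function randomly decides whether somebody is present in the apartment, using the probability that the OccupancyProfileModel assigns to the current hour of the day. The _calculate_ambient_light function returns 0.0 between sunset and sunrise and, during daylight hours, a value that decreases with cloudiness (between roughly 0.19 and 0.91).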
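Each call to _calculate_occupancy is a Bernoulli draw, so over many time steps the fraction of "home" samples should approach the profile probability for that hour. The sketch below checks this with a copy of the workday profile (last range extended to 24 so hour 23 is covered). The sample_occupancy name is hypothetical, and it uses a seeded np.random.default_rng generator instead of the global np.random functions, so the results are reproducible.

```python
import numpy as np

# Copy of the workday profile: (start_hour, end_hour) -> probability of being home.
WORKDAY_PROFILE = {(0, 7): 0.95, (7, 9): 0.5, (9, 17): 0.1, (17, 24): 0.9}


def sample_occupancy(hour, profile, rng):
    """Bernoulli draw: True with the probability the profile assigns to this hour."""
    for (start, end), prob in profile.items():
        if start <= hour < end:
            return rng.random() < prob
    return False  # hour not covered by any range


rng = np.random.default_rng(seed=42)
n = 10_000
home_at_noon = sum(sample_occupancy(12, WORKDAY_PROFILE, rng) for _ in range(n))
home_at_night = sum(sample_occupancy(3, WORKDAY_PROFILE, rng) for _ in range(n))
print(home_at_noon / n, home_at_night / n)  # close to 0.1 and 0.95
```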

The computed occupancy and ambient_light values (the results of _calculate_occupancy and _calculate_ambient_light, respectively) are then used to compute random values for the light and motion time series:

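# Lights are switched on (+300 lux) when someone is home and it is dark outside;
# otherwise only a small random contribution (10 to 100 lux) is added.
day_light_or_artificial_light_modifier = (
    300
    if occupancy and ambient_light < 0.2
    else np.random.uniform(10, 100)
)
light_lux = np.clip(
    (ambient_light * 500) + day_light_or_artificial_light_modifier + np.random.normal(0, 5),
    0,
    1000,
)

# Motion is only detected while someone is home, in roughly 10% of those time steps.
motion = 1 if occupancy and np.random.rand() < 0.1 else 0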

Note the conditional expression that selects the modifier:

300
if occupancy and ambient_light < 0.2
else np.random.uniform(10, 100)

If the apartment is occupied and the ambient light coming from outside is low (ambient_light < 0.2), we add 300 lux to account for artificial light in the apartment. In all other cases, we add a smaller random value between 10 and 100 lux.
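To see what values this produces, the light computation from the snippet above can be wrapped in a function and evaluated for a few situations. This is a minimal sketch: simulate_light_lux is a hypothetical name, and it takes a seeded np.random.default_rng generator instead of using the global np.random functions.

```python
import numpy as np


def simulate_light_lux(occupancy, ambient_light, rng):
    """Light level in lux: daylight plus an indoor modifier, clipped to [0, 1000]."""
    # Occupied and dark outside: the lights are switched on (+300 lux).
    # Otherwise: a small random contribution between 10 and 100 lux.
    modifier = 300 if occupancy and ambient_light < 0.2 else rng.uniform(10, 100)
    value = ambient_light * 500 + modifier + rng.normal(0, 5)
    return float(np.clip(value, 0, 1000))


rng = np.random.default_rng(seed=7)
night_home = simulate_light_lux(True, 0.0, rng)     # about 300 lux
night_away = simulate_light_lux(False, 0.0, rng)    # roughly 10 to 100 lux
sunny_day = simulate_light_lux(True, 0.91, rng)     # roughly 465 to 555 lux
cloudy_day = simulate_light_lux(True, 0.37, rng)    # roughly 195 to 285 lux
```

The last case (cloudiness 0.7, so ambient_light = 1.0 - 0.9 * 0.7 = 0.37) lands below the roughly 300 lux of an occupied night, which is likely one source of the first issue listed under possible improvements.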

Repeating this process day by day results in a set of time series like this:

Figure: Time series for a simulated apartment, generated with: tsmaker --start-date 2025-09-01 --end-date 2025-09-15 --apartment --visualize

Although this simulation is far from realistic, the generator provides a good structure for generating realistic time series from a (simplified) model of a real-world process.

Clearly, there is still room for improvement in this generator:

  • Sometimes it is darker during the day than during the night.
  • The CO2 measurements go up and down far too quickly to be realistic.
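For the second issue, one common approach is a first-order model: while people are home, CO2 is added at a fixed rate per time step, and ventilation removes a fixed fraction of the excess above the outdoor baseline. The concentration then rises and decays gradually instead of jumping. The post does not show tsmaker’s CO2 code, so this is a sketch of a possible alternative, not the existing implementation; simulate_co2 and all parameter values are hypothetical.

```python
import numpy as np


def simulate_co2(occupied, baseline_ppm=420.0, rise_per_step=40.0, ventilation=0.05, rng=None):
    """First-order CO2 model: one value per entry in `occupied` (booleans).

    All parameter values are hypothetical, chosen for illustration only.
    """
    if rng is None:
        rng = np.random.default_rng()
    co2 = np.empty(len(occupied))
    level = baseline_ppm
    for i, is_occupied in enumerate(occupied):
        source = rise_per_step if is_occupied else 0.0   # exhaled CO2 added this step
        sink = ventilation * (level - baseline_ppm)      # fraction of the excess removed
        level += source - sink
        co2[i] = level + rng.normal(0, 2)                # small measurement noise
    return co2


# For example, 15-minute steps: 6 h away, 12 h at home, 6 h away.
occupied = [False] * 24 + [True] * 48 + [False] * 24
co2 = simulate_co2(occupied, rng=np.random.default_rng(0))
```

While occupied, the level approaches baseline_ppm + rise_per_step / ventilation (1220 ppm with these values), and each step changes it by at most about rise_per_step.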

But let’s leave it at that for this post. We now have a generator for five different but correlated time series. The time series also have some real-world explainability, which makes it easier to interpret and explain patterns in the dataset.

This generator is implemented in tsmaker, an open source Python package and CLI tool for generating time series.