Welcome to wateranalysis’s documentation!


Conf

This module includes the global configuration parameters of the software.

class wateranalysis.conf.config.Configuration

Bases: object

This class defines a correspondence between the name of each fixture and the filename of its time series.

splitters

This module implements different algorithms to split a time series.

class wateranalysis.timeseries.splitters.SimpleSplitter(time_series, out_dir)

Bases: object

The SimpleSplitter class provides methods to split a time series into multiple usages, using a threshold as the sole criterion.

split(sep=' ', head=None, threshold=0)

This method splits a time series using a threshold as the only criterion. Each resulting time series must be at least five samples long.

Parameters:
  • sep – the delimiter for the csv file; the default is a space
  • head – not None if the first line of the csv contains column titles
  • threshold – a float value that is compared with the samples to identify first and last sample for splitting
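
The splitting rule described above can be sketched as follows. This is a minimal illustration, not the SimpleSplitter implementation: it assumes the series is already loaded as a list of (epoch, flow) pairs, that a sample belongs to a usage when its flow is strictly greater than threshold, and it uses the five-sample minimum stated above. The function name split_by_threshold is hypothetical.

```python
def split_by_threshold(samples, threshold=0.0, min_len=5):
    """Split (epoch, flow) samples into usages.

    A usage is a maximal run of consecutive samples whose flow is strictly
    greater than threshold (assumption); runs shorter than min_len are dropped.
    """
    usages, current = [], []
    for epoch, flow in samples:
        if flow > threshold:
            current.append((epoch, flow))
        else:
            # The run has ended: keep it only if it is long enough.
            if len(current) >= min_len:
                usages.append(current)
            current = []
    # Close a run that is still open at the end of the series.
    if len(current) >= min_len:
        usages.append(current)
    return usages
```

For example, a series that is above the threshold for five consecutive samples and then for two more yields a single usage, because the second run is shorter than five samples.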

filters

This module filters out, from a set of time series, the outliers whose features do not comply with given constraints.

class wateranalysis.timeseries.filters.TSFilter

Bases: object

This class provides static methods to filter outliers from a set of time series.

static liters(ts)

Parameters:
  • ts – the time series of water flow samples [ml/s]

Returns: the total amount of liters

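
Converting flow samples in ml/s into a volume amounts to integrating the flow over time. The sketch below is one way to do it, assuming each sample is an (epoch_seconds, flow_ml_per_s) pair and using the trapezoidal rule; the library may integrate differently (for example with a simple rectangle sum), so treat it as an illustration only.

```python
def liters(ts):
    """Total volume in liters of a list of (epoch_seconds, flow_ml_per_s) samples.

    Integrates the flow over time with the trapezoidal rule (assumption),
    then converts milliliters to liters.
    """
    total_ml = 0.0
    for (t0, f0), (t1, f1) in zip(ts, ts[1:]):
        total_ml += (f0 + f1) / 2.0 * (t1 - t0)
    return total_ml / 1000.0
```

A constant flow of 100 ml/s over 10 seconds gives 1000 ml, that is 1.0 liter.
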
static outlayers(ts_dir, min_dur_const=0, min_lit_const=0, min_samp_const=1, sep=' ')

This method scans the csv files in a directory and identifies the time series whose features do not comply with the provided constraints.

Parameters:
  • ts_dir – the folder containing the csv files
  • min_dur_const – the minimum duration of a time series
  • min_lit_const – the minimum amount of liters of a time series
  • min_samp_const – the minimum number of samples of a time series
  • sep – the character used as separator in the csv file

Returns: a dictionary that, for each constraint, lists the basenames of the csv files that violate it
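
The constraint check can be sketched as below. Several details are assumptions not stated above: each csv file has no header and holds (epoch_seconds, flow_ml_per_s) rows, duration is the difference between the last and first epoch, liters are integrated with the trapezoidal rule, and the dictionary keys "duration", "liters" and "samples" as well as the name find_outliers are hypothetical. Basenames are returned with their .csv extension.

```python
import csv
import os


def _read_series(path, sep=" "):
    """Read (epoch, flow) rows from a headerless csv file."""
    with open(path, newline="") as f:
        return [(float(r[0]), float(r[1])) for r in csv.reader(f, delimiter=sep) if r]


def _liters(ts):
    # Trapezoidal integration of ml/s over seconds, converted to liters.
    return sum((f0 + f1) / 2.0 * (t1 - t0) for (t0, f0), (t1, f1) in zip(ts, ts[1:])) / 1000.0


def find_outliers(ts_dir, min_dur_const=0, min_lit_const=0, min_samp_const=1, sep=" "):
    """Map each constraint (hypothetical key names) to the csv files that violate it."""
    violations = {"duration": [], "liters": [], "samples": []}
    for name in sorted(os.listdir(ts_dir)):
        if not name.endswith(".csv"):
            continue
        ts = _read_series(os.path.join(ts_dir, name), sep)
        duration = ts[-1][0] - ts[0][0] if ts else 0
        if duration < min_dur_const:
            violations["duration"].append(name)
        if _liters(ts) < min_lit_const:
            violations["liters"].append(name)
        if len(ts) < min_samp_const:
            violations["samples"].append(name)
    return violations
```

A file can appear under several keys when it violates more than one constraint.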

static remove_outlayers(ts_dir, outlayers)

This method moves the csv files listed in the outlayers dictionary to a subdirectory.

Parameters:
  • ts_dir – the folder containing the csv files
  • outlayers – the dictionary listing the files to be moved

Returns: None
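
A possible sketch of this step, using only the standard library. The name of the subdirectory is not documented, so the default "outliers" below is a hypothetical placeholder, as is the function name move_outliers; the dictionary is assumed to map each constraint to a list of basenames, as returned by the method above.

```python
import os
import shutil


def move_outliers(ts_dir, outliers, subdir="outliers"):
    """Move every file listed in the outliers dictionary into ts_dir/subdir.

    subdir is a hypothetical default; the real subdirectory name is not documented.
    """
    target = os.path.join(ts_dir, subdir)
    os.makedirs(target, exist_ok=True)
    # A file may violate several constraints, so collect each name only once.
    to_move = {name for listed in outliers.values() for name in listed}
    for name in sorted(to_move):
        shutil.move(os.path.join(ts_dir, name), os.path.join(target, name))
```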

static rename_usages(ts_dir)

This method renames the n files in a directory so that their names correspond to the first n natural numbers.

Parameters:
  • ts_dir – the folder containing the files

Returns: None
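
Renaming files such as 1.csv, 2.csv, 4.csv, 10.csv into 1.csv to 4.csv needs two precautions: the files must be ordered numerically rather than alphabetically, and no rename may overwrite a file that has not been processed yet. The sketch below handles both with a two-pass rename through temporary names. It assumes the stems are plain digits; the temporary prefix ".renaming-" is a hypothetical choice.

```python
import os


def rename_usages(ts_dir):
    """Rename numbered csv files (e.g. 1.csv, 2.csv, 4.csv) to 1.csv ... n.csv, preserving order."""
    names = sorted(
        (n for n in os.listdir(ts_dir) if n.endswith(".csv") and n[:-4].isdigit()),
        key=lambda n: int(n[:-4]),  # numeric order, so 10.csv comes after 4.csv
    )
    # First pass: move every file to a temporary name (hypothetical prefix),
    # so the second pass can never overwrite a file still waiting to be renamed.
    temps = []
    for i, name in enumerate(names):
        tmp = os.path.join(ts_dir, f".renaming-{i}")
        os.rename(os.path.join(ts_dir, name), tmp)
        temps.append(tmp)
    # Second pass: assign the final names 1.csv ... n.csv.
    for i, tmp in enumerate(temps, start=1):
        os.rename(tmp, os.path.join(ts_dir, f"{i}.csv"))
```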

statistics

This module computes the features of a time series.

class wateranalysis.timeseries.statistics.TSParameters

Bases: object

This class provides static methods to compute the properties of a time series of water flow samples.

static compute_parameters(outfile, ts_dir, csv_sep=' ')

This method computes a list of features of the time series contained in each csv file of the ts_dir folder. The results are saved in a [fixture]_usage.csv file containing the following properties: (start_datetime, duration, liters, month, hour, day, max_flow).

Parameters:
  • outfile – the output filename
  • ts_dir – the folder containing the csv files
  • csv_sep – the delimiter used in the csv file; the default is a space
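
For a single time series, the listed properties could be derived as in the sketch below. Assumptions: samples are (epoch_seconds, flow_ml_per_s) pairs, epochs are interpreted in UTC, duration is in seconds, "day" is the day of the month, and liters are integrated with the trapezoidal rule. The function name usage_features is hypothetical.

```python
from datetime import datetime, timezone


def usage_features(ts):
    """Features of one usage from (epoch_seconds, flow_ml_per_s) samples.

    Returns (start_datetime, duration, liters, month, hour, day, max_flow).
    """
    start = datetime.fromtimestamp(ts[0][0], tz=timezone.utc)  # assumes UTC epochs
    duration = ts[-1][0] - ts[0][0]                              # seconds
    liters = sum((f0 + f1) / 2.0 * (t1 - t0)
                 for (t0, f0), (t1, f1) in zip(ts, ts[1:])) / 1000.0
    max_flow = max(flow for _, flow in ts)
    # "day" is taken as the day of the month (assumption).
    return start.isoformat(), duration, liters, start.month, start.hour, start.day, max_flow
```

Writing one such tuple per csv file, for instance with csv.writer, would produce a file with the column layout described above.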

static liters(ts)

This method computes the total amount of liters from a time series of flow samples.

Parameters:
  • ts – an array of samples (epoch, flow_value)

Returns: the total amount of liters as a float

static rename_usages(ts_dir)

This method renames the csv files in a directory so that the filenames form the sequence of numbers [1, n].

Parameters:
  • ts_dir – the directory containing the csv files (1.csv, 2.csv, 4.csv …)

static usages_perday(outfile, filename, csv_sep=' ')

Starting from the file of features (whose first column contains the start date-time of each time series), this method computes the number of usages per day and writes it to [fixture]_num_usage.csv. Each row of the output file contains four columns: [month, day, num_usages, weekday].

Parameters:
  • outfile – the output file ([fixture]_num_usage.csv)
  • filename – the input file, produced by the compute_parameters method
  • csv_sep – the delimiter used in the csv file; the default is a space
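
The per-day count can be sketched with a Counter over calendar dates. Assumptions: the start date-times have already been parsed into datetime objects, weekday follows Python's convention (0 is Monday), and days are grouped by full date, which differs from grouping by (month, day) only when the data span more than one year. Days without any usage do not appear in the result. The function name usages_per_day is hypothetical.

```python
from collections import Counter


def usages_per_day(start_datetimes):
    """Count usages per calendar day.

    start_datetimes: iterable of datetime objects (first column of the features file).
    Returns rows [month, day, num_usages, weekday], with weekday 0 = Monday (assumption).
    """
    counts = Counter(dt.date() for dt in start_datetimes)
    return [[d.month, d.day, n, d.weekday()] for d, n in sorted(counts.items())]
```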

models.statistics

This module defines models that statistically describe the frequency of fixture usage.

class wateranalysis.models.statistics.GlobalUsage(df, df1, df2)

Bases: object

This class defines a statistical distribution that is the same every day of the year. It defines the probability that a person opens the fixture n times in a day, the probability that a usage happens at a certain hour, and the average duration and average number of liters of a usage.

compute_average()

Computes the average duration and the average amount of water consumed per usage.

Returns: the average duration and the average consumption

compute_frequency()

This method computes the ratio between the number of days with n usages and the total number of days.

Returns: an array containing this ratio for each value of n (among the numbers of daily usages actually observed)
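
The ratio above is an empirical probability mass function over the number of daily usages. The sketch assumes the input is a plain list with the number of usages of each observed day, and returns a dictionary keyed by n instead of the array mentioned above; usage_frequency is a hypothetical name.

```python
from collections import Counter


def usage_frequency(daily_counts):
    """Fraction of days on which the fixture was used exactly n times.

    daily_counts: number of usages for each observed day.
    Returns {n: days_with_n_usages / total_days} for each n that actually occurs.
    """
    total_days = len(daily_counts)
    return {n: days / total_days for n, days in sorted(Counter(daily_counts).items())}
```

For four days with 2, 1, 2 and 3 usages, the result is {1: 0.25, 2: 0.5, 3: 0.25}.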

compute_times()

This method computes the ratio between the number of usages that occurred at a certain hour and the total number of usages.

Returns: an array with this ratio for each hour of the day
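
The hourly ratio can be sketched as a normalized 24-bin histogram. The input is assumed to be the list of start hours (0 to 23) of all usages; hourly_distribution is a hypothetical name.

```python
def hourly_distribution(start_hours):
    """Fraction of all usages that started in each hour of the day (24 entries)."""
    counts = [0] * 24
    for hour in start_hours:
        counts[hour] += 1
    total = len(start_hours)
    return [c / total for c in counts]
```

The 24 entries sum to 1, so the result can be used directly as the probability that a usage starts at a given hour.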

class wateranalysis.models.statistics.ModelBuilder(fixture, type, path='./data')

Bases: object

This class is used to build the desired model.

out_data_dir = None

build_model()

Instantiates the model according to the type and the fixture.

Returns: the instantiated model

class wateranalysis.models.statistics.MonthlyUsage(df, df1, df2, month)

Bases: object

This class defines a statistical distribution that is the same every day of a specific month. It defines the probability that a person opens the fixture n times in a day of that month, the probability that a usage occurs at a certain hour, and the average duration and average number of liters of a usage on a day of that month.

compute_average()

Computes the average duration and the average amount of water consumed per usage on a day of the specific month.

Returns: the average duration and the average consumption

compute_frequency()

This method computes the ratio between the number of days with n usages and the total number of days.

Returns: an array containing this ratio for each value of n (among the numbers of daily usages actually observed)

compute_times()

This method computes the ratio between the number of usages that occurred at a certain hour and the total number of usages.

Returns: an array with this ratio for each hour of the day

class wateranalysis.models.statistics.WeeklyUsage(df, df1, df2, day_week)

Bases: object

This class defines a statistical distribution of fixture usage on a specific day of the week. It defines the probability that a person opens the fixture n times on that day of the week, the probability that a usage occurs at a certain hour, and the average duration and average number of liters of a usage on that day of the week.

compute_average()

Computes the average duration and the average amount of water consumed per usage on the specific day of the week.

Returns: the average duration and the average consumption

compute_frequency()

This method computes the ratio between the number of days with n usages and the total number of days.

Returns: an array containing this ratio for each value of n (among the numbers of daily usages actually observed)

compute_times()

This method computes the ratio between the number of usages that occurred at a certain hour and the total number of usages.

Returns: an array with this ratio for each hour of the day
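
The monthly and weekly models differ from the global one mainly in the subset of days they consider. As a sketch of the weekly case, the function below keeps only the days that fall on day_week and then computes the frequency of n daily usages among them. It assumes rows shaped like the [month, day, num_usages, weekday] output of usages_perday, with weekday 0 meaning Monday; days without any usage are assumed absent from the rows, so they do not count. weekday_frequency is a hypothetical name.

```python
from collections import Counter


def weekday_frequency(rows, day_week):
    """Frequency of n daily usages, restricted to one day of the week.

    rows: [month, day, num_usages, weekday] records, one per observed day.
    Returns {n: fraction of the selected days with n usages}; {} if no day matches.
    """
    counts = [num for _, _, num, weekday in rows if weekday == day_week]
    return {n: days / len(counts) for n, days in sorted(Counter(counts).items())}
```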

learning.cluster

This module computes the k-means clustering of time series represented as arrays of features.

class wateranalysis.learning.cluster.TSCluster(folder, filename, runs)

Bases: object

This class implements the k-means clustering of a set of time series represented as arrays of features.

compute_clusters(testset)

This method computes the clustering of the time series. It creates one subfolder per cluster and copies the corresponding time series into it.

Parameters:
  • testset – the vectors of features

Returns: the number of clusters and the array of cluster ids
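
To make the clustering step concrete, here is a dependency-free sketch of Lloyd's algorithm, the standard k-means procedure, on a list of feature tuples. It is an illustration only: the library may rely on another implementation (such as scikit-learn), and the initialization (k random points with a fixed seed) and the name kmeans are assumptions. Copying files into per-cluster subfolders is omitted.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on a list of equal-length feature tuples.

    Returns (centroids, labels), where labels[i] is the cluster id of points[i].
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k random points as initial centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
    return centroids, labels
```

Since k-means relies on Euclidean distances, features on very different scales (liters versus seconds, for instance) are usually normalized first.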

extract_features(parameters=[])

This method projects the vectors of features contained in the [fixture]_usage.csv file onto the given parameters, saving the result into the [fixture].individuals file.

Parameters:
  • parameters – the list of parameter names

Returns: None

find_k1(testset)

This method computes the best number of clusters from the testset list of vectors.

Parameters:
  • testset – the vectors of features

Returns: the number of clusters

get_testset()

This method returns the vectors of features, with each parameter normalized.

Returns: the normalized vectors of features
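
The normalization scheme is not specified above. The sketch below assumes min-max scaling of each parameter (column) to [0, 1]; z-score standardization would be an equally plausible choice. A constant column is mapped to 0.0 to avoid dividing by zero. The function name normalize is hypothetical.

```python
def normalize(vectors):
    """Min-max scale each feature (column) of a list of feature vectors to [0, 1]."""
    columns = list(zip(*vectors))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        # A constant column (span 0) is mapped to 0.0 to avoid dividing by zero.
        [(x - lo) / span if span else 0.0 for x, lo, span in zip(v, lows, spans)]
        for v in vectors
    ]
```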

meanshift(testset)

Computes the clustering with the MeanShift algorithm.

Parameters:
  • testset – the vectors of features

static plot_clusters(testset, clusters, axis=[0, 1])

This method plots the clusters along two dimensions.

Parameters:
  • testset – the vectors of features
  • clusters – the cluster id of each corresponding vector
  • axis – the indices of the features to be used as plot dimensions

Returns: plt

learning.randomforest

This module uses machine learning to learn which cluster a time series will belong to, given the day and the hour at which it runs.

class wateranalysis.learning.randomforest.RandomForest(folder, fixture, n_clusters)

Bases: object

This class provides methods for learning, evaluating and predicting the cluster id of a time series according to the date-time at which it runs.

compute_features(clusters)

This method reads the vectors of features (datetime, duration, liters, maxflow) from the related file. It uses the date-time at which each time series started to compute the hour of the day, the day of the month and the day of the week.

Parameters:
  • clusters – the list of cluster ids identified for the time series to be analyzed

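
The training data implied by this method pairs the calendar features of each usage with its cluster id. A minimal sketch, assuming the start date-times are already parsed into datetime objects, that the feature order is [hour, day_week, day_month] (the names used by learn), and that day_week follows Python's convention (0 is Monday). The function name training_rows is hypothetical.

```python
def training_rows(start_datetimes, clusters):
    """Pair the calendar features of each usage with its cluster id.

    Returns X (rows of [hour, day_week, day_month]) and y (cluster ids).
    """
    X = [[dt.hour, dt.weekday(), dt.day] for dt in start_datetimes]  # weekday: 0 = Monday
    y = list(clusters)
    if len(X) != len(y):
        raise ValueError("one cluster id is needed per time series")
    return X, y
```

Rows in this shape can be fed to any classifier that accepts a feature matrix and a label vector, such as a random forest.
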
evaluate()

This method uses the random forest algorithm to evaluate how well the cluster id is learned and predicted.

learn(data_dir)

This method learns how the cluster id depends on the following parameters of the time series: hour, day_week, day_month.

Parameters:
  • data_dir – the folder where the learned model is serialized

Returns: None

static predict(model_file, file_items)

This method loads the model saved in model_file and uses it to predict the cluster of the given time series.

Parameters:
  • model_file – the filename where the previous learning phase saved the model
  • file_items – the vectors of features of the time series whose cluster must be predicted