Welcome to wateranalysis’s documentation!¶
Conf¶
This module includes the global configuration parameters for the software.
-
class
wateranalysis.conf.config.
Configuration
¶ Bases:
object
This class defines the correspondence between the name of a fixture and the filename of its time series.
splitters¶
This module implements different algorithms to split a time series.
-
class
wateranalysis.timeseries.splitters.
SimpleSplitter
(time_series, out_dir)¶ Bases:
object
The SimpleSplitter class provides methods to split a time series into multiple usages, using a threshold as the only criterion.
-
split
(sep=' ', head=None, threshold=0)¶ This method splits a time series using a threshold as the only criterion. Each resulting time series must be at least five samples long.
Parameters: - sep – the delimiter for the csv file, default is the space
- head – not None if the first line of the csv contains column titles
- threshold – a float value that is compared with the samples to identify first and last sample for splitting
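The thresholding described above can be sketched as follows. This is a minimal illustration and not the SimpleSplitter implementation: the function name split_by_threshold, the in-memory list of (epoch, flow) samples and the strict "greater than" comparison are assumptions. Only the threshold criterion and the five-sample minimum come from the description.

```python
def split_by_threshold(samples, threshold=0, min_len=5):
    """Split (epoch, flow) samples into usages where the flow exceeds threshold.

    Hypothetical helper: the documented split() reads a csv file, whereas this
    sketch works on an in-memory list to keep the example self-contained.
    """
    usages, current = [], []
    for epoch, flow in samples:
        if flow > threshold:
            current.append((epoch, flow))
        else:
            # A sample at or below the threshold closes the current usage.
            if len(current) >= min_len:
                usages.append(current)
            current = []
    # Keep a usage that is still open when the series ends.
    if len(current) >= min_len:
        usages.append(current)
    return usages


flows = [0, 3, 4, 5, 4, 3, 0, 0, 2, 2, 0, 1, 1, 1, 1, 1, 1]
samples = list(enumerate(flows))  # the list index stands in for the epoch
usages = split_by_threshold(samples, threshold=0)
```

In this example the two-sample burst is discarded because it is shorter than five samples, so two usages remain.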
-
filters¶
This module filters outliers out of a set of time series, that is, the time series whose features do not comply with the given constraints.
-
class
wateranalysis.timeseries.filters.
TSFilter
¶ Bases:
object
This class provides static methods to filter outliers out of a set of time series.
-
static
liters
(ts)¶ Parameters: ts – the time series of water flow samples [ml/s] Returns: the total amount of liters
-
static
outlayers
(ts_dir, min_dur_const=0, min_lit_const=0, min_samp_const=1, sep=' ')¶ This method scans the csv files in a directory and identifies the time series whose features do not comply with the provided constraints.
Parameters: - ts_dir – the folder with the csv files
- min_dur_const – the minimum duration of a time series
- min_lit_const – the minimum amount of liters of a time series
- min_samp_const – the minimum number of samples of a time series
- sep – the character used as separator in the csv file
Returns: a dictionary that, for each constraint, lists the basenames of the csv files that violate it
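A possible shape for the returned dictionary is sketched below. The helper name find_outliers, the dictionary keys and the precomputed (duration, liters, samples) tuples are assumptions made for illustration. The documented method reads these features from the csv files itself.

```python
def find_outliers(features, min_dur=0, min_lit=0, min_samp=1):
    """List, per constraint, the files whose features violate it.

    features maps a csv basename to (duration, liters, n_samples).
    The key names of the result are hypothetical.
    """
    violations = {"duration": [], "liters": [], "samples": []}
    for name in sorted(features):
        duration, liters, n_samples = features[name]
        if duration < min_dur:
            violations["duration"].append(name)
        if liters < min_lit:
            violations["liters"].append(name)
        if n_samples < min_samp:
            violations["samples"].append(name)
    return violations


features = {
    "1.csv": (30, 2.5, 31),
    "2.csv": (3, 0.1, 4),
    "3.csv": (12, 0.4, 13),
}
result = find_outliers(features, min_dur=5, min_lit=0.5, min_samp=5)
```

A file can appear under more than one constraint, as 2.csv does here.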
-
static
remove_outlayers
(ts_dir, outlayers)¶ This method moves the csv files listed in the outlayers dictionary to a subdirectory.
Parameters: - ts_dir – the folder containing the csv files
- outlayers – the dictionary listing the files to be moved
Returns: None
-
static
rename_usages
(ts_dir)¶ This method renames the n files in a directory so that their names correspond to the first n numbers.
Parameters: ts_dir – the folder containing the files Returns: None
-
statistics¶
This module computes the features of a time series.
-
class
wateranalysis.timeseries.statistics.
TSParameters
¶ Bases:
object
This class provides static methods to compute properties of a time series composed of water flow samples.
-
static
compute_parameters
(outfile, ts_dir, csv_sep=' ')¶ This method computes a list of features for the time series contained in each csv file of the ts_dir folder. Results are saved in a [fixture]_usage.csv file containing the following properties: (start_datetime, duration, liters, month, hour, day, max_flow).
Parameters: - outfile – the output filename
- ts_dir – the folder containing the csv files
- csv_sep – the delimiter used in the csv files, space is the default value
-
static
liters
(ts)¶ This method computes the total amount of liters from a time series of flow samples.
Parameters: ts – an array of samples (epoch, flow_value) Returns: the total amount of liters as a float
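One way to obtain a volume from flow samples is to integrate the flow over time. The sketch below assumes that epochs are in seconds and flow values in ml/s (the unit quoted for TSFilter.liters), and it uses the trapezoidal rule. The integration rule actually used by the library is not documented here, and the name liters_from_flow is hypothetical.

```python
def liters_from_flow(ts):
    """Integrate (epoch_seconds, flow_ml_per_s) samples and return liters.

    Assumption: trapezoidal rule between consecutive samples.
    """
    total_ml = 0.0
    for (t0, f0), (t1, f1) in zip(ts, ts[1:]):
        total_ml += (f0 + f1) / 2.0 * (t1 - t0)
    return total_ml / 1000.0  # millilitres to litres


ts = [(0, 0.0), (1, 100.0), (2, 100.0), (3, 0.0)]
volume = liters_from_flow(ts)  # 50 + 100 + 50 ml, that is 0.2 liters
```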
-
static
rename_usages
(ts_dir)¶ This method renames the csv files in a directory so that the filenames form a sequence of numbers [1, n].
Parameters: ts_dir – the directory containing the csv files (1.csv, 2.csv, 4.csv …)
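The renumbering can be sketched with the standard library only. The helper name renumber_csv_files is hypothetical, and the sketch assumes that every csv file is named with a distinct positive integer. Under that assumption, renaming in ascending numeric order never overwrites a file that has not been moved yet.

```python
import os
import tempfile


def renumber_csv_files(ts_dir):
    """Rename the csv files in ts_dir to 1.csv, 2.csv, ..., n.csv.

    Assumes every csv file is named <positive integer>.csv.
    """
    names = [n for n in os.listdir(ts_dir) if n.endswith(".csv")]
    # Sort numerically so that 10.csv comes after 4.csv.
    names.sort(key=lambda n: int(os.path.splitext(n)[0]))
    for new_index, name in enumerate(names, start=1):
        target = f"{new_index}.csv"
        if name != target:
            os.rename(os.path.join(ts_dir, name), os.path.join(ts_dir, target))


with tempfile.TemporaryDirectory() as ts_dir:
    for stem in (1, 2, 4, 10):
        with open(os.path.join(ts_dir, f"{stem}.csv"), "w") as handle:
            handle.write(str(stem))  # remember the original number
    renumber_csv_files(ts_dir)
    renamed = sorted(os.listdir(ts_dir))
    with open(os.path.join(ts_dir, "4.csv")) as handle:
        last_content = handle.read()
```

After the call, the file that used to be 10.csv is found under the name 4.csv.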
-
static
usages_perday
(outfile, filename, csv_sep=' ')¶ Starting from the file of features (whose first column contains the start date-time of each time series), this method computes the usages per day and saves them in [fixture]_num_usage.csv. Each row of the output file contains four columns: [month, day, num_usages, weekday].
Parameters: - outfile – the output file (the docstring calls it fixture, the prefix of the produced output file)
- filename – the input file of features, produced by the compute_parameters method
- csv_sep – the delimiter used in the csv file, space is the default value
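The per-day aggregation can be illustrated on a list of start date-times. This is a sketch under assumptions: the helper name usages_per_day_rows is hypothetical, the input is already parsed into datetime objects, and the weekday is numbered as in Python's datetime.weekday() (Monday is 0). The numbering used by the library is not stated in this documentation.

```python
from collections import Counter
from datetime import datetime


def usages_per_day_rows(start_times):
    """Return rows [month, day, num_usages, weekday], one per calendar day."""
    counts = Counter(dt.date() for dt in start_times)
    return [
        [day.month, day.day, n, day.weekday()]
        for day, n in sorted(counts.items())
    ]


starts = [
    datetime(2018, 3, 5, 7, 30),
    datetime(2018, 3, 5, 19, 0),
    datetime(2018, 3, 6, 8, 15),
]
rows = usages_per_day_rows(starts)
```

Days without any usage do not appear in the input and therefore produce no row in this sketch.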
-
models.statistics¶
This module defines models that statistically describe the frequency of fixture usage.
-
class
wateranalysis.models.statistics.
GlobalUsage
(df, df1, df2)¶ Bases:
object
This class defines a statistical distribution that is the same for each day of the year. It defines the probability distribution that a person opens the fixture n times, the probability that a usage happens at a certain hour, and the average duration and average number of liters of a usage.
-
compute_average
()¶ Computes the average duration and the average amount of water consumed per usage. Returns: average duration, average consumption
-
compute_frequency
()¶ This method computes the ratio between the number of days with n usages and the total number of days. Returns: an array that contains this ratio for each value of n (according to the usages that actually occurred)
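The ratio described above amounts to an empirical distribution of the number of usages per day. The sketch below is an illustration only: the helper name usage_frequency is hypothetical, the input is assumed to be a non-empty list with the number of usages observed on each day, and days with zero usages are assumed to be part of that list.

```python
from collections import Counter


def usage_frequency(usages_per_day):
    """Return, for n = 0..max, the fraction of days that had n usages."""
    total_days = len(usages_per_day)
    counts = Counter(usages_per_day)
    return [counts.get(n, 0) / total_days for n in range(max(usages_per_day) + 1)]


# Eight observed days: two with 0 usages, two with 1, four with 2.
freq = usage_frequency([0, 2, 2, 1, 2, 0, 1, 2])
```

Because every day falls into exactly one bucket, the ratios sum to one.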
-
compute_times
()¶ This method computes the ratio between the number of usages that occurred at a certain hour and the total number of usages. Returns: an array with this ratio for each hour of the day
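The hourly ratio can be sketched in the same spirit. The helper name hourly_ratio is hypothetical, and the input is assumed to be a non-empty list with the start hour (0 to 23) of each usage.

```python
from collections import Counter


def hourly_ratio(start_hours):
    """Return 24 values: the fraction of usages that started in each hour."""
    total = len(start_hours)
    counts = Counter(start_hours)
    return [counts.get(hour, 0) / total for hour in range(24)]


# Four usages: two at 7, one at 8, one at 19.
ratios = hourly_ratio([7, 7, 8, 19])
```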
-
-
class
wateranalysis.models.statistics.
ModelBuilder
(fixture, type, path='./data')¶ Bases:
object
This class is used to build the desired model.
-
build_model
()¶ Instantiates the model according to the type and the fixture. Returns: the instantiated model
-
-
class
wateranalysis.models.statistics.
MonthlyUsage
(df, df1, df2, month)¶ Bases:
object
This class defines a statistical distribution that is the same for each day of a specific month. It defines the probability distribution that a person opens the fixture n times in a day of that month, the probability that a usage occurs at a certain hour, and the average duration and average number of liters of a usage on a day of that month.
-
compute_average
()¶ Computes the average duration and the average amount of water consumed per usage in a day of a specific month. Returns: average duration, average consumption
-
compute_frequency
()¶ This method computes the ratio between the number of days with n usages and the total number of days. Returns: an array that contains this ratio for each value of n (according to the usages that actually occurred)
-
compute_times
()¶ This method computes the ratio between the number of usages that occurred at a certain hour and the total number of usages. Returns: an array with this ratio for each hour of the day
-
-
class
wateranalysis.models.statistics.
WeeklyUsage
(df, df1, df2, day_week)¶ Bases:
object
This class defines a statistical distribution of fixture usage for a specific day of the week. It defines the probability distribution that a person opens the fixture n times on that day of the week, the probability that a usage occurs at a certain hour, and the average duration and average number of liters of a usage on that day of the week.
-
compute_average
()¶ Computes the average duration and the average amount of water consumed per usage on a specific day of the week. Returns: average duration, average consumption
-
compute_frequency
()¶ This method computes the ratio between the number of days with n usages and the total number of days. Returns: an array that contains this ratio for each value of n (according to the usages that actually occurred)
-
compute_times
()¶ This method computes the ratio between the number of usages that occurred at a certain hour and the total number of usages. Returns: an array with this ratio for each hour of the day
-
learning.cluster¶
This module computes the k-means clustering of time series represented as arrays of features.
-
class
wateranalysis.learning.cluster.
TSCluster
(folder, filename, runs)¶ Bases:
object
This class implements the k-means clustering of a set of time series represented as arrays of features.
-
compute_clusters
(testset)¶ This method computes the clustering of the time series. It creates one sub-folder per cluster and copies the corresponding time series there.
Parameters: testset – the vectors of features Returns: the number of clusters, the array of cluster ids
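For readers unfamiliar with k-means, the sketch below shows the two alternating steps of the algorithm (assignment and centroid update) in plain Python. It is not the TSCluster implementation: the function names are hypothetical, the number of clusters is passed in explicitly, and the centroids are initialised with the first k points to keep the result deterministic. The copying of files into sub-folders is left out.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Return (labels, centroids) after a fixed number of Lloyd iterations."""
    # Deterministic initialisation (assumption): the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


points = [[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8], [0.2, 0.1]]
labels, centroids = kmeans(points, k=2)
```

The labels play the role of the array of cluster ids mentioned above: the three points near the origin end up in one cluster and the two points near (1, 1) in the other.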
-
extract_features
(parameters=[])¶ This method projects the vectors of features contained in the [fixture]_usage.csv file onto the given parameters, saving the result into the [fixture].individuals file.
Parameters: parameters – the list of parameter names Returns: None
-
find_k1
(testset)¶ This method computes the best number of clusters from the testset list of vectors.
Parameters: testset – the vectors of features Returns: the number of clusters
-
get_testset
()¶ This method returns the vectors of features, normalizing each parameter. Returns: the normalized vectors of features
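Normalizing each parameter keeps features with large ranges (such as duration in seconds) from dominating the distance used for clustering. The documentation does not say which normalization is applied, so the sketch below assumes min-max scaling of each column to [0, 1]. The helper name normalize_columns is hypothetical.

```python
def normalize_columns(vectors):
    """Scale every feature column to the range [0, 1] (min-max scaling)."""
    columns = list(zip(*vectors))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        # A constant column has span 0 and is mapped to 0.0.
        [(v - low) / span if span else 0.0 for v, low, span in zip(row, lows, spans)]
        for row in vectors
    ]


vectors = [[10, 1.0], [20, 3.0], [30, 2.0]]
normalized = normalize_columns(vectors)
```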
-
meanshift
(testset)¶ Computes the clustering with MeanShift.
Parameters: testset – the vectors of features
-
static
plot_clusters
(testset, clusters, axis=[0, 1])¶ This method plots the clusters along two dimensions.
Parameters: - testset – the vectors of features
- clusters – the cluster id of the corresponding vector
- axis – the features to be used as plot dimensions
Returns: plt
-
learning.randomforest¶
This module uses machine learning techniques to learn which cluster a time series will belong to if it runs on a given day at a given hour.
-
class
wateranalysis.learning.randomforest.
RandomForest
(folder, fixture, n_clusters)¶ Bases:
object
This class provides methods for learning, evaluating and predicting the cluster id of a time series according to the date-time at which it runs.
-
compute_features
(clusters)¶ This method reads the vectors of features (datetime, duration, liters, maxflow) from the related file. It uses the date-time at which each time series started to compute the hour of the day, the day of the month and the day of the week.
Parameters: clusters – the list of cluster ids identified for the time-series to be analyzed. Returns:
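Deriving the three calendar features from a start date-time needs only the standard library. In the sketch below the helper name time_features is hypothetical, the keys follow the parameter names hour, day_week and day_month used by the learn method, and the day of the week is numbered as in Python's datetime.weekday() (Monday is 0), which is an assumption.

```python
from datetime import datetime


def time_features(start):
    """Return the calendar features of a usage that started at `start`."""
    return {
        "hour": start.hour,
        "day_week": start.weekday(),  # Monday is 0 (assumed convention)
        "day_month": start.day,
    }


features = time_features(datetime(2018, 3, 5, 7, 30))
```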
-
evaluate
()¶ This method uses the RandomForest algorithm to evaluate how well the learning and prediction of the cluster id work.
-
learn
(data_dir)¶ This method learns how the cluster id depends on the following parameters of the time series: hour, day_week, day_month.
Parameters: data_dir – the folder where the learned model must be serialized Returns: None
-
static
predict
(model_file, file_items)¶ This method loads the model_file to predict the cluster of the given time series.
Parameters: - model_file – the file where the previous learning phase saved the model
- file_items – the vectors of features of the time series whose cluster must be predicted
-