Multi-Layer Perceptrons and Deep Neural Networks

In collaboration with Hsu Chung Chuan, Lin Min Htoo, and Quah Jia Yong.

1. Introduction

Since the early 1990s, several countries, mostly in the European Union and North America, have started to deregulate their traditionally state-controlled energy sectors [1,2]. This has resulted in the marketisation of energy, where, like other commodities, energy is traded competitively in free markets using instruments such as spot and derivative contracts [3]. Unlike most commodities, however, energy load depends strongly on short-term environmental conditions such as temperature, wind speed, cloud cover, and precipitation, in addition to the intensity of day-to-day business activities. This dependency, compounded by the fact that energy is a non-storable commodity, makes energy load highly volatile.

This environmental dependence and volatility present a unique challenge to energy traders, whose aim is to accurately predict energy production (commonly between 18 and 48 hours after the forecast is made) [4]. Recently, methods developed in machine learning, especially the Deep Neural Network (DNN), have shown impressive predictive performance in energy load forecasting tasks. We review these briefly in Section 2. Despite their impressive performance, however, most of these methods only utilise common loss functions such as mean squared error (MSE), mean absolute error (MAE), and their variants, e.g. root mean squared error (RMSE), relative root mean squared error (RRMSE), and mean absolute percentage error (MAPE) [5,6,7,8,9]. This is a limitation, since these losses only minimise the difference between predicted and true values, which, although convenient, often does not reflect the potentially intricate profit structure defined by the specific set of contracts a trader agrees to.

Hence, we propose a loss we call the Opportunity Loss, defined as the difference between the optimal reward of predicting exactly the true value and the reward of the estimate. This loss is thus flexible and follows the profit structure outlined by a trader’s contract.

We test our approach using hourly data from 8 wind farms in the Ile-de-France region from January 1st, 2017 to July 14th, 2020. The datasets are provided by Réseau de transport d'électricité and Terra Weather as part of the AI4Impact 2020 Datathon [10]. On several DNN architectures, we show that our proposed loss performs comparably to common loss functions when the profit structure is symmetric and outperforms them when the profit structure is asymmetric. Remarkably, this performance is achieved without passing the variables used to calculate the opportunity loss as model inputs.

In addition to our proposed loss, we also introduce a novel way to jointly encode the effect of wind direction and speed which improves simulated trading performance. This is especially novel given that most studies consider only either wind speed or wind direction as features [11,12,13,14], but rarely both [15].

We segment this article into six sections. In the next section we explain the DNN and baseline architectures we consider for the experiments. Section 3 explains common losses and introduces our proposed opportunity loss. We then explain our experiment design, including our approach to feature engineering, in section 4. We present the results of our experiments in section 5, before concluding in section 6.

2. Models

Recent developments in energy load forecasting have seen impressive predictive performance from DNN architectures. Many studies have shown that DNNs outperform canonical time-series prediction models such as ARIMA [9, 16, 17] and linear models [18]. This article considers state-of-the-art architectures that have recently been adopted in the energy production forecasting domain, such as the one-dimensional CNN (1D-CNN) [19,20], Long Short-Term Memory (LSTM) [6,21,22], and combined LSTM-CNN architectures [5,7]. As baselines, we use the persistence model and the Multi-Layer Perceptron (MLP).

2.1. Persistence

Persistence is the simplest forecast model, which we use here solely for evaluation purposes. It assumes that the energy production after the prediction window L is the same as the production when the prediction is made. In equation form:

$$\hat{y}_{t+L} = y_t$$

where y_t is the production at the time t the prediction is made and \hat{y}_{t+L} is the forecast for L steps later.
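The persistence baseline can be sketched in a few lines of Python. The function name `persistence_forecast` and the list-based series are illustrative assumptions, not names taken from the authors' repository.

```python
def persistence_forecast(series, horizon):
    """Pair each observation y_t with the value realised `horizon` steps later.

    The persistence forecast for time t + horizon is simply y_t.
    """
    pairs = []
    for t in range(len(series) - horizon):
        forecast = series[t]            # value observed when the forecast is made
        target = series[t + horizon]    # value actually realised later
        pairs.append((forecast, target))
    return pairs


production = [5.0, 7.0, 6.0, 9.0, 8.0]
pairs = persistence_forecast(production, horizon=2)
```

Because it needs no training, any learned model should at least beat the error computed from these pairs.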

2.2. MLP

The MLP is an architecture inspired by the interaction between neurons in the human brain. Each unit of the MLP computes a linear combination of the input a with learnable weights W, adds a bias b, and passes the result through a non-linear activation function σ(·). An example of a non-linear activation function is the rectified linear unit (ReLU), which maps all negative values to zero and leaves all positive values unchanged.
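As a minimal sketch of a single unit and of a layer built from such units, the following uses plain Python lists. The names `relu`, `mlp_unit`, and `dense_layer` are hypothetical and chosen only for illustration.

```python
def relu(z):
    """Rectified linear unit: negative values become zero."""
    return max(0.0, z)


def mlp_unit(a, w, b):
    """One unit: weighted sum of the inputs plus a bias, then ReLU."""
    z = sum(w_i * a_i for w_i, a_i in zip(w, a)) + b
    return relu(z)


def dense_layer(a, W, biases):
    """A layer is a list of units that all see the same input a."""
    return [mlp_unit(a, w, b) for w, b in zip(W, biases)]


hidden = dense_layer([1.0, 2.0], [[0.5, -1.0], [0.5, 1.0]], [0.25, 0.5])
```

Stacking several such layers, each taking the previous layer's output as input, gives the full network.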

Using these units as building blocks, one can build a network that routes the information from the inputs X to the estimate y_hat, essentially creating a network that can be thought of as an approximation of a non-linear function f(X) [23]. Below is an example of an MLP with three hidden layers of widths five, four, and three.

Figure 1: A Multi-Layer Perceptron with a four dimensional input, one dimensional output and three hidden layers with widths five, four, and three, respectively. Image source: [24].

2.3. CNN

The Convolutional Neural Network (CNN) was originally developed in its two-dimensional form by [25] and has since become popular in the field of image recognition owing to its high degree of invariance to deformations such as translation and scaling [26]. That said, the architecture has also recently been adopted in its one-dimensional form to model time series, as illustrated in [27] and [28].

A 1D-CNN is built from two kinds of layers: the convolutional layer and the maxpool layer. The convolutional layer takes in a sequence of data and outputs a sequence of convolutions whose length is less than or equal to the original length. A convolution operation with kernel size k linearly combines k data points with learnable weights W and bias b. A convolutional layer with kernel size k and stride s thus convolves the first k data points in a sequence, slides across by s steps, and convolves the next k data points. This repeats until the kernel reaches the end of the sequence.
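The sliding operation can be sketched for a single input channel and a single kernel, with no padding. The function name `conv1d` is an illustrative choice; a real layer would apply many kernels across many channels.

```python
def conv1d(x, weights, bias, stride=1):
    """Slide a kernel of size k = len(weights) across x in steps of `stride`."""
    k = len(weights)
    out = []
    # The last valid window starts at len(x) - k.
    for start in range(0, len(x) - k + 1, stride):
        window = x[start:start + k]
        out.append(sum(w * v for w, v in zip(weights, window)) + bias)
    return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
dense_out = conv1d(signal, [1.0, 0.0, -1.0], bias=0.0, stride=1)
strided_out = conv1d(signal, [1.0, 0.0, -1.0], bias=0.0, stride=2)
```

Without padding, an input of length n yields floor((n - k) / s) + 1 outputs, which is why the output is never longer than the input.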

Figure 2: Visualisation of one Convolutional Layer with kernel (also often called filter) described in the grey boxes. Image source: [20].

A maxpool layer of kernel size k and stride s does the same operation as the convolutional layer, but instead of convolving the data together, it takes the maximum value of the k data. Once the input has passed through both layers, the output is activated by a non-linear function σ(·), just like in the MLP.
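Max pooling follows the same sliding pattern but keeps only the largest value in each window. This sketch again assumes a single channel and no padding, and `maxpool1d` is a hypothetical name.

```python
def maxpool1d(x, k, stride):
    """Take the maximum of each window of k values, moving `stride` steps at a time."""
    return [max(x[i:i + k]) for i in range(0, len(x) - k + 1, stride)]


pooled = maxpool1d([1.0, 3.0, 2.0, 5.0, 4.0, 0.0], k=2, stride=2)
```

With k = 2 and stride 2 the sequence length is halved, which is how stacked layers shrink a long window into a compact summary.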

In practice, 1D-CNN architectures stack several convolution-maxpool layers on top of one another, which are then connected to a fully connected (MLP) layer that generates the estimate.

2.4. LSTM

Besides CNNs, another popular neural network architecture for forecasting is the Recurrent Neural Network (RNN). The RNN architecture as shown in Figure 3.a below has a recurrent property, which means that the input to a hidden layer h_t is not only the data X_t (like in MLP), but also the output of the previous hidden layer h_{t-1}. This property allows the network to encode temporal patterns.

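LSTM [29] is a popular enhancement to the RNN, shown in Figure 3.b, whose operations are given by:

$$
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, X_t] + b_f) \\
i_t &= \sigma(W_i [h_{t-1}, X_t] + b_i) \\
o_t &= \sigma(W_o [h_{t-1}, X_t] + b_o) \\
\tilde{C}_t &= \tanh(W_c [h_{t-1}, X_t] + b_c) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$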

In these equations, f_t, i_t, and o_t are the forget, input, and output gates, parameterised by the weights W_f, W_i, W_o and biases b_f, b_i, b_o, respectively. C_t and \tilde{C}_t denote the cell state and the proposal cell state, and W_c and b_c are the cell weights and bias. Here σ(·) denotes the sigmoid activation function and tanh(·) denotes the hyperbolic tangent activation function.
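To make the roles of the gates concrete, here is a sketch of one LSTM step in its standard formulation, simplified to a scalar input and a single hidden unit so that it needs only the standard library. The `params` layout, with one `(w_h, w_x, b)` triple per gate, is an assumption made for this illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a single-unit LSTM.

    params maps each of 'f', 'i', 'o', 'c' to a (w_h, w_x, b) triple.
    """
    def pre_activation(gate):
        w_h, w_x, b = params[gate]
        return w_h * h_prev + w_x * x_t + b

    f_t = sigmoid(pre_activation("f"))        # forget gate
    i_t = sigmoid(pre_activation("i"))        # input gate
    o_t = sigmoid(pre_activation("o"))        # output gate
    c_tilde = math.tanh(pre_activation("c"))  # proposal cell state
    c_t = f_t * c_prev + i_t * c_tilde        # keep part of the old state, add new
    h_t = o_t * math.tanh(c_t)                # expose part of the cell state
    return h_t, c_t


zero_params = {gate: (0.0, 0.0, 0.0) for gate in "fioc"}
h, c = lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
```

With all parameters at zero every gate sits at 0.5, so half of the previous cell state survives and nothing new is written, which shows how the forget gate alone controls memory retention.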

Figure 3: A standard RNN unit (a) and an LSTM unit (b). Image source: [5].

In application, the forget and input gates collectively learn to keep essential information from the past in the cell state and to forget what is no longer useful. The output gate learns the conditions under which the information in the cell state is relevant to the next hidden state. This enhancement allows the neural network to remember important information from the distant past, which vanilla RNNs struggle with due to vanishing gradients. As with CNNs, the last hidden state is usually passed through a fully connected layer to obtain the estimate.

2.5. LSTM-CNN Combination

Following [7], we also tried a combination of the two architectures, which the authors claim captures local and temporal trends simultaneously. Their architecture first runs one LSTM and one CNN in parallel on the model inputs; the two outputs are then concatenated and passed through a fully connected layer. This is illustrated below in Figure 4.

Figure 4: A combined CNN-LSTM architecture as suggested by [7].

3. Loss

The models described in Section 2 learn their parameters by minimising a loss function. In regression tasks, losses commonly reflect the difference between the actual value to be predicted and the estimate. Once the loss is defined, we can backpropagate [30] its gradient through the layers, updating the parameters along the way.

3.1. Common Losses

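One of the most common loss functions for forecasting is the mean squared error (MSE), defined as

$$\mathrm{MSE} = \frac{1}{N} \sum_{n=1}^{N} (y_n - \hat{y}_n)^2$$

where y_n is the true value, \hat{y}_n is the estimate, and N is the number of samples.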

Another common loss function is the mean absolute error (MAE), defined over N pairs of true values y_n and estimates \hat{y}_n as

$$\mathrm{MAE} = \frac{1}{N} \sum_{n=1}^{N} |y_n - \hat{y}_n|$$

which is often used when robustness to outliers is a concern.

3.2. Opportunity Loss

While the previously mentioned losses and their variants are attractive for their mathematical tractability and simplicity, in application they can be limiting. For instance, the profit structure defined by a set of contracts agreed to by a trader can be asymmetric, so that the trader pays a higher penalty for overprediction than for underprediction (where the cost takes the form of an opportunity cost). In this example, MSE and MAE cannot accurately capture the asymmetric profit structure.

Given the possibility of intricate profit structures with many variables and interactions, we propose replacing the common losses with the opportunity loss, defined for each sample n as the difference

$$\bar{R}^{(S)}_n - \hat{R}^{(S)}_n$$

where \bar{R}^{(S)}_n is the optimal revenue (i.e. the revenue of a perfect forecast) and \hat{R}^{(S)}_n is the revenue of the estimate, both calculated with respect to the profit structure S. This creates a flexible loss that can describe any revenue structure, provided that revenue is optimal when predictions are perfect. We will show that our proposed loss performs better than MSE and MAE for some revenue structures, even when some of the inputs defining the loss are not features the neural network is trained on.
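As a rough sketch of how such a loss could be computed, the following assumes a simplified profit structure S borrowed from the trading simulation in Section 4.4: revenue r per unit sold and penalty o per unit of overprediction, ignoring the balance and debt terms. Averaging over the samples, and the names `revenue` and `opportunity_loss`, are assumptions of this illustration rather than the authors' implementation.

```python
def revenue(pred, true, r, o):
    """Revenue of one trade under the simplified structure (no debt term)."""
    sold = min(pred, true)           # one is only paid for energy actually delivered
    over = max(pred - true, 0.0)     # promised but not produced
    return r * sold - o * over


def opportunity_loss(preds, trues, r, o):
    """Mean gap between the revenue of a perfect forecast and of the estimate."""
    gaps = []
    for pred, true in zip(preds, trues):
        optimal = revenue(true, true, r, o)    # perfect forecast
        estimated = revenue(pred, true, r, o)
        gaps.append(optimal - estimated)
    return sum(gaps) / len(gaps)


under = opportunity_loss([90.0], [100.0], r=10.0, o=20.0)
over = opportunity_loss([90.0], [80.0], r=10.0, o=20.0)
```

Under this simplified structure the per-sample loss is r times the shortfall when underpredicting and o times the excess when overpredicting, so with r = o it reduces to a scaled MAE. A differentiable version for training would express the same `min` and `max` operations on tensors.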

4. Experiments

4.1. Dataset description

In our experiments, we use aggregated hourly wind energy production data from wind farms in the Ile-de-France region, as provided by Réseau de transport d’électricité’s online database [31] and illustrated in Figure 6. In addition to wind energy, we also use hourly wind speed (m/s) and direction (degrees North) data from 8 wind farms in Ile-de-France, provided by Terra Weather, as additional predictors. The latter dataset is provided as part of the AI4Impact Datathon [10]. Both datasets span the period from January 1st, 2017 to July 14th, 2020, which is the period we use for this study.

Figure 5: Map of the locations of the eight wind farms in Ile-de-France.

As some wind farms only started operating within the period stated above, we impute the values prior to a farm’s starting date with zeros. Then, we linearly interpolate the remaining missing data points. The time and wind-related features described in the next subsection are then added to this dataset.
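The two imputation steps can be sketched as follows, assuming missing readings are represented by `None` and the farm's first operating hour is known as an index. The function name `impute` and this representation are illustrative assumptions.

```python
def impute(series, start_index):
    """Zero-fill before the farm starts, then linearly interpolate interior gaps.

    series: list of floats, with None marking a missing reading.
    start_index: index of the first hour in which the farm operates.
    """
    filled = list(series)
    for i in range(min(start_index, len(filled))):
        filled[i] = 0.0                      # farm not yet operating
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for j in range(left + 1, right):     # only runs when values are missing
            frac = (j - left) / gap
            filled[j] = filled[left] + frac * (filled[right] - filled[left])
    return filled                            # trailing gaps are left as None


cleaned = impute([None, None, 2.0, None, 6.0], start_index=2)
```

Gaps at the very end of the series have no right-hand neighbour to interpolate towards, so this sketch leaves them untouched.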

Figure 6: Plot of wind energy production from January 1st, 2017 to July 14th, 2020. Note that the two red bars indicate the times at which the new Boissy and Angerville farms opened.

We normalise the data and compute the differences in wind energy production between consecutive time points, then reorganise the data into prediction windows of size 64, which are used to predict the change in wind energy 18 hours later. In other words, the target is the difference between the energy production at a given time point and the energy production 18 hours later.
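One way to build such samples is sketched below for a single production series. It assumes that each input window holds the 64 most recent hour-to-hour differences and that the target is the change over the following 18 hours; the exact layout in the authors' code may differ, and `make_windows` is a hypothetical name.

```python
def make_windows(production, window=64, horizon=18):
    """Build (input window, target) pairs from an hourly production series."""
    # diffs[i] is the change from hour i to hour i + 1.
    diffs = [b - a for a, b in zip(production, production[1:])]
    samples = []
    # t is the hour at which the forecast is made.
    for t in range(window, len(production) - horizon):
        x = diffs[t - window:t]                        # differences ending at hour t
        y = production[t + horizon] - production[t]    # change over the horizon
        samples.append((x, y))
    return samples


ramp = [float(i) for i in range(100)]
samples = make_windows(ramp)
```

Predicting the change rather than the level means a model that outputs zero everywhere reproduces the persistence baseline.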

Lastly, we split the dataset into training, validation, and testing sets with 24,608 rows, 3,076 rows, and 3,076 rows respectively.

4.2. Feature Engineering

To model this data, we engineer features from common predictors of wind energy production, namely time, wind speed, and wind direction. In this subsection we explain three feature engineering approaches. These approaches are implemented on top of common forecasting features such as difference, momentum, force, mean, median, kurtosis, et cetera. We present a detailed list of experiments we ran with different features in Supplementary Materials 1.

4.2.1. Time

Time can be an important feature in wind energy forecasting, as it may encode information about time-sensitive patterns as well as periodicity. In this article we consider two representations of time: one-dimensional time and two-dimensional time.

One-dimensional time encodes the time elapsed since the first observation and is implemented by adding the index of the time series as a feature. This may improve performance by allowing the model to learn about possible long-term trends that change over time.

Two dimensional time, on the other hand, can be used to represent periodicity. This is implemented as a trigonometric transformation [32, 33] described in Figure 7.
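A common form of this trigonometric transformation maps a cyclical value onto the unit circle with a sine and a cosine. The sketch below assumes that form and a 24-hour period; `encode_cyclical` is a hypothetical name.

```python
import math


def encode_cyclical(value, period):
    """Map a cyclical value (e.g. hour of day) to a point on the unit circle."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


midnight = encode_cyclical(0, 24)
late_evening = encode_cyclical(23, 24)
noon = encode_cyclical(12, 24)
```

In this representation hour 23 lies next to hour 0, whereas the raw hour values 23 and 0 would appear to be far apart.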

Figure 7: Illustration of the encoding for cyclical time. Image source: [33].

4.2.2. Wind direction

Like time, wind direction (often given in degrees North) is also cyclical. Hence, we also designed a trigonometric transformation, describing the direction with

instead of degrees, where α is the wind direction.
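One plausible form of such a transformation, assumed here by analogy with the cyclical time encoding and not confirmed by the text, replaces the angle α (in degrees) with a sine and cosine pair:

```latex
\alpha \;\mapsto\; \left( \sin\frac{\pi\alpha}{180},\; \cos\frac{\pi\alpha}{180} \right)
```

Under this assumed mapping, directions of 359 and 1 degrees produce nearly identical features, which raw degrees do not.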

Figure 8: Plots comparing correlations between energy production and features such as wind speed, wind direction and their transformations for all eight wind farms.

As shown in Figure 8, this transformation improves (Pearson’s) correlation between wind direction and energy production. We conjecture that including these more correlated features as model inputs may help improve model performance.

4.2.3. Joint effect encoding of wind speed and direction

As illustrated in Figure 9, wind speed alone is not directly correlated with energy output. Wind direction also plays a role: if the wind is not blowing in an optimal direction, energy production can be small despite high wind speed. We hypothesise that wind blowing perpendicular to the turbines would not be able to turn them, resulting in zero or very little production.

Figure 9: Total wind energy against wind direction at one of the largest wind farms, Parc du Gatinais.

From Figure 9, a scatterplot of energy production against wind direction at one of the largest wind farms, we noted two distinct clusters. The first cluster is centred around 45 degrees North and the second around 225 degrees North, which lies on the same axis but points in the opposite direction. We noted that the closer the wind direction is to this axis, the higher the wind energy production (illustrated as darker red spots). Thus, we inferred that the ideal direction for the wind turbines is around 45 and 225 degrees North. Conversely, we also hypothesise from Figure 9 that the directions that minimise energy production are 135 and 315 degrees North.

Figure 10: Output encoding for joint effect between wind speed and wind direction.

We thus devised a novel function to capture the joint effect of wind speed and direction described in this trigonometric output function

where offset is the ideal wind direction. By multiplying speed with the trigonometric transformation of the direction, we get an overall function that considers both ideal wind direction and wind speed. As shown in Figure 11, this feature is better correlated with wind energy production.
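A function with the properties described above can be sketched as wind speed multiplied by the squared cosine of the angle between the wind direction and the offset. This specific form is an assumption chosen because it peaks at the offset and at the offset plus 180 degrees and vanishes at right angles to them; `joint_wind_feature` is a hypothetical name.

```python
import math


def joint_wind_feature(speed, direction_deg, offset_deg):
    """Assumed form: speed * cos^2(direction - offset)."""
    delta = math.radians(direction_deg - offset_deg)
    return speed * math.cos(delta) ** 2


aligned = joint_wind_feature(10.0, 45.0, offset_deg=45.0)
opposite = joint_wind_feature(10.0, 225.0, offset_deg=45.0)
crosswind = joint_wind_feature(10.0, 135.0, offset_deg=45.0)
```

With an offset of 45 degrees the feature equals the full wind speed at 45 and 225 degrees North and falls to roughly zero at 135 and 315 degrees North, matching the two clusters observed in Figure 9.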

Figure 11: Scatter plot between energy production and normal and transformed wind speed.

4.3. Model Parameters

Every DNN model takes a time-series input of shape 64 x d, where d is the feature dimension, which varies according to the experiment. Our implementation of the MLP model consists of 4 hidden layers with widths 1024, 512, 256, and 128. Our 1D-CNN model is implemented with 4 convolutional layers having 64, 128, 256, and 512 channels and kernel sizes of 9, 7, 5, and 3, and a maxpool layer of size 2. The output of the last convolutional layer is then passed through a fully connected layer of size 512. The LSTM model is implemented with 128 hidden and cell nodes each and a fully connected layer of size 512. The CNN-LSTM hybrid uses the hyperparameters above with a fully connected layer of size 512.

For all the models described in the previous paragraph, we train for up to 20 epochs with a batch size of 256, a learning rate of 0.001, and a dropout of 0.1. We use early stopping with a patience of 2, where an epoch counts as an improvement only if its validation loss is at least 0.00005 below the existing minimum. We use the ReLU activation for all nodes in these models.
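The early stopping rule can be sketched as a small helper that is called once per epoch. The class below is an illustrative assumption about how the rule might be coded, using the patience and threshold quoted above; it is not taken from the authors' repository.

```python
class EarlyStopping:
    """Stop when validation loss has not improved enough for `patience` epochs."""

    def __init__(self, patience=2, min_delta=5e-5):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss          # a large enough improvement
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1          # no meaningful improvement this epoch
        return self.bad_epochs >= self.patience


stopper = EarlyStopping()
decisions = [stopper.step(loss) for loss in [0.5, 0.4, 0.39999, 0.39998, 0.3]]
```

In this example the third and fourth epochs improve by less than the threshold, so training would stop after the fourth epoch and the fifth would never be run.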

We implement our models in PyTorch 1.5.0; the code is available at https://github/kristoforusbryant/energy_production_forecasting/.

4.4. Trading Simulation

To assess trading performances of our models against varying profit structures, we use a simple deterministic simulation model defined by starting cash-in-hand (CIH) balance b_0, overprediction penalty o per unit, revenue of r per unit, and debt penalty d per unit.

Starting with a balance of b_0, for every prediction made, if the model predicts less than the true value, r is credited to the total balance b per unit of prediction. For example, assume r = 10 cents/kWh, the forecasted energy is 90 kWh, and the actual energy produced is 100 kWh. One then earns 90*10 = 900 cents. The extra 10 kWh can be thought of as an opportunity cost: revenue that one could have earned but did not.

On the other hand, if the prediction is more than the true value, r is credited per unit of true value, but o is deducted per unit of overprediction. For example, assume further that o = 20 cents/kWh, the forecasted energy is 90 kWh, and the actual energy produced is 80 kWh. Then one earns 80*10 = 800 cents from the actual energy but pays 20*10 = 200 cents to cover the 10 kWh shortfall. Hence the net revenue is 800 - 200 = 600 cents.

Lastly, when the balance is less than or equal to zero, for every unit one overpredicts one must take on a loan that costs d per unit. For example, assume further that d = 100 cents/kWh. If the balance prior to the trade is 200 cents and one predicts 20 kWh when the true output is 0 kWh, the first 10 kWh of overprediction are penalised at 20 cents each (10*20 = 200 cents), which brings the balance down to 0. The remaining 10 kWh must be covered by a loan of 100*10 = 1,000 cents, which leaves a balance of -1,000 cents after the trade.
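The three rules can be combined into one update function, sketched below. The order of operations (crediting revenue before deducting the penalty) and the names `trade_step` and `simulate` are assumptions of this sketch; the worked examples in the text are used to check it.

```python
def trade_step(balance, pred, true, r, o, d):
    """Apply one trade and return the new balance (all amounts in cents)."""
    if pred <= true:
        # Underprediction: paid for what was promised; the rest is forgone revenue.
        return balance + r * pred
    # Overprediction: paid for what was actually produced...
    balance += r * true
    over = pred - true
    # ...then penalised at rate o for as many units as the balance can cover,
    covered = min(over, max(balance, 0.0) / o)
    balance -= o * covered
    # and charged the debt rate d for every remaining unit.
    balance -= d * (over - covered)
    return balance


def simulate(preds, trues, b0, r, o, d):
    """Run the deterministic simulation over a sequence of forecasts."""
    balance = b0
    for pred, true in zip(preds, trues):
        balance = trade_step(balance, pred, true, r, o, d)
    return balance


final = simulate([90.0, 90.0], [100.0, 80.0], b0=0.0, r=10.0, o=20.0, d=100.0)
```

Running the two non-debt examples in sequence from an empty balance yields 900 + 600 = 1,500 cents.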

We experiment with two scenarios: a symmetric revenue structure where r = 10 and o = 10, and an asymmetric revenue structure where r = 10 and o = 50. In both cases, the starting balance is b_0 = 10,000,000 and d = 100. We test these two scenarios with our CNN-LSTM model using MSE, MAE, and the opportunity loss defined according to the simulation.

5. Results

In this section we present the results of our experiments. All values reported with standard deviations are the means of 10 repeats, and all trading simulations are run with the starting balance b_0 = 10,000,000.

5.1. CNN and LSTM based models outperform baselines

As shown in Figure 12.a and Table 1, the 1D-CNN and LSTM based models perform significantly better than the MLP and persistence baselines in both MSE loss and simulation performance. Among the CNN and LSTM models, the vanilla LSTM architecture does slightly better, obtaining the best simulation profit of 52.5 (0.21) and an MSE of 0.01386 (0.00228). The performance of the CNN and the combined CNN-LSTM is comparable with this result, as shown in Table 1. The MLP, although it performs worse than the more advanced DNNs, still performs significantly better than persistence.

Figure 12: (a) On the left are the trading simulation results of our DNN models compared to persistence and the optimal value. (b) On the right is the lagged correlation plot of our DNN models, also compared to optimal prediction and persistence.
Table 1: Comparison of forecasting performance between our DNN models.

A similar trend is observed in the lagged (Pearson’s) correlations between the prediction and the target. As Figure 12.b and Table 1 show, the 1D-CNN, LSTM, and CNN-LSTM hybrid have peak correlations at time 0. This is in contrast with the MLP, whose peak correlation is lagged by 18 hours. The CNN, LSTM, and CNN-LSTM hybrid have comparable peak correlation values, with the LSTM model having the highest.
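The lagged correlation diagnostic can be sketched as follows, using synthetic data. A forecast whose correlation with the target peaks at a lag equal to the forecast horizon is, on this reading, largely echoing the last observed value, as persistence does. The function names and the random test series are assumptions for illustration only.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lagged_correlation(pred, target, lag):
    """Correlation between pred[t] and target[t - lag], for lag >= 0."""
    if lag == 0:
        return pearson(pred, target)
    return pearson(pred[lag:], target[:-lag])


def peak_lag(pred, target, max_lag):
    """Lag at which the prediction correlates most strongly with the target."""
    return max(range(max_lag + 1), key=lambda k: lagged_correlation(pred, target, k))


rng = random.Random(0)
target = [rng.gauss(0.0, 1.0) for _ in range(300)]
# A persistence-style forecast: the value observed 18 steps earlier.
persistence_pred = [0.0] * 18 + target[:-18]
best = peak_lag(persistence_pred, target, max_lag=24)
```

For the persistence-style forecast the peak falls at lag 18, while a perfect forecast would peak at lag 0.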

Figure 13: Illustrative example of our forecasts on 3076 hours with 18 hours predictive window.

5.2. Opportunity loss improves simulation performance with asymmetric revenue structures

Figure 14.a shows the first experiment, with a symmetric revenue structure. Given the symmetric structure, the simulation performance of the three losses is comparable, with MSE, MAE, and opportunity loss earning 5.16 (.11), 5.16 (.08), and 5.18 (.16), respectively.

When tested with an asymmetric revenue structure, however, the opportunity loss performs significantly better than the MAE and MSE losses, earning 3.87 (.23) against 3.40 (.66) for MAE and 2.92 (.79) for MSE, as seen in Figure 14.b. Between the common losses, MAE does significantly better than MSE.

Figure 14: (a) On the left is a plot of revenue over time for our DNN models trained with MSE, MAE, and Symmetric Opportunity Loss when the trading simulation assumes a symmetric profit structure. (b) On the right are the same models trained with MSE, MAE, and Asymmetric Opportunity Loss, evaluated with a trading simulation that assumes an asymmetric profit structure.
Table 2: Simulation performance of DNN models on simulations with symmetric and asymmetric profit structures.

This result is to be expected, as the MSE and MAE losses are unaware of the asymmetric revenue structure they are simulated on. That said, it is noteworthy that the opportunity loss still performs well even though some of its constituents are not taken as model inputs. In this simulation, for instance, the initial balance and the balance history used to calculate the opportunity loss are not inputs to the model, yet the neural networks still show improved performance. Not needing all the parameters that define the loss as model inputs makes the model more flexible, which may be helpful in cases where profit structures change over time.

Figure 15: Scatter plots of predicted differences vs actual differences when DNN models are run with asymmetric opportunity loss (left) and MSE (right).

As Figure 15 shows, compared to the model trained with MSE, those trained with the asymmetric opportunity loss tend to underpredict more often than they overpredict. This tendency to be more conservative while still keeping a tight variance around the actual difference is a possible explanation for why models trained on the asymmetric opportunity loss outperform those trained on MSE.

5.3. Product of cosines-squared embedding improves performance

Table 3 shows the effect of our feature engineering approaches on test loss (MSE) and simulation performance. These experiments were run only on our CNN-LSTM architecture, but we expect comparable results from our LSTM and 1D-CNN architectures. Note that this table is the result of a semi-manual, semi-greedy search over the space of feature sets: as one goes down the rows, features that improved performance are kept when testing the next additional features.

Table 3. Test loss and simulation performance on various feature sets.

Table 3 first illustrate how the 2D time feature as explained in 4.2.1 does not improve performance. While this seems to be at odds with the results by [32, 33], we conjecture that this might be caused by the minimal autocorrelation of the wind energy production as illustrated in Figure 16. This figure shows only one peak at the start but no peak afterwards. This implies that periodic trend for this dataset is minimum, which might explain why the 2D time feature does not improve performance.

Figure 16: Autocorrelation plot of wind energy production.
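For readers who want to reproduce this kind of check, the following is a minimal sketch of a sample autocorrelation function in plain Python. The function name `autocorrelation` and the synthetic sine series are assumptions for illustration; the series is not the wind energy data. A strongly periodic signal produces repeated peaks at multiples of its period, which is the pattern that Figure 16 lacks.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation of series for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered)
    acf = []
    for lag in range(max_lag + 1):
        # Covariance between the series and a copy of itself shifted by lag.
        cov = sum(centered[t] * centered[t + lag] for t in range(n - lag))
        acf.append(cov / variance)
    return acf


# Synthetic hourly series with a 24-step cycle (illustration only).
periodic = [math.sin(2 * math.pi * t / 24) for t in range(24 * 20)]
acf = autocorrelation(periodic, 48)
print(round(acf[12], 3), round(acf[24], 3))
```

For this synthetic series the autocorrelation is strongly negative at half a period (lag 12) and strongly positive again at a full period (lag 24), so a time-of-day encoding would carry real information. When the plot decays after lag 0 and never recovers, there is little periodic structure for such an encoding to exploit.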

We also observe that feature sets that include first and second order differences improve performance. This seems to indicate that the way energy production, wind speed and wind direction change is informative for forecasting the actual values of energy production.
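As a small illustration of what first and second order differences are, the sketch below computes them for a short series. The helper name `difference` and the `production` values are made up for this example and do not come from the dataset.

```python
def difference(series, order=1):
    """Return the order-th difference of series (each pass shortens it by one)."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


# Toy energy production values (illustration only).
production = [10.0, 12.0, 15.0, 14.0, 14.0]

first = difference(production, 1)   # step-to-step change
second = difference(production, 2)  # change of the change
print(first)   # [2.0, 3.0, -1.0, 0.0]
print(second)  # [1.0, -4.0, 1.0]
```

Each differencing pass shortens the series by one element, so when these are used as extra input features the original series has to be trimmed (or the differences padded) to keep all features aligned in time.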

Lastly, we observe that our novel approach to encoding the joint effect of wind speed and direction as a trigonometric function (4.2.3) significantly improves prediction performance. The list of experiments we ran is detailed in Supplementary Material 1.

6. Discussion and Conclusion

In this article, we propose opportunity loss, a loss function that can be adapted to intricate profit structures. We show in our simple asymmetric simulation that our loss function outperforms other losses in terms of revenue earned. Further studies can consider applying this approach to more realistic simulations with more complex profit structures.

Remarkably, opportunity loss allows neural network models to learn the desired behaviour even when variables defining the loss function (such as initial balance) are not inputs to the model. We suspect this phenomenon might be caused by the neural networks implicitly learning latent representations of these loss variables, which is possible because the model inputs are not independent of the loss inputs. Further research can explore the extent to which this observation holds, in terms of the number of such variables and their degree of independence from the model inputs.

Moreover, further analysis of the amount of data needed for training would be valuable. In energy load forecasting, contracts can change by the day or sometimes even by the hour, thereby continually changing the profit structure. Given a sufficiently flexible loss function such as opportunity loss, the bottleneck for deployment then lies in the amount of training needed to adjust the model to each new loss function.

A tangential idea is to borrow from cooperative inverse reinforcement learning [34], which, instead of defining the loss function analytically, has the model learn its own loss function through manual feedback from humans. This process creates a neural representation of the loss function, which may help in cases where the profit structure is difficult to define analytically.

Besides opportunity loss, we also implemented a novel encoding for the joint effect of wind speed and direction, which we have shown to improve performance.

7. References

[1] Watkiss, J. D., & Smith, D. W. (1993). The Energy Policy Act of 1992-A watershed for competition in the wholesale power market. Yale J. on Reg., 10, 447.

[2] Pollitt, M. G. (2019). The European single market in electricity: an economic assessment. Review of Industrial Organization, 55(1), 63–87.

[3] Bunn, D. W. (2004). Modelling prices in competitive electricity markets.

[4] https://energyanalyst.co.uk/an-introduction-to-electricity-price-forecasting/

[5] Kuo, P. H., & Huang, C. J. (2018). An electricity price forecasting model by hybrid structured deep neural networks. Sustainability, 10(4), 1280.

[6] Wang, S., Wang, X., Wang, S., & Wang, D. (2019). Bi-directional long short-term memory method based on attention mechanism and rolling update for short-term load forecasting. International Journal of Electrical Power & Energy Systems, 109, 470–479.

[7] Tian, C., Ma, J., Zhang, C., & Zhan, P. (2018). A deep neural network model for short-term load forecast based on long short-term memory network and convolutional neural network. Energies, 11(12), 3493.

[8] Dalto, M., Matuško, J., & Vašak, M. (2015, March). Deep neural networks for ultra-short-term wind forecasting. In 2015 IEEE International Conference on Industrial Technology (ICIT) (pp. 1657–1663). IEEE.

[9] Ryu, S., Noh, J., & Kim, H. (2017). Deep neural network based demand side short term load forecasting. Energies, 10(1), 3.

[10] https://ai4impact/dld.html

[11] Celik, A. N., & Kolhe, M. (2013). Generalized feed-forward based method for wind energy prediction. Applied Energy, 101, 582–588.

[12] Kramer, O., Gieseke, F., & Satzger, B. (2013). Wind energy prediction and monitoring with neural computation. Neurocomputing, 109, 84–93.

[13] Grassi, G., & Vecchio, P. (2010). Wind energy prediction using a two-hidden layer neural network. Communications in Nonlinear Science and Numerical Simulation, 15(9), 2262–2266.

[14] Zhu, Q., Chen, J., Zhu, L., Duan, X., & Liu, Y. (2018). Wind speed prediction with spatio–temporal correlation: A deep learning approach. Energies, 11(4), 705.

[15] Parks, K., Wan, Y. H., Wiener, G., & Liu, Y. (2011). Wind energy forecasting: A collaboration of the National Center for Atmospheric Research (NCAR) and Xcel Energy (No. NREL/SR-5500–52233). National Renewable Energy Lab.(NREL), Golden, CO (United States).

[16] Shi, H., Xu, M., & Li, R. (2017). Deep learning for household load forecasting — A novel pooling deep RNN. IEEE Transactions on Smart Grid, 9(5), 5271–5280.

[17] Cao, Q., Ewing, B. T., & Thompson, M. A. (2012). Forecasting wind speed with recurrent neural networks. European Journal of Operational Research, 221(1), 148–154.

[18] Sfetsos, A. (2000). A comparison of various forecasting techniques applied to mean hourly wind speed time series. Renewable energy, 21(1), 23–35.

[19] Zhao, X., Jiang, N., Liu, J., Yu, D., & Chang, J. (2020). Short-term average wind speed and turbulent standard deviation forecasts based on one-dimensional convolutional neural network and the integrate method for probabilistic framework. Energy Conversion and Management, 203, 112239.

[20] Kim, J., Moon, J., Hwang, E., & Kang, P. (2019). Recurrent inception convolution neural network for multi short-term load forecasting. Energy and Buildings, 194, 328–341.

[21] Hu, Y. L., & Chen, L. (2018). A nonlinear hybrid wind speed forecasting model using LSTM network, hysteretic ELM and Differential Evolution algorithm. Energy conversion and management, 173, 123–142.

[22] Gensler, A., Henze, J., Sick, B., & Raabe, N. (2016, October). Deep Learning for solar power forecasting — An approach using AutoEncoder and LSTM Neural Networks. In 2016 IEEE international conference on systems, man, and cybernetics (SMC) (pp. 002858–002865). IEEE.

[23] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.

[24] https://developer.oracle/databases/neural-network-machine-learning.html

[25] Fukushima, K. (1988). Neocognitron: A hierarchical neural network capable of visual pattern recognition. Neural networks, 1(2), 119–130.

[26] LeCun, Y., Kavukcuoglu, K., & Farabet, C. (2010, May). Convolutional networks and applications in vision. In Proceedings of 2010 IEEE international symposium on circuits and systems (pp. 253–256). IEEE.

[27] Oord, A. V. D., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., … & Kavukcuoglu, K. (2016). Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.

[28] Alves, T., Laender, A., Veloso, A., & Ziviani, N. (2018, December). Dynamic prediction of icu mortality risk using domain adaptation. In 2018 IEEE International Conference on Big Data (Big Data) (pp. 1328–1336). IEEE.

[29] Gers, F. A., Schmidhuber, J., & Cummins, F. (1999). Learning to forget: Continual prediction with LSTM.

[30] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.

[31] https://www.rte-france/en/eco2mix/eco2mix-telechargement-en

[32] Moon, J., Park, S., Rho, S., & Hwang, E. (2019). A comparative analysis of artificial neural network architectures for building energy consumption forecasting. International Journal of Distributed Sensor Networks, 15(9), 1550147719877616.

[33] https://medium/@linminhtoo/forecasting-energy-consumption-using-neural-networks-xgboost-2032b6e6f7e2

[34] Hadfield-Menell, D., Russell, S. J., Abbeel, P., & Dragan, A. (2016). Cooperative inverse reinforcement learning. In Advances in neural information processing systems (pp. 3909–3917).

Supplementary Materials

Supplementary materials can be found here: https://github/kristoforusbryant/energy_production_forecasting/.

Translated from: https://medium/@kristoforusbryant/energy-production-forecasting-using-deep-neural-networks-and-a-contract-aware-loss-df6b764097b7
