I've been working with convnetjs for a year and now I want to move on to more powerful and faster libraries. I thought TensorFlow would be orders of magnitude faster than a JS library, so I wrote the same simple neural network in both libraries and ran some tests. It is a 3-5-5-1 network, trained on a single example for a certain number of epochs with SGD and ReLU layers.
Tensorflow code:
import tensorflow as tf
import numpy
import time
NUM_CORES = 1 # Choose how many cores to use.
sess = tf.Session(config=tf.ConfigProto(inter_op_parallelism_threads=NUM_CORES, intra_op_parallelism_threads=NUM_CORES))
# Parameters
learning_rate = 0.001
training_epochs = 1000
batch_size = 1
# Network Parameters
n_input = 3 # Data input
n_hidden_1 = 5 # 1st layer num features
n_hidden_2 = 5 # 2nd layer num features
n_output = 1 # Data output
# tf Graph input
x = tf.placeholder("float", [None, n_input], "a")
y = tf.placeholder("float", [None, n_output], "b")
# Create model
def multilayer_perceptron(_X, _weights, _biases):
    layer_1 = tf.nn.relu(tf.add(tf.matmul(_X, _weights['h1']), _biases['b1']))  # Hidden layer with ReLU activation
    layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1, _weights['h2']), _biases['b2']))  # Hidden layer with ReLU activation
    return tf.matmul(layer_2, _weights['out']) + _biases['out']
# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_output]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_output]))
}
# Construct model
pred = multilayer_perceptron(x, weights, biases)
# Define loss and optimizer
cost = tf.reduce_sum(tf.nn.l2_loss(pred-y)) / batch_size # L2 loss
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost) # Gradient descent optimizer
# Initializing the variables
init = tf.initialize_all_variables()
# Launch the graph
sess.run(init)
# Training Data
train_X = numpy.asarray([[0.1,0.2,0.3]])
train_Y = numpy.asarray([[0.5]])
# Training cycle
start = time.clock()
for epoch in range(training_epochs):
    # Fit training using batch data
    sess.run(optimizer, feed_dict={x: train_X, y: train_Y})
end = time.clock()
print end - start #2.5 seconds -> 400 epochs per second
print "Optimization Finished!"
JS code:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Regression example convnetjs</title>
<script src=".js"></script>
<script src=".js"></script>
<script>
  var layer_defs, net, trainer;
  function start() {
    layer_defs = [];
    layer_defs.push({ type: 'input', out_sx: 1, out_sy: 1, out_depth: 3 });
    layer_defs.push({ type: 'fc', num_neurons: 5, activation: 'relu' });
    layer_defs.push({ type: 'fc', num_neurons: 5, activation: 'relu' });
    layer_defs.push({ type: 'regression', num_neurons: 1 });
    net = new convnetjs.Net();
    net.makeLayers(layer_defs);
    trainer = new convnetjs.SGDTrainer(net, { learning_rate: 0.001, method: 'sgd', batch_size: 1, l2_decay: 0.001, l1_decay: 0.001 });
    var start = performance.now();
    for (var i = 0; i < 100000; i++) {
      var x = new convnetjs.Vol([0.1, 0.2, 0.3]);
      trainer.train(x, [0.5]);
    }
    var end = performance.now();
    console.log(end - start); // 3 seconds -> 33333 epochs per second
    var predicted_values = net.forward(x);
    console.log(predicted_values.w[0]);
  }
</script>
</head>
<body>
<button onclick="start()">Start</button>
</body>
</html>
The result is that convnetjs trains for 100,000 epochs in 3 seconds, while TensorFlow trains for 1,000 epochs in 2.5 seconds. Is this expected?
Comment: Can you explain the attributes you use in the convnetjs input layer? – Peter
4 Answers
There could be many reasons why:
- The data input is so small that most of the time is spent just converting data between Python and the C++ core, while the JS library stays entirely within a single language; a minimal sketch after this list shows that fixed per-call cost.
- You are using only one core in TensorFlow, while the JS code could potentially use more than one.
- The JS library is able to produce a highly optimized JIT-compiled version of the program.
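To make the first point concrete, here is a minimal sketch (my own illustration, not from the question, using the same TF 0.x-era API) that times sess.run on an almost-empty graph; whatever number it prints is a fixed per-step cost that a 3-5-5-1 network trained on one example can never amortize:
import time
import tensorflow as tf
sess = tf.Session()
trivial = tf.constant(1.0) + tf.constant(1.0)  # essentially no arithmetic at all
start = time.time()
for _ in range(1000):
    sess.run(trivial)
per_call = (time.time() - start) / 1000
print "fixed cost per sess.run: roughly", per_call, "seconds"
# Even with zero model work, 1 / per_call is an upper bound on training steps per second.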
The real benefits of TensorFlow will come when the distributed version becomes public. Then the ability to run big networks on many nodes will matter more than the speed of a single node.
As of now (version 0.6), it doesn't matter whether you use the CPU or the GPU for TensorFlow; TensorFlow is slow on the GPU as well.
Here are the corresponding benchmarks.
TensorFlow may be slower than Torch, convnetjs, etc. on the CPU because:
- You may be using a non-optimized computation graph.
- TF is not as mature as Torch, convnetjs, etc. It's simply not as optimized yet, I hope.
- According to rumors, Google doesn't care about optimization for a single machine. Bear in mind that:
a) we live in the cluster age,
b) you can buy a 57-core processor for $195 (however, I haven't tested whether TF works with this hardware),
c) here's what Google says about their quantum computer: 100 million times faster than a conventional system.
TensorFlow is slower than Caffe, Torch, etc. on the GPU because:
- TF (as of 0.6) doesn't fully support CUDA 7.5.
- TF (as of 0.6) doesn't support cuDNN v3 or cuDNN v4,
and this makes TF 0.6 several orders of magnitude slower than its competitors for 'machine learning desktops/amateurs'.
However, there is an issue asking for CUDA 7.5 and cuDNN v3 support. It was closed as a duplicate of another issue, which is much less concrete (IMHO). The latter issue, which is still open, doesn't commit to supporting CUDA 7.5 and cuDNN v3/v4 (yep, I'm a pessimist).
So we can only either:
- hope and wait for Google to solve these issues (add CUDA 7.5 and cuDNN v3/v4 support and keep TF up to date going forward), or
- contribute, since TF is open source. Or wait for somebody else to contribute :)
I had the same confusion as the author of this question. I hope my answer helped.
Yes, for tiny models this is expected.
TensorFlow is not optimized for tiny neural nets with single-item batches, because making that regime faster is a waste of time: those models are so cheap to run that there is no point. If you made the minibatch size larger (64 examples, say) and the model somewhat larger (hundreds of hidden units), I would expect TensorFlow to be much faster than the other library; a sketch of such a benchmark follows.
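As a rough illustration (a sketch only; the batch and layer sizes below are illustrative assumptions, not tuned values), the question's benchmark could be reshaped so that each sess.run does enough arithmetic to amortize the per-step overhead:
import time
import numpy
import tensorflow as tf
# Assumed sizes: batch of 64 random examples, two hidden layers of 256 units.
batch_size, n_input, n_hidden, n_output = 64, 3, 256, 1
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_output])
w1, b1 = tf.Variable(tf.random_normal([n_input, n_hidden])), tf.Variable(tf.random_normal([n_hidden]))
w2, b2 = tf.Variable(tf.random_normal([n_hidden, n_hidden])), tf.Variable(tf.random_normal([n_hidden]))
w3, b3 = tf.Variable(tf.random_normal([n_hidden, n_output])), tf.Variable(tf.random_normal([n_output]))
h1 = tf.nn.relu(tf.matmul(x, w1) + b1)
h2 = tf.nn.relu(tf.matmul(h1, w2) + b2)
pred = tf.matmul(h2, w3) + b3
cost = tf.nn.l2_loss(pred - y) / batch_size
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(cost)
sess = tf.Session()
sess.run(tf.initialize_all_variables())
batch_X = numpy.random.rand(batch_size, n_input).astype("float32")
batch_Y = numpy.random.rand(batch_size, n_output).astype("float32")
start = time.time()
for _ in range(1000):
    sess.run(train_step, feed_dict={x: batch_X, y: batch_Y})
print "seconds for 1000 steps of 64 examples each:", time.time() - start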
Imagine implementing the neural net naively in Python using NumPy: a naive NumPy implementation would also be slow for a model this small, because each step is dominated by per-iteration Python overhead rather than arithmetic. A rough sketch of such an implementation follows.
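For comparison, a naive NumPy version of the same 3-5-5-1 network might look roughly like this (hand-derived gradients, single training example, purely illustrative); it, too, pays a Python-level cost on every one of the 100,000 iterations:
import time
import numpy as np
rng = np.random.RandomState(0)
W1, b1 = rng.randn(3, 5), np.zeros(5)
W2, b2 = rng.randn(5, 5), np.zeros(5)
W3, b3 = rng.randn(5, 1), np.zeros(1)
x, y = np.array([[0.1, 0.2, 0.3]]), np.array([[0.5]])
lr = 0.001
start = time.time()
for _ in range(100000):
    # Forward pass: two ReLU hidden layers, linear output
    z1 = x.dot(W1) + b1; a1 = np.maximum(z1, 0)
    z2 = a1.dot(W2) + b2; a2 = np.maximum(z2, 0)
    pred = a2.dot(W3) + b3
    # Backward pass: gradients of 0.5 * (pred - y)^2
    d3 = pred - y
    d2 = d3.dot(W3.T) * (z2 > 0)
    d1 = d2.dot(W2.T) * (z1 > 0)
    W3 -= lr * a2.T.dot(d3); b3 -= lr * d3.sum(axis=0)
    W2 -= lr * a1.T.dot(d2); b2 -= lr * d2.sum(axis=0)
    W1 -= lr * x.T.dot(d1);  b1 -= lr * d1.sum(axis=0)
print "100000 NumPy steps took", time.time() - start, "seconds"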
The problem might be with your loss function. Why not try this instead?
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))