Understanding the TensorFlow Computation Graph

To understand the TensorFlow computation graph, it helps to know how to print or plot it: once you can visualize the graph, it becomes much easier to follow. The examples in this article use the TensorFlow 1.x graph-mode API. The following code dumps the current default computation graph to a log file:

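import tensorflow as tf
from datetime import datetime

# Give every run its own timestamped log directory.
now = datetime.now().strftime("%Y%m%d%H%M%S")
root_logdir = "tf_logs"
logdir = "{}/run-{}/".format(root_logdir, now)
# Write the current default graph to an event file in logdir.
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())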

The FileWriter constructor creates the log directory (if it does not already exist) and stores the default computation graph in a log file named events.out.tfevents.xxx in that directory. You can then launch the tensorboard command, pointing it at the log directory:

tensorboard --logdir tf_logs/

The tensorboard command starts an HTTP server on your computer, listening on port 6006 by default. Visit http://yourpcname:6006 in a browser to see the computation graph. The browser visualizes everything stored in the log files (for example, summaries written with the file_writer.add_summary() function), including the computation graph. Unfortunately, the computation graph appears empty in some cases, such as the following one:

tf.constant(3.0,name="e")

This statement is supposed to create a node in the computation graph, but you cannot see it in the browser. In such cases, you can print the graph directly from your Python program:

print(tf.get_default_graph().as_graph_def())

The output would be:

node {
  name: "e"
  op: "Const"
  attr {
    key: "dtype"
    value {
      type: DT_FLOAT
    }
  }
  attr {
    key: "value"
    value {
      tensor {
        dtype: DT_FLOAT
        tensor_shape {
        }
        float_val: 3.0
      }
    }
  }
}
versions {
  producer: 24
}
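
As a complement to as_graph_def(), the operations of a graph can also be listed one per line, which is often easier to scan than the protobuf text. The sketch below is only an illustration under some assumptions: it imports tensorflow.compat.v1 (available in TensorFlow 1.14+ and 2.x) instead of plain tensorflow, builds the node in an explicit tf.Graph(), and uses a temporary directory as a throwaway log directory. It also writes the graph with a FileWriter that is created after the node exists and is then closed. The writer records the graph it is given at construction time and close() flushes pending events, so this ordering is one thing worth checking when TensorBoard shows an empty graph. That is a guess, not a confirmed cause of the empty graph described above.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf  # assumption: TensorFlow 1.14+ or 2.x

tf.disable_v2_behavior()  # keep the TF 1.x graph-mode behaviour used in this article

# Build the node in an explicit graph so the example does not depend on
# whatever is already in the default graph.
graph = tf.Graph()
with graph.as_default():
    tf.constant(3.0, name="e")

# One line per operation: name, op type, and the names of its input tensors.
for op in graph.get_operations():
    print(op.name, op.type, [t.name for t in op.inputs])

# Write the graph only after the node exists, then close the writer so the
# event file is flushed to disk.
logdir = tempfile.mkdtemp(prefix="tf_logs_")  # hypothetical throwaway directory
writer = tf.summary.FileWriter(logdir, graph)
writer.close()
print(os.listdir(logdir))
```

Pointing tensorboard --logdir at the printed directory should then show the single Const node.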

When you call TensorFlow functions such as tf.constant() or tf.add(), or create instances of TensorFlow classes such as tf.Variable, you add new nodes to the computation graph. More commonly, new nodes are added by TensorFlow operators such as + and *. Unlike the corresponding Python or NumPy operators, which do the calculation immediately, these operators only define nodes in the computation graph. We do not introduce the computation graph with a complex expression like x*x*y+y+2: it generates a graph complex enough to confuse beginners, and the names “x” and “y” add to the confusion because TensorFlow uses these names for its own purposes. Let’s begin with a simple example:

tf.add(2,3,name="myadd")

This creates a node in the computation graph:

The box on the right describes this node: its name is “myadd”, as given by the name parameter of the add function; its operation is “Add”, which is defined internally by TensorFlow; and it has two inputs named “myadd/x” and “myadd/y”. Where do these names come from? We passed only two Python constants, 2 and 3, as the first and second arguments of tf.add, without naming them. The two names are generated automatically for the unnamed inputs: the part before the / is the name of the node itself (myadd), and the part after the / is x, y, …, according to their order in the parameter list. The node shows no output, even if you write the following code:

z=tf.add(2,3,name="myadd")

The following are the two nodes corresponding to 2 and 3:

I wonder why they show no output either, since their outputs are clearly connected to myadd.

What is z? z is a tensor: the output of the myadd operation.
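
To make the tensor/operation distinction concrete, the following minimal sketch inspects z from Python. It assumes the tensorflow.compat.v1 import (TensorFlow 1.14+ or 2.x) and an explicit tf.Graph(), neither of which appears in the original snippet. The auto-generated input names it prints may differ between TensorFlow versions, so no particular output is claimed for them.

```python
import tensorflow.compat.v1 as tf  # assumption: TensorFlow 1.14+ or 2.x

tf.disable_v2_behavior()  # keep the TF 1.x graph-mode behaviour

graph = tf.Graph()
with graph.as_default():
    z = tf.add(2, 3, name="myadd")

print(z)          # a symbolic tensor, not the number 5
print(z.op.name)  # the operation that produces z
print([t.name for t in z.op.inputs])  # auto-generated input names (version dependent)

# The value only exists once the graph is run in a session.
with tf.Session(graph=graph) as sess:
    value = sess.run(z)
print(value)
```

The tensor name “myadd:0” means “output number 0 of the operation myadd”, which is why z.op leads back to the node shown in TensorBoard.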

Now consider a slightly more complex example:

x=tf.constant(3.0,name="e")
z=x+1

The TensorFlow + operator creates a new node in the computation graph: its name, “add”, is chosen automatically by TensorFlow; its operation is “Add”; and it has two inputs, “e” and “add/y” (the latter corresponds to the constant 1, which we did not name).

However, the node still has no output. How do we get an output? In fact, an output is shown only when the node is connected to another node, as in the following example:

x=tf.constant(3.0,name="e")
z=x+1
b=z+2

Now the second + generates another node, and the output of the “add” node is used as one of the inputs of this new node. The “add” node now has an output, “add_1”, which is the name of the new node.

TensorFlow names the new node “add_1” because an “add” node already exists, so a numeric suffix is needed to tell them apart. The second input of “add_1” is automatically named “add_1/y”; it corresponds to the constant 2, which we did not name.
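
The same “output appears only once something consumes it” behaviour can be checked from Python, without TensorBoard. This is a hedged sketch: it assumes the tensorflow.compat.v1 import and relies on the consumers() method of graph tensors, which returns the operations that take a tensor as input. The names before and after are introduced here purely for illustration.

```python
import tensorflow.compat.v1 as tf  # assumption: TensorFlow 1.14+ or 2.x

tf.disable_v2_behavior()  # keep the TF 1.x graph-mode behaviour

graph = tf.Graph()
with graph.as_default():
    x = tf.constant(3.0, name="e")
    z = x + 1
    # Nothing uses z yet, so it has no consumers.
    before = [op.name for op in z.consumers()]
    b = z + 2
    # The second + takes z as an input and becomes its consumer.
    after = [op.name for op in z.consumers()]

print(before)
print(after)
```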

What about the following example?

x=tf.constant(3.0,name="e")
z=x*2+1

The two operators * and + in the expression create two nodes in the graph. It is important to note that only the TensorFlow operators * and + create nodes; the Python = operator does not. With that in mind, consider the following example:

x=tf.constant(3.0,name="e")
x=x+1

Which nodes do you expect to find in the graph? Is the output of the “add” node connected back to itself?

Actually, it is not! Note that x=tf.constant(3.0,name="e") does not make x a constant Python variable, so the next line, x=x+1, is perfectly legal. At first, x is just the output tensor of a TensorFlow constant operation (Tensor("e:0", shape=(), dtype=float32)). After x=x+1, the Python name x is rebound to a different object: the output tensor of the add operation (Tensor("add:0", shape=(), dtype=float32)). It may seem that we have lost the reference to the old x and that its memory may be freed, but this is not true. The old x (the output tensor of the constant operation) still exists in the graph. You can retrieve it with x.op.inputs[0], because it is the first input of the add operation. Likewise, the following code creates two Add nodes:
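
The rebinding described above can be verified with a few lines. The sketch assumes the tensorflow.compat.v1 import and an explicit tf.Graph(); first_x and recovered are hypothetical names that exist only to compare the tensors.

```python
import tensorflow.compat.v1 as tf  # assumption: TensorFlow 1.14+ or 2.x

tf.disable_v2_behavior()  # keep the TF 1.x graph-mode behaviour

graph = tf.Graph()
with graph.as_default():
    x = tf.constant(3.0, name="e")
    first_x = x                  # second Python reference, kept only for comparison
    x = x + 1                    # rebinds the name x; the constant node is untouched
    recovered = x.op.inputs[0]   # the first input of the add operation

print(first_x.name, x.name, recovered.name)

with tf.Session(graph=graph) as sess:
    old_value, new_value = sess.run([recovered, x])
print(old_value, new_value)
```

Both tensors can still be evaluated, which shows that the constant node was not replaced by the add node.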

x=tf.constant(3.0,name="e")
x=x+1
x=x+1

Since tf.constant does not make x a constant Python variable, it seems we could use the following code to get a changing x:

x=tf.constant(3.0,name="e")
sess=tf.Session()
for i in range(5):
    result=sess.run(x)
    print(result)
    x=x+1
sess.close()

And the output is what we expect:

3.0
4.0
5.0
6.0
7.0

But this is horrible code: it creates multiple tensors for x in memory and, even worse, it adds new nodes (operations) to the computation graph on every iteration:

This is where TensorFlow Variables come in handy. Of course, you cannot simply replace tf.constant with tf.Variable, as follows:
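
The growth of the graph can also be measured directly by counting operations after each iteration. This is a minimal sketch, assuming the tensorflow.compat.v1 import and an explicit tf.Graph(); the lists results and counts are illustrative names. The exact counts depend on the TensorFlow version, so the code only prints them.

```python
import tensorflow.compat.v1 as tf  # assumption: TensorFlow 1.14+ or 2.x

tf.disable_v2_behavior()  # keep the TF 1.x graph-mode behaviour

graph = tf.Graph()
results = []
counts = []
with graph.as_default():
    x = tf.constant(3.0, name="e")
    with tf.Session(graph=graph) as sess:
        for i in range(5):
            results.append(float(sess.run(x)))
            x = x + 1  # adds new operations to the graph on every pass
            counts.append(len(graph.get_operations()))

print(results)
print(counts)  # the number of operations keeps increasing
```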

x=tf.Variable(3,name="myvar")
sess=tf.Session()
sess.run(x.initializer)
for i in range(5):
    result=sess.run(x)
    print(result)
    x=x+1
sess.close()

That code still creates a bunch of useless nodes in the graph. We should use the assign operation instead:

x=tf.Variable(3,name="myvar")
assignop=tf.assign(x,x+1)
sess=tf.Session()
sess.run(x.initializer)
for i in range(10):
    print(sess.run(x))
    result=sess.run(assignop)
sess.close()

Now, no matter how many times the loop runs, the graph contains only four nodes: myvar (x), add (x+1), y (the constant 1), and Assign.

The Assign operation takes two inputs: the output tensor of x+1 and x itself. Note the yellow edge connecting Assign to myvar. It is called a reference edge (“Edge showing that the output operation node can mutate the incoming tensor”; here the output operation node is the Assign node, and the incoming tensor is x, since it is one of the inputs of the Assign operation). The direction of a reference edge is opposite to that of an ordinary incoming edge. Each time you run the Assign operation, the value of x+1 is assigned to x. Next, we need to look more closely at the Variable.
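
To check that the loop no longer grows the graph, one can compare the number of operations before and after running it. The sketch below assumes the tensorflow.compat.v1 import and an explicit tf.Graph(); size_before, size_after and values are names introduced for this illustration.

```python
import tensorflow.compat.v1 as tf  # assumption: TensorFlow 1.14+ or 2.x

tf.disable_v2_behavior()  # keep the TF 1.x graph-mode behaviour

graph = tf.Graph()
values = []
with graph.as_default():
    x = tf.Variable(3, name="myvar")
    assign_op = tf.assign(x, x + 1)  # built once, outside the loop
    size_before = len(graph.get_operations())
    with tf.Session(graph=graph) as sess:
        sess.run(x.initializer)
        for _ in range(10):
            values.append(int(sess.run(x)))
            sess.run(assign_op)      # re-runs the same node; adds nothing to the graph
    size_after = len(graph.get_operations())

print(values)
print(size_before, size_after)
```

The key design point is that graph construction (tf.assign) happens once, while the loop only calls sess.run.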

tf.Variable(3,name="myvar")

The above line of code alone will produce the following graph:

It looks as though we have created a node named “myvar”, but in fact it is not a single node. It is a namespace that contains multiple nodes. Double-click the namespace and you will see:

It shows that a TensorFlow variable actually consists of four nodes. The core node is myvar/(myvar), which stores the value of the variable. The myvar/initial_value node provides the initial value, and the myvar/Assign node assigns it to myvar/(myvar). When you need the value of the variable, e.g., as the input of another operation, you obtain it through the myvar/read operation rather than accessing the variable directly. The following are the operations that make up myvar and interact with other operations:

Note that the VariableV2 operation myvar/(myvar) has three outputs, although two of them are reference edges pointing to itself (recall that yellow edges run in the opposite direction to normal data-flow edges). The outputs of the VariableV2 operation are fed to myvar/Assign, myvar/read, and Assign, respectively.
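
The nodes inside the myvar namespace can also be listed from Python. This is a sketch under assumptions: it uses the tensorflow.compat.v1 import with tf.disable_v2_behavior(), and the exact set of operations and their types can differ between TensorFlow versions and variable implementations, so the code prints them rather than asserting a fixed list.

```python
import tensorflow.compat.v1 as tf  # assumption: TensorFlow 1.14+ or 2.x

tf.disable_v2_behavior()  # keep the TF 1.x graph-mode behaviour

graph = tf.Graph()
with graph.as_default():
    x = tf.Variable(3, name="myvar")

# Every operation created for the variable lives under the "myvar" name.
names = [op.name for op in graph.get_operations()]
for op in graph.get_operations():
    print(op.name, op.type)

# The initializer is one of those operations.
print(x.initializer.name)
```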

You may find it troublesome to use an assign operation to update a variable’s value. Can we just use =?

x=tf.Variable(3,name="myvar")
x=4

Now the TensorFlow variable does not update its value to 4. Instead, the Python name x is rebound to an ordinary Python integer with the value 4. Of course, you can use the code below to rebind x to a new TensorFlow variable:

x=tf.Variable(3,name="myvar")
x=tf.Variable(4,name="myvar")

But that creates two variable nodes in the computation graph. So the only way to update a variable’s value is to use the assign operation. The tf.assign(x,…) function is equivalent to (and calls) x.assign(…), which requires x to be a variable. Constants have no assign method and therefore cannot be assigned a value. Another difference between a TensorFlow constant and a variable is that a variable must be explicitly initialized before it is used (as the input of another operation, or evaluated), i.e., with session.run(x.initializer). x.initializer is the internal myvar/Assign operation. Explicit initialization is needed because this internal operation is not part of the dependency tree of the nodes that use the variable: other nodes connect either to the internal myvar/read operation or to the internal myvar/(myvar) operation, so the myvar/Assign node is not evaluated automatically when the graph runs.

Until now, we have fed values into the computation graph from outside by giving initial values to variables and constants when they are created. Do not try to feed them new values with = (that rebinds the Python name to an ordinary Python object) or with additional calls to assign (each call creates a new, unnecessary Assign node). To feed your computation graph different values at run time, replace the TensorFlow variables/constants with placeholders.
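
The need for explicit initialization can be demonstrated by evaluating the variable before and after running its initializer. The sketch assumes the tensorflow.compat.v1 import with tf.disable_v2_behavior(), under which reading an uninitialized variable raises tf.errors.FailedPreconditionError; the flag failed_before_init is a name introduced for illustration.

```python
import tensorflow.compat.v1 as tf  # assumption: TensorFlow 1.14+ or 2.x

tf.disable_v2_behavior()  # keep the TF 1.x graph-mode behaviour

graph = tf.Graph()
with graph.as_default():
    x = tf.Variable(3, name="myvar")

with tf.Session(graph=graph) as sess:
    try:
        sess.run(x)  # the initializer has not been run yet
        failed_before_init = False
    except tf.errors.FailedPreconditionError:
        failed_before_init = True
    sess.run(x.initializer)  # run the variable's internal Assign operation
    value = sess.run(x)

print(failed_before_init, value)
```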

x=tf.placeholder(tf.float32)

The above code will create a Placeholder node in the graph:

Note that x has no method for setting its value, because setting a value is meaningless until you run the graph. Instead, you feed its value through the feed_dict parameter of the run or eval function when you run or evaluate a node that depends on the placeholder.
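
As a minimal sketch of feeding a placeholder, the code below evaluates the same node with several inputs. It assumes the tensorflow.compat.v1 import and an explicit tf.Graph(); the placeholder name "x_in", the expression x*2+1 and the input values are chosen only for illustration.

```python
import tensorflow.compat.v1 as tf  # assumption: TensorFlow 1.14+ or 2.x

tf.disable_v2_behavior()  # keep the TF 1.x graph-mode behaviour

graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, name="x_in")  # hypothetical name
    y = x * 2 + 1
    size_before = len(graph.get_operations())

outputs = []
with tf.Session(graph=graph) as sess:
    for v in (1.0, 2.0, 3.0):
        # The value of x is supplied only at run time, through feed_dict.
        outputs.append(float(sess.run(y, feed_dict={x: v})))

size_after = len(graph.get_operations())
print(outputs)
print(size_before, size_after)
```

Feeding different values does not add any node to the graph, unlike rebuilding the expression inside the loop.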