Schema Graph Traversal Tutorial

Compiled: August 26, 2019 using schema bmeg_rc2

Graph traversal is the movement within a graph from one node (vertex) or edge to another. When doing graph traversal, there are two important things to keep in mind:

1) The structure and composition of the graph
2) The commands we use to move around within it

Graph traversal is an essential step in using BMEG: without it, you cannot access the data you want. In this tutorial, we will look at an example BMEG schema graph and some of its vertices and edges, and we will define the key operations needed to move within the schema.

Part 1: Understanding Schema Layout and Content

Take a few minutes to read through some of the different vertices (nodes) and edges. When you seek data from BMEG, you will query from here, so being familiar with the graph is a good first step.

The two main parts of the BMEG schema graph are the vertices and the edges. Most of the data is stored within the vertices, and when we query BMEG, our goal is to retrieve that data.

All the vertices share a common structure that includes three main parts:

- Gid - a global identifier of each vertex. The gid will often be the field where you start your traversal.
- Label - tells us about the type of the vertex. Vertices of the same type have similar property keys and edge labels, and they form a logical group within the system.
- Data - the properties stored on the vertex, as in the "data" field of the vertex records returned by queries.
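To make the vertex layout concrete, here is a minimal sketch of one vertex record as a plain Python dictionary. The gid and property values are hypothetical and are not taken from BMEG; only the gid/label/data layout follows the vertex records shown in the Commands section of this tutorial.

```python
# A toy vertex record. The identifier and property values are made up
# for illustration; they are not real BMEG data.
vertex = {
    "gid": "Project:example",                 # hypothetical global identifier
    "label": "Project",                       # vertex type
    "data": {"project_id": "CCLE_example"},   # hypothetical properties
}


def describe(v):
    """Return a one-line summary of a vertex record."""
    keys = ", ".join(sorted(v["data"]))
    return f'{v["gid"]} [{v["label"]}] properties: {keys}'


summary = describe(vertex)
print(summary)
```

Vertices that share a label would be expected to carry the same kinds of keys under "data", which is why the label is a useful first filter.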

Go here to see the current schema. When you click on a specific vertex, more information (including the three main elements of each vertex) will appear below the graph.

Part 2: Getting introduced to traversal operations

Before we jump into using gripql operations, it helps to look at example code written to traverse the BMEG schema graph. We will analyze the role of each operation. If you are already familiar with the code and what it does, you can skip this part and move on to the next section.

Part 2.1: Understanding operations using code
# G is assumed to be a gripql graph handle that is already connected to BMEG.
p = []

# Start the query at the vertices and keep the Project vertices
# whose project_id starts with "CCLE"
for row in G.query().V().hasLabel("Project"):
    if row.data.project_id.startswith("CCLE"):
        p.append(row.gid)

# Traverse through schema nodes, marking the vertices we will render later
q = G.query().V(p).out("cases").as_("CASE").out("samples").out("aliquots").out("drug_response").as_("DRUGRESP").out("compounds").as_("COMP")

# Render node properties of interest (cell line, drug, EC50 value)
q = q.render(["$CASE._data.case_id", "$COMP._gid", "$DRUGRESP._data.ec50"])

When we look at this code, there are two main things to consider.

  1. The nodes and edges we go through
  2. The operations we use to get there

Operations:
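As a rough illustration of how the start, out, mark, and render steps fit together, the sketch below replays a shortened version of the traversal on a tiny in-memory graph. It is plain Python, not gripql: the helper functions `start`, `out`, and `as_` are hypothetical stand-ins named after the operations, every gid and value is invented, and the "samples" and "aliquots" hops are left out to keep the example short.

```python
# Toy in-memory graph; every gid, label, and value here is hypothetical.
vertices = {
    "proj1": {"label": "Project", "data": {"project_id": "CCLE_demo"}},
    "proj2": {"label": "Project", "data": {"project_id": "OTHER_demo"}},
    "case1": {"label": "Case", "data": {"case_id": "cell_line_A"}},
    "resp1": {"label": "DrugResponse", "data": {"ec50": 0.5}},
    "comp1": {"label": "Compound", "data": {}},
}
# Edges stored as (from_gid, edge_label, to_gid)
edges = [
    ("proj1", "cases", "case1"),
    ("case1", "drug_response", "resp1"),
    ("resp1", "compounds", "comp1"),
]


def start(gids):
    """Begin a traversal: one traveler per starting gid, with no marks yet."""
    return [{"current": g, "marks": {}} for g in gids]


def out(travelers, edge_label):
    """Move each traveler along outgoing edges that carry the given label."""
    moved = []
    for t in travelers:
        for src, lbl, dst in edges:
            if src == t["current"] and lbl == edge_label:
                moved.append({"current": dst, "marks": dict(t["marks"])})
    return moved


def as_(travelers, name):
    """Remember the current vertex of each traveler under the given name."""
    return [
        {"current": t["current"], "marks": {**t["marks"], name: t["current"]}}
        for t in travelers
    ]


# Step 1: keep Project vertices whose project_id starts with "CCLE"
p = [
    gid
    for gid, v in vertices.items()
    if v["label"] == "Project" and v["data"]["project_id"].startswith("CCLE")
]

# Step 2: walk the edges, marking the vertices we want to report on
t = as_(out(start(p), "cases"), "CASE")
t = as_(out(t, "drug_response"), "DRUGRESP")
t = as_(out(t, "compounds"), "COMP")

# Step 3: render one row per traveler from the marked vertices
rows = [
    [
        vertices[x["marks"]["CASE"]]["data"]["case_id"],
        x["marks"]["COMP"],
        vertices[x["marks"]["DRUGRESP"]]["data"]["ec50"],
    ]
    for x in t
]
print(rows)
```

The point of the marks is that the final row can combine values from vertices visited at different steps, even though the traversal itself has already moved on to the compound.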

Part 2.2: Understanding Traversal Commands

If you understand how the code above is able to query for and filter data, you can move into creating your own code to query BMEG for your own downstream analysis. This link takes you to a page with a list of different gripql commands you can use.

Here, I have grouped most of the commands in categories based on their functionality:

A. Commands we use to start a traversal: .V([ids]) .E([ids])

B. Commands we use to continue traversal through the graph; going in and out of nodes and edges:

.in_(), .inV(), .out(), .outV(), .both(), .bothV(), .inE(), .outE(), .bothE()

C. Commands that filter data:

.has() gripql.eq(variable, value) gripql.gt(variable, value) gripql.lt(variable, value) gripql.gte(variable, value) gripql.lte(variable, value) gripql.within(variable, value) gripql.contains(variable, value) gripql.and_([conditions]) gripql.or_([conditions]) gripql.not_(condition)

D. Commands that save, limit, put a range on, render, and include or exclude information in the output:

.as_(name) .select([names]) .limit(count) .skip(count) .range(start, stop) .fields([fields]) .render(template) .aggregate([aggregations])

E. Commands that help us aggregate data sets:

gripql.term(name, field, size) gripql.histogram(name, field, interval) gripql.percentile(name, field, percents=[]) .count() .distinct([fields])

To learn in depth the definitions of the commands and what they do, please go to the gripql operations page right here.

To go back to the main tutorial, click this link.

If you have any questions, comments, or concerns, contact us through Gitter and/or GitHub.

Commands

.V([ids])

Start query from Vertex

O.query().V()

Returns all vertices in graph

O.query().V("vertex1")

Returns:

{"gid" : "vertex1", "label":"TestVertex", "data":{}}

.E()

Start query from Edge

O.query().E()

Returns all edges in graph

Filtering

.hasLabel

Select vertices of a particular type

O.query().V().hasLabel("Gene")

.has()

Filter elements using conditional statements

O.query().V().hasLabel("Gene").has(gripql.eq("symbol", "TP53"))

Conditions

Conditions are arguments to .has() that define selection conditions

gripql.eq(variable, value)

Returns rows where variable == value

.has(gripql.eq("symbol", "TP53"))

gripql.neq(variable, value)

Returns rows where variable != value

.has(gripql.neq("symbol", "TP53"))

gripql.gt(variable, value)

Returns rows where variable > value

.has(gripql.gt("age", 45))

gripql.lt(variable, value)

Returns rows where variable < value

.has(gripql.lt("age", 45))

gripql.gte(variable, value)

Returns rows where variable >= value

.has(gripql.gte("age", 45))

gripql.lte(variable, value)

Returns rows where variable <= value

.has(gripql.lte("age", 45))

gripql.within(variable, values)

Returns rows where variable is in values

.has(gripql.within("symbol", ["TP53", "BRCA1"]))

gripql.contains(variable, value)

Returns rows where variable contains value

.has(gripql.contains("groups", "group1"))

Returns:

{"data" : {"groups" : ["group1", "group2"]}}

gripql.and_([conditions])

.has(gripql.and_([gripql.lte("age", 45), gripql.gte("age", 35)]))

gripql.or_([conditions])

.has(gripql.or_([...]))

gripql.not_(condition)
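The way conditions combine may be easier to see in a small stand-alone model. The sketch below is plain Python, not the gripql library: the functions `eq`, `gte`, `lte`, `contains`, `and_`, `or_`, `not_`, and `has` are hypothetical re-implementations that only borrow the names, and the sample rows are invented.

```python
# Each condition builder returns a predicate: a function from row to bool.
def eq(field, value):
    return lambda row: row.get(field) == value


def gte(field, value):
    return lambda row: field in row and row[field] >= value


def lte(field, value):
    return lambda row: field in row and row[field] <= value


def contains(field, value):
    # True when the list stored under `field` includes `value`
    return lambda row: value in row.get(field, [])


def and_(conditions):
    return lambda row: all(c(row) for c in conditions)


def or_(conditions):
    return lambda row: any(c(row) for c in conditions)


def not_(condition):
    return lambda row: not condition(row)


def has(rows, condition):
    """Keep only the rows for which the condition holds."""
    return [r for r in rows if condition(r)]


# Invented sample rows
rows = [
    {"name": "a", "age": 30, "groups": ["group1"]},
    {"name": "b", "age": 40, "groups": ["group1", "group2"]},
    {"name": "c", "age": 50, "groups": ["group2"]},
]

in_range = [r["name"] for r in has(rows, and_([lte("age", 45), gte("age", 35)]))]
either = [r["name"] for r in has(rows, or_([eq("name", "a"), gte("age", 50)]))]
not_group1 = [r["name"] for r in has(rows, not_(contains("groups", "group1")))]

print(in_range, either, not_group1)
```

Because every builder returns the same kind of predicate, conditions can be nested to any depth, which mirrors how the combining conditions take other conditions as arguments.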

Output

.as_(name)

Store current row for future reference

O.query().V().as_("a").out().as_("b")

.select([names])

Output previously marked elements

.limit(count)

Limit number of total output rows

.skip(count)

Skip the first count rows before returning results
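Limiting and skipping rows are often combined to page through results. As a plain-Python analogy (not gripql code), the sketch below applies the same two ideas to an ordinary list; the helper names `take_first` and `drop_first` are hypothetical, and the row values are invented.

```python
from itertools import islice


def take_first(rows, count):
    """Yield at most `count` rows."""
    return islice(rows, count)


def drop_first(rows, count):
    """Skip `count` rows, then yield the rest."""
    return islice(rows, count, None)


rows = [f"row{i}" for i in range(10)]

# Drop the first 3 rows, then keep the next 4
page = list(take_first(drop_first(rows, 3), 4))
print(page)
```

The order matters: dropping before taking gives a window into the results, whereas taking first and then dropping would shrink the window instead of moving it.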

.render(template)

Render current selection into arbitrary data structure

Traversing the graph

.in_()

Following incoming edges. Optional argument is the edge label (or list of labels) that should be followed. If no argument is provided, all incoming edges are followed.

.out()

Following outgoing edges. Optional argument is the edge label (or list of labels) that should be followed. If no argument is provided, all outgoing edges are followed.

.both()

Following all edges (both in and out). Optional argument is the edge label (or list of labels) that should be followed.

.inE()

Following incoming edges, but return the edge as the next element. This can be used to inspect edge properties. Optional argument is the edge label (or list of labels) that should be followed. To return to a vertex, use .in_ or .out

.outE()

Following outgoing edges, but return the edge as the next element. This can be used to inspect edge properties. Optional argument is the edge label (or list of labels) that should be followed. To return to a vertex, use .in_ or .out

.bothE()

Following all edges, but return the edge as the next element. This can be used to inspect edge properties. Optional argument is the edge label (or list of labels) that should be followed. To return to a vertex, use .in_ or .out
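The difference between following edges to the next vertex and stopping on the edge itself can be sketched with a small edge list. This is a plain-Python model under stated assumptions, not gripql: the functions `out_neighbors`, `in_neighbors`, `both_neighbors`, and `out_edges` are hypothetical, and the gids and edge labels are invented.

```python
# Edges stored as (from_gid, edge_label, to_gid); all values are invented.
edges = [
    ("a", "knows", "b"),
    ("c", "knows", "a"),
    ("a", "likes", "d"),
]


def _label_matches(label, wanted):
    """With no label requested, every edge matches."""
    return wanted is None or label == wanted


def out_neighbors(gid, label=None):
    """Vertices reached by following outgoing edges from gid."""
    return [dst for src, lbl, dst in edges if src == gid and _label_matches(lbl, label)]


def in_neighbors(gid, label=None):
    """Vertices reached by following incoming edges back to their source."""
    return [src for src, lbl, dst in edges if dst == gid and _label_matches(lbl, label)]


def both_neighbors(gid, label=None):
    """Vertices reached in either direction."""
    return out_neighbors(gid, label) + in_neighbors(gid, label)


def out_edges(gid, label=None):
    """Outgoing edges themselves, so their labels and properties can be inspected."""
    return [e for e in edges if e[0] == gid and _label_matches(e[1], label)]


print(out_neighbors("a"))
print(in_neighbors("a"))
print(both_neighbors("a"))
print(out_neighbors("a", "likes"))
print(out_edges("a", "knows"))
```

Stopping on the edge is useful when the edge carries information of its own; one more hop is then needed to land on a vertex again.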

Aggregation

.aggregate()

Return aggregate counts of a field across the elements in the current traversal, e.g.

O.query().V().hasLabel("Person").aggregate(gripql.term("test-agg", "name"))

Where test-agg is the name of the aggregation, Person is the vertex label type and name is the field.

O.query().V("1").out().aggregate(gripql.histogram("traversal-agg", "age", 5))

Starts on vertex 1, goes out and then creates a histogram named traversal-agg across the age field in the Person vertices.
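To show what a term aggregation and a histogram aggregation compute, here is a small stand-alone sketch in plain Python. It is not gripql: `term_counts` and `histogram` are hypothetical helpers, the people are invented, and keying each histogram bucket by its lower bound is an assumption made for this example rather than a statement about how gripql labels its buckets.

```python
from collections import Counter

# Invented sample data
people = [
    {"name": "Alice", "age": 31},
    {"name": "Bob", "age": 34},
    {"name": "Alice", "age": 47},
    {"name": "Carol", "age": 45},
]


def term_counts(rows, field):
    """Count how many rows share each distinct value of `field`."""
    return Counter(r[field] for r in rows)


def histogram(rows, field, interval):
    """Count rows per bucket of width `interval`, keyed by the bucket's lower bound."""
    return Counter((r[field] // interval) * interval for r in rows)


names = term_counts(people, "name")
ages = histogram(people, "age", 5)

print(dict(names))
print(dict(ages))
```

With an interval of 5, ages 31 and 34 fall into the bucket starting at 30, while 45 and 47 fall into the bucket starting at 45.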

.count()

Return the total count of elements

.distinct()

Only return distinct elements. The argument can be a JSON path that defines which fields are used to determine uniqueness.
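The idea of choosing which fields define uniqueness can be sketched in plain Python. The `distinct` function below is a hypothetical stand-in, not the gripql operation, and it uses simple field names rather than JSON paths; the rows are invented.

```python
def distinct(rows, fields):
    """Keep the first row seen for each combination of values in `fields`."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.get(f) for f in fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Invented sample rows
rows = [
    {"symbol": "TP53", "chromosome": "17"},
    {"symbol": "TP53", "chromosome": "17"},
    {"symbol": "BRCA1", "chromosome": "17"},
]

by_symbol = distinct(rows, ["symbol"])
by_chromosome = distinct(rows, ["chromosome"])

print(len(by_symbol), len(by_chromosome))
```

The same rows give two results when uniqueness is judged by symbol and one when it is judged by chromosome, which is why the choice of fields matters.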