You now have a complete specification for how to construct a knowledge graph. Join me in this final lesson, where you'll define the tools that execute the construction plan as specified by the workflow of agents. With a graph construction plan and a knowledge extraction plan in place, you are now ready to build a graph. In this lesson, you will dive into the details of the knowledge graph construction tool. If you're feeling generous, you could call this a neuro-symbolic agent: a hybrid of language models and rule-based systems.

Let's step back for a moment to look at what these plans will produce. The graph construction plan will load CSV files to produce the domain graph. There's almost a one-to-one mapping from CSV file to node type, with some extra mapping to create relationships. You could probably ask an LLM with a giant context window to perform this task, but this is a very direct, mechanical process that has already been converged to code. The work will be performed by a single tool you'll define called construct_domain_graph. That tool will have a set of helper functions it uses for the different parts of constructing the graph, and we'll go through each of them one step at a time.

As always, we'll import some libraries, check that OpenAI is running, and make sure Neo4j is still there. Perfect.

Now let's start defining the tools themselves. For domain graph construction, we're going to create a series of tools, each with a very specific responsibility. You'll start by creating a uniqueness constraint on the database. Now that we're moving beyond just thinking about the data files, we have to prepare the database for the data import that's about to happen, and the first part of that preparation is making sure we have uniqueness constraints.
In the database, that means that for any node with a particular label, we can require that a given property on that label is unique. That correlates exactly with the CSV files, each of which has a unique ID column; each one gets its own uniqueness constraint. This utility function takes care of that for us: you pass in the label and the unique_property_key, and it creates the constraint in Neo4j.

In the Cypher query, you can see CREATE CONSTRAINT with the constraint name that gets passed in, created only if it doesn't already exist. It's created for a particular label, and on that label we require that the unique property key is unique. Now, although we've been using query parameters a lot, when you're setting up constraints like this in Neo4j it's not possible to pass the label or property name as a query parameter. So we're doing slightly unsafe work here with string concatenation. We know that's not recommended practice; we should have a sanitization function in here to make sure nothing bad is happening, but for now this will be just fine.

With the ability to create uniqueness constraints for nodes, the next function you want is one that loads nodes from a CSV file. Here's the load_nodes_from_csv function. It takes in a source_file, the label for that source file, which column in the CSV file is unique, and a list of all the properties that should be created from the source file. Looking at the Cypher it builds, there's special Cypher syntax for loading CSV files from the import directory; that's why everything we've been working with has used relative paths.
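A minimal sketch of what such a constraint helper might look like (the function and parameter names here are illustrative assumptions, not the notebook's exact code):

```python
# Sketch of a uniqueness-constraint helper. Labels, property names,
# and constraint names cannot be query parameters in Neo4j, so they
# are interpolated into the string -- a real version should sanitize
# these inputs before concatenating them.
def uniqueness_constraint_cypher(label: str, unique_property_key: str) -> str:
    constraint_name = f"unique_{label.lower()}_{unique_property_key}"
    return (
        f"CREATE CONSTRAINT {constraint_name} IF NOT EXISTS "
        f"FOR (n:{label}) "
        f"REQUIRE n.{unique_property_key} IS UNIQUE"
    )

print(uniqueness_constraint_cypher("Product", "product_id"))
# -> CREATE CONSTRAINT unique_product_product_id IF NOT EXISTS
#    FOR (n:Product) REQUIRE n.product_id IS UNIQUE
```

The returned string would then be sent to Neo4j by whatever query-execution helper the notebook uses.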
When we use LOAD CSV in Neo4j, the file URL here ends up being relative to Neo4j's import directory. So we load a CSV file with headers from that directory, and for every row in that file, we call a subquery that does the merging of those rows. MERGE, of course, performs a conditional creation: it first checks whether the node already exists, and if it doesn't, it creates that node based on the unique column we pass in.

The next line, the FOREACH, is a slightly unusual way of taking advantage of the property list that was passed in, which is really just a list of property names. We loop through those names, and for each one we set that property on the node we just created to the corresponding row value from the CSV file. In effect, we iterate through all the names and set those values one by one.

By doing this inside a subquery, we get the option of running it in batches: IN TRANSACTIONS OF 1000 ROWS. No matter how big the CSV file is, we process it 1,000 rows at a time. You can then see what we actually send to Neo4j: the query itself, along with the query parameters it uses.

For each CSV file you have, you want to call both of the functions we just defined. First, call the one that creates a uniqueness constraint, so that each node's ID is respected as unique. Then, with the constraints in place, do the actual import using the load_nodes_from_csv tool we just saw.
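The shape of the Cypher that a node-loading helper builds might look like the sketch below. The helper and column names are assumptions; note also that dynamic property assignment like `SET n[k] = row[k]` requires a recent Neo4j version.

```python
# Sketch of the Cypher a load_nodes_from_csv helper might build.
# The file path is relative to Neo4j's import directory; label and
# column names must be interpolated, while $properties stays a real
# query parameter that the FOREACH loops over.
def load_nodes_cypher(source_file: str, label: str, unique_column: str) -> str:
    return f"""
LOAD CSV WITH HEADERS FROM 'file:///{source_file}' AS row
CALL {{
  WITH row
  MERGE (n:{label} {{{unique_column}: row.{unique_column}}})
  FOREACH (k IN $properties | SET n[k] = row[k])
}} IN TRANSACTIONS OF 1000 ROWS
"""

print(load_nodes_cypher("products.csv", "Product", "product_id"))
```

Running this for `products.csv` yields a MERGE on `(n:Product {product_id: row.product_id})`, batched 1,000 rows per transaction.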
That's what's happening inside import_nodes. It takes the construction rule and, based on its values, first calls the create-uniqueness-constraint tool. If that doesn't return an error, it goes on to import the nodes from the CSV file, pulling each of the required arguments out of the construction rule.

You can do something similar for importing relationships. For relationships, we don't need a uniqueness constraint, because they don't have identities of their own. They're connected directly to nodes on either side, the from node and the to node, and those should already be unique. The relationships themselves will be unique simply because there's one per row in the files we're importing. This is particular to the data files in this example; we'd want to take more care with data files organized in some more complicated way, but for these files this is an adequate way to do the import.

As with node loading, we use LOAD CSV WITH HEADERS from the import directory and load everything as rows. For each row, we do a bit more work inside the subquery. First, we find the existing nodes that were previously loaded: the from_node and the to_node. After finding those with MATCH clauses, we use a MERGE clause to create the relationship from the from_node to the to_node. This means we only ever create one relationship for any given pair of from and to nodes for a particular relationship type.
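A sketch of the relationship-import Cypher described above; the column names `from_id`/`to_id` and the endpoint key `id` are assumptions about the CSV layout, not taken from the notebook:

```python
# Sketch of the relationship-import Cypher. Both endpoints are found
# with MATCH against already-loaded nodes, then MERGE guarantees at
# most one relationship of this type per (from, to) pair, even if the
# same row appears twice in the CSV file.
def load_relationships_cypher(source_file: str, from_label: str,
                              to_label: str, rel_type: str) -> str:
    return f"""
LOAD CSV WITH HEADERS FROM 'file:///{source_file}' AS row
CALL {{
  WITH row
  MATCH (from:{from_label} {{id: row.from_id}})
  MATCH (to:{to_label} {{id: row.to_id}})
  MERGE (from)-[r:{rel_type}]->(to)
  FOREACH (k IN $properties | SET r[k] = row[k])
}} IN TRANSACTIONS OF 1000 ROWS
"""

print(load_relationships_cypher("supplied_by.csv", "Part", "Supplier",
                                "SUPPLIED_BY"))
```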
So if there were duplicate rows, say two "ABK likes coffee" rows, the relationship would only be created once. Even if the row occurred twice in the CSV file, the MERGE looks at the nodes, which are already unique on either side, sees that the LIKES relationship between them already exists, and doesn't create another one, just as with the nodes. We then do the property setting on the relationships, if any properties are available, looping through them with the same FOREACH construct, and finally call the query, passing in the query parameters directly from the construction rule itself.

All of those functions do most of the real work, so construct_domain_graph itself doesn't have to do much beyond taking the full construction_plan as an argument and applying the right ordering: construct all the nodes first, and then, with the nodes already existing, create the relationships.

With construct_domain_graph defined, we can give it a try. To run it, of course, we need a complete construction plan. Here's the whole plan. It will differ from previous lessons depending on what the LLM decided to do; this is the plan that was valid when this notebook was created, and you can take a look through it. It's roughly what you'd expect. The thing to pay most attention to, which we'll check later, is the relationships that are created. There's a CONTAINS relationship from a Product to an Assembly: a product contains an assembly. That sounds right. IS_PART_OF goes from a Part to an Assembly: a part is part of an assembly. A little wordy, but correct. And here the from node is a Part that is SUPPLIED_BY a Supplier. That also seems perfectly fine.
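The nodes-then-relationships ordering can be sketched like this, assuming each rule carries a `construction_type` field and that node and relationship importers are supplied (field and function names are assumptions):

```python
# Sketch of the top-level orchestration: import all nodes first, so
# that the MATCH clauses in the relationship import can find their
# endpoints, then import the relationships.
def construct_domain_graph(construction_plan, import_nodes, import_relationships):
    rules = list(construction_plan.values())
    for rule in rules:
        if rule["construction_type"] == "node":
            import_nodes(rule)
    for rule in rules:
        if rule["construction_type"] == "relationship":
            import_relationships(rule)

# Tiny demo with stub importers that just record the call order.
order = []
plan = {
    "contains": {"construction_type": "relationship"},
    "products": {"construction_type": "node"},
    "assemblies": {"construction_type": "node"},
}
construct_domain_graph(
    plan,
    import_nodes=lambda r: order.append("node"),
    import_relationships=lambda r: order.append("relationship"),
)
print(order)  # -> ['node', 'node', 'relationship']
```

Even though a relationship rule appears first in the plan, both node rules run before it.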
That will be the input to our construct_domain_graph function, so let's give it a run. Now, when the function is done, you don't see any output. I could have added a print statement to let you know it has finished its work; instead, we're going to interact with the graph as if we were still doing some querying. Let's inspect the domain graph with a slightly fancy Cypher statement.

First, we find just the relationship construction rules in the construction plan, using a list comprehension in Python: we loop through all the construction plan values, and whenever the construction_type is "relationship", the rule goes into the list. That gives us all the relationship constructions we saw before, without the node constructions.

Then comes a bit of Cypher magic, which I'll walk through one step at a time. The goal of this fancy query is to do a little sampling of the graph. We're not going to pull back the entire graph; instead, we expect that for every relationship construction rule above, there is at least one occurrence of that relationship type in the resulting graph. If we take the collection of relationship types, do pattern matching on each, and look at the output, we should see that everything we expected to be created actually was. Let's walk through it and see how this plays out.

Here's the first step. We pass in the list of construction rules, and in Cypher you have the ability to take a list and UNWIND it into a collection of rows. What was a single list of rules is turned into multiple
rows, where each row holds a single rule. That's what UNWIND does: the relationship constructions list is turned into a series of rows, each containing a single construction value.

With those rows, for each construction rule we do a pattern match: from some node, with no qualification on it, through a relationship we'll call r, to some other node, where the relationship type comes from the construction rule's relationship_type value. The dollar sign here refers to the query parameter, whose value gets pulled into the relationship type. So if, as above, we have a SUPPLIED_BY relationship type, that gets substituted in by the query parameter, and we match from some node that is SUPPLIED_BY some other node. We return just the labels of the from node, the labels of the to node, and the type of the relationship itself, which gives us a little triple of the relationship type and the node labels on either side. We LIMIT it to just one, because we don't want to see the whole graph; we just want to verify that the relationship exists.

We put that inside a subquery so that the match is applied once to every element we unwound earlier. That's what's happening in this part: we call a subquery, and the match_one_path we just defined goes inside it. In the fully assembled Cypher, we unwind the list, do the matching inside the subquery, and then return just the triples of fromNode, relationship, and toNode: the from node's labels, the relationship type, and the labels on the toNode.
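Putting the filtering and the sampling query together might look like the sketch below. The rule structure and parameter name are assumptions, and since a relationship type inside a pattern can't come from a query parameter, this sketch checks `type(r)` in a WHERE clause as one way to apply the per-row type:

```python
# Sketch of the graph-sampling step: filter the relationship rules
# with a list comprehension, then UNWIND them in Cypher and match one
# example path per relationship type.
construction_plan = {
    "contains":    {"construction_type": "relationship",
                    "relationship_type": "CONTAINS"},
    "products":    {"construction_type": "node"},
    "supplied_by": {"construction_type": "relationship",
                    "relationship_type": "SUPPLIED_BY"},
}
relationship_constructions = [
    rule for rule in construction_plan.values()
    if rule["construction_type"] == "relationship"
]

sample_cypher = """
UNWIND $relationship_constructions AS construction
CALL {
  WITH construction
  MATCH (from)-[r]->(to)
  WHERE type(r) = construction.relationship_type
  RETURN labels(from) AS fromNode, type(r) AS relationship,
         labels(to) AS toNode
  LIMIT 1
}
RETURN fromNode, relationship, toNode
"""
# The query would then be sent with the rules as a parameter, e.g.:
# driver.execute_query(sample_cypher,
#                      relationship_constructions=relationship_constructions)
```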
Okay, and you can see that this is going to print out the Cypher before we run it. So here's the complete Cypher statement, and then we finally run it. And here are the relationships we found: a Product CONTAINS an Assembly, a Part IS_PART_OF an Assembly, and a Part is SUPPLIED_BY a Supplier. Perfect. That matches what we saw earlier in the construction rules, so it looks like the graph has been constructed correctly.