sparkkgml.kg
Classes
KG: A class to represent and manipulate a Knowledge Graph (KG) from RDF data.
Module Contents
- class sparkkgml.kg.KG(location: str, _is_remote: bool = False, fmt: str = None, skip_predicates: Set[str] = set(), sparkSession: pyspark.sql.SparkSession = None)
A class to represent and manipulate a Knowledge Graph (KG) from RDF data.
- location
The location of the RDF file, either local or remote (HTTP/HTTPS).
- Type:
str
- _is_remote
Indicates if the RDF source is remote. Defaults to False.
- Type:
bool
- fmt
The format of the RDF file (e.g., 'xml', 'n3', 'turtle'). Defaults to None, in which case the format is inferred.
- Type:
str
- skip_predicates
A set of predicates to skip during RDF parsing.
- Type:
Set[str]
- key_to_vertex_hashMap
A hashmap mapping integer keys to vertices (subject/object).
- Type:
dict
- vertex_to_key_hashMap
A hashmap mapping vertices (subject/object) to integer keys.
- Type:
dict
- sparkSession
The Spark session to be used for DataFrame creation.
- Type:
SparkSession
- getKGasDataFrame()
Parses the RDF data and converts it into vertices and edges DataFrames compatible with GraphFrames.
- createKG()
Creates and returns a GraphFrame from the vertices and edges DataFrames, with an additional column indicating if a vertex has outgoing edges.
- getKGasDataFrame()
Parses the RDF data from the specified location and converts it into two DataFrames: one for vertices and one for edges, which are compatible with GraphFrames.
- Returns:
A tuple containing the edges DataFrame and the vertices DataFrame.
- Return type:
tuple
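The sketch below is a pure-Python illustration of the kind of mapping this method describes, not the library's implementation. It assigns integer keys to subjects and objects, skips unwanted predicates, and produces rows in the layout GraphFrames expects (an `id` column for vertices, `src` and `dst` columns for edges). The function name `triples_to_frames`, the `name` and `relationship` field names, and the sample triples are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def triples_to_frames(
    triples: Iterable[Triple],
    skip_predicates: Optional[Set[str]] = None,
):
    """Hypothetical helper: turn RDF-style triples into edge and vertex rows."""
    skip = skip_predicates or set()
    vertex_to_key: Dict[str, int] = {}
    key_to_vertex: Dict[int, str] = {}
    edges: List[dict] = []

    def key_for(vertex: str) -> int:
        # Assign the next free integer key the first time a vertex is seen.
        if vertex not in vertex_to_key:
            key = len(vertex_to_key)
            vertex_to_key[vertex] = key
            key_to_vertex[key] = vertex
        return vertex_to_key[vertex]

    for subject, predicate, obj in triples:
        if predicate in skip:
            continue  # Skipped predicates contribute no edge and no vertices.
        edges.append(
            {"src": key_for(subject), "dst": key_for(obj), "relationship": predicate}
        )

    vertices = [{"id": key, "name": name} for key, name in key_to_vertex.items()]
    # Edges first, then vertices, matching the return order documented above.
    return edges, vertices, vertex_to_key, key_to_vertex


sample = [
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:a", "rdf:type", "ex:Person"),
    ("ex:b", "ex:knows", "ex:c"),
]
edges, vertices, vertex_to_key, key_to_vertex = triples_to_frames(sample, {"rdf:type"})
```

Rows shaped like these could be passed to `SparkSession.createDataFrame` to obtain Spark DataFrames. The two dictionaries play the role of `vertex_to_key_hashMap` and `key_to_vertex_hashMap`.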
- createKG()
Creates a GraphFrame from vertices and edges DataFrames with an additional column in the vertices DataFrame indicating whether a vertex has any outgoing edges.
- Returns:
A GraphFrame object created from the vertices and edges DataFrames.
- Return type:
GraphFrame
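A minimal usage sketch, assuming `pyspark`, `graphframes`, and `sparkkgml` are installed. It uses only the constructor parameters and methods documented above. The wrapper name `load_graph`, the application name, and the file path in the usage line are hypothetical. This page does not state whether `createKG()` parses the RDF source itself or expects `getKGasDataFrame()` to be called first, so treat the call order as an assumption.

```python
def load_graph(rdf_path: str, rdf_format: str = "turtle"):
    """Hypothetical wrapper: build a GraphFrame from a local RDF file."""
    # Imports are kept inside the function so the module loads even where
    # Spark is not installed.
    from pyspark.sql import SparkSession
    from sparkkgml.kg import KG

    spark = SparkSession.builder.appName("kg-example").getOrCreate()

    kg = KG(
        location=rdf_path,
        fmt=rdf_format,           # pass None to let the format be inferred
        skip_predicates=set(),    # predicates to ignore while parsing
        sparkSession=spark,
    )

    # Assumption: createKG() can be called directly on a fresh KG instance.
    return kg.createKG()
```

A call such as `load_graph("data/example.ttl")` would return the GraphFrame, whose `vertices` and `edges` attributes expose the underlying DataFrames.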