sparkkgml.kg

Classes

KG

A class to represent and manipulate a Knowledge Graph (KG) from RDF data.

Module Contents

class sparkkgml.kg.KG(location: str, _is_remote: bool = False, fmt: str = None, skip_predicates: Set[str] = set(), sparkSession: pyspark.sql.SparkSession = None)

A class to represent and manipulate a Knowledge Graph (KG) from RDF data.

location

The location of the RDF file: either a local path or a remote HTTP/HTTPS URL.

Type:

str

_is_remote

Indicates if the RDF source is remote. Defaults to False.

Type:

bool

fmt

The serialization format of the RDF file (e.g., 'xml', 'n3', 'turtle'). Defaults to None, in which case the format is inferred.

Type:

str

skip_predicates

A set of predicates to skip during RDF parsing. Defaults to an empty set.

Type:

Set[str]

key_to_vertex_hashMap

A hashmap mapping integer keys to vertices (subject/object).

Type:

dict

vertex_to_key_hashMap

A hashmap mapping vertices (subject/object) to integer keys.

Type:

dict

sparkSession

The Spark session used to create the DataFrames. Defaults to None.

Type:

SparkSession
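
The two hashmaps key_to_vertex_hashMap and vertex_to_key_hashMap are inverses of each other. The following is a minimal pure-Python sketch of how such a pair of mappings could be built from RDF triples. The helper name build_vertex_maps, the sample triples, and the choice of consecutive keys starting at 0 are assumptions made for illustration, not the library's actual implementation.

```python
def build_vertex_maps(triples):
    """Assign an integer key to every subject and object, in first-seen order.

    Hypothetical helper: the real KG class may assign keys differently.
    """
    vertex_to_key = {}
    key_to_vertex = {}
    for subject, _predicate, obj in triples:
        for vertex in (subject, obj):
            if vertex not in vertex_to_key:
                key = len(vertex_to_key)  # next unused integer key
                vertex_to_key[vertex] = key
                key_to_vertex[key] = vertex
    return key_to_vertex, vertex_to_key


# Illustrative triples (subject, predicate, object); not taken from real data.
triples = [
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Bob", "ex:worksAt", "ex:Acme"),
]
key_to_vertex, vertex_to_key = build_vertex_maps(triples)
```

Keeping both directions lets integer keys be translated back to the original subject or object without scanning the whole mapping.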

getKGasDataFrame()

Parses the RDF data and converts it into vertices and edges DataFrames compatible with GraphFrames.

createKG()

Creates and returns a GraphFrame from the vertices and edges DataFrames, with an additional column indicating if a vertex has outgoing edges.
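
A possible way to use the class, based only on the constructor signature and method summaries above. The helper name build_graph and the file path in the usage comment are hypothetical. The sketch also assumes that createKG() can be called directly on a freshly constructed KG and that the GraphFrames package is available to the Spark session; neither is confirmed here.

```python
def build_graph(location, fmt=None, skip_predicates=frozenset()):
    """Hypothetical helper: load RDF from `location` and return a GraphFrame."""
    # Imports are deferred so the function can be defined without Spark installed.
    from pyspark.sql import SparkSession
    from sparkkgml.kg import KG

    spark = SparkSession.builder.appName("kg-sketch").getOrCreate()
    kg = KG(
        location=location,
        fmt=fmt,  # None leaves format detection to the library
        skip_predicates=set(skip_predicates),
        sparkSession=spark,
    )
    # Assumption: createKG() does not require a prior getKGasDataFrame() call.
    return kg.createKG()


# Example call (the path is a placeholder):
# graph = build_graph("data/example.ttl", fmt="turtle")
```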

getKGasDataFrame()

Parses the RDF data from the specified location and converts it into two DataFrames: one for vertices and one for edges, which are compatible with GraphFrames.

Returns:

A tuple containing the edges DataFrame and the vertices DataFrame.

Return type:

tuple
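
To illustrate the shape of the result, here is a pure-Python sketch that turns triples into edge and vertex rows. GraphFrames expects a vertices table with an id column and an edges table with src and dst columns; the other column names (name, predicate), the helper name triples_to_rows, and the sample data are assumptions, and the real method returns Spark DataFrames rather than lists of dicts.

```python
def triples_to_rows(triples, skip_predicates=frozenset()):
    """Return (edges, vertices) as lists of dicts in a GraphFrames-like layout.

    Hypothetical helper for illustration only.
    """
    vertex_ids = {}
    edges = []
    for subject, predicate, obj in triples:
        if predicate in skip_predicates:
            continue  # skipped triples contribute neither edges nor vertices
        for vertex in (subject, obj):
            vertex_ids.setdefault(vertex, len(vertex_ids))
        edges.append(
            {"src": vertex_ids[subject], "dst": vertex_ids[obj], "predicate": predicate}
        )
    vertices = [{"id": key, "name": vertex} for vertex, key in vertex_ids.items()]
    return edges, vertices


# Illustrative triples; the rdf:type triple is filtered out below.
sample_triples = [
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Alice", "rdf:type", "ex:Person"),
    ("ex:Bob", "ex:worksAt", "ex:Acme"),
]
edges, vertices = triples_to_rows(sample_triples, skip_predicates={"rdf:type"})
```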

createKG()

Creates a GraphFrame from vertices and edges DataFrames with an additional column in the vertices DataFrame indicating whether a vertex has any outgoing edges.

Returns:

A GraphFrame object created from the vertices and edges DataFrames.

Return type:

GraphFrame
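
The additional vertex column can be understood as a membership test against the set of edge sources. The sketch below shows that logic in plain Python; the helper name flag_outgoing, the column name has_outgoing, and the sample rows are assumptions, since the actual column name used by createKG() is not given here.

```python
def flag_outgoing(vertices, edges):
    """Add a boolean flag to each vertex row: True if it is the source of any edge.

    Hypothetical helper; `has_outgoing` is an illustrative column name.
    """
    sources = {edge["src"] for edge in edges}
    return [{**vertex, "has_outgoing": vertex["id"] in sources} for vertex in vertices]


# Illustrative rows in a GraphFrames-like layout (id / src / dst).
sample_vertices = [
    {"id": 0, "name": "ex:Alice"},
    {"id": 1, "name": "ex:Bob"},
    {"id": 2, "name": "ex:Acme"},
]
sample_edges = [
    {"src": 0, "dst": 1},
    {"src": 1, "dst": 2},
]
flagged = flag_outgoing(sample_vertices, sample_edges)
```

A vertex that only ever appears as an object, such as ex:Acme here, ends up with the flag set to False.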