add support for google spanner graph #622
base: master
Conversation
graphistry/pygraphistry.py
Outdated
Returns:
    PlotterBase: A PlotterBase instance configured with SpannerGraph.
"""
return Plotter().spannergraph(project_id, instance_id, database_id)
re:interface --
Connection
i wonder if we want something similar to the bolt connectors, where we can do streamlined auth + passing in a client
CONN_CFG = { ... }
g1 = graphistry.spanner_connect(**CONN_CFG)
g2 = g1.spanner_query("...")
native_client = google.spanner.client(...)
g1 = graphistry.spanner_connect(native_client)
g2 = g1.spanner_query("....")
Return-typed query methods
Separately, a pattern I'm finding to be type-friendly with remote-mode gfql has been separating return-shape-typed methods:
g2 = g1.spanner_g("get a graph")
df = g1.spanner_df("get a table")
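Rough signature sketch of that split (method names and import path are illustrative only):

    import pandas as pd
    from graphistry.Plottable import Plottable

    # Graph-shaped result: binds nodes/edges and returns a Plottable
    def spanner_g(self: Plottable, query: str) -> Plottable: ...

    # Table-shaped result: returns a plain DataFrame, no graph bindings
    def spanner_df(self: Plottable, query: str) -> pd.DataFrame: ...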
Also, in the neo4j & gremlin cases, when building real examples:
- hydration: I found scenarios where we'd get either node IDs or edge IDs and would want to then hydrate them against the db (see the sketch below)
- stored procedures: does GSQL have some notion of passing inputs?
- reads vs creates vs writes: any notions here? A good flow would be an example of going through uploading some data, querying it, doing some local enrichment, and then writing back the updates
Part of this makes me want to have some bigger examples to motivate helper functions on top. I suspect query_g() and query_df() can be good starts, but we need these examples to be more realistic
(The lack of bulk reads/writes such as via arrow is a bit puzzling, but that can wait?)
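One possible hydration shape, assuming a hypothetical spanner_df-style helper and illustrative column names:

    # Given bare node IDs from a prior step, pull their attributes and merge them back in
    node_ids = [1, 2, 3]
    attrs_df = g1.spanner_df("GRAPH FinGraph MATCH (p:Person) RETURN p.id AS id, p.name AS name")
    attrs_df = attrs_df[attrs_df['id'].isin(node_ids)]  # filter client-side for simplicity
    g2 = g1.nodes(g1._nodes.merge(attrs_df, on='id', how='left'), 'id')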
- I added CONF to register
> hydration: I found scenarios where we'd get either node IDs or edge IDs and would want to then hydrate them against the db

I plan to take the two additional demo datasets that were shared (transit data and FinVest) and create notebooks for those next; after that, I will work on hydration using some other datasets we have

> stored procedures: does GSQL have some notion of passing inputs?

Yes, I believe so, but I need to check the syntax: https://cloud.google.com/spanner/docs/reference/standard-sql/graph-query-statements

> reads vs creates vs writes: any notions here? A good flow would be an example of going through uploading some data, querying it, doing some local enrichment, and then writing back the updates

Agreed
Found an example for parameterized queries: graph_snippets.py. Will have to look at adding this in v2
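For reference, a minimal sketch of a parameterized Spanner Graph query through the plain google-cloud-spanner client (graph and schema follow Google's FinGraph sample; project/instance/database names are placeholders), independent of how it eventually gets wired into spanner_query:

    from google.cloud import spanner
    from google.cloud.spanner_v1 import param_types

    client = spanner.Client(project="my-gcp-project")  # placeholder project
    database = client.instance("my-instance").database("my-database")

    gql = """
    GRAPH FinGraph
    MATCH (p:Person)-[:Owns]->(a:Account)
    WHERE p.id = @person_id
    RETURN p.name AS name, a.id AS account_id
    """

    with database.snapshot() as snapshot:
        rows = snapshot.execute_sql(
            gql,
            params={"person_id": 1},
            param_types={"person_id": param_types.INT64},
        )
        for row in rows:
            print(row)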
@DataBoyTX looks like CI issues?
made a quick pass, see comments
- lint, types, e.g., missing type signatures
- public interface should be 'fluent' wrt plottable, so PlotterBase::spanner_gsql(self: Plottable, query: str..., using the passed-in plottable instead of a fresh graphistry.edges(...) (see the sketch below)
- remove stray top-level logging.setLevel
- move remaining spanner top-level import into method bodies
- rename to spanner_gql as spanner can be sql too? Ideally now, but I get it if that's hard in this PR; we can do a fast-follow next week?
- where possible, better to do nested types like List[Dict[... or List[Union[Kind1, ...
- unit tests on the json2df
- document df cols in pydocs where relevant
probably good Help Wanted item:
- infer source/dest name from passed in plottable or remote graph, vs current hard-coding
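A sketch of the fluent shape mentioned above (signature, helper, and column names are illustrative only, following the spanner_gql rename suggestion):

    from typing import Any, Dict, Optional
    import pandas as pd
    from graphistry.Plottable import Plottable

    def spanner_gql(self: Plottable, query: str,
                    params: Optional[Dict[str, Any]] = None) -> Plottable:
        edges_df: pd.DataFrame = run_query_to_df(query, params)  # hypothetical helper
        # Reuse the already-configured plottable rather than a fresh graphistry.edges(...);
        # 'src'/'dst' are hard-coded here only for illustration (cf. the Help Wanted item)
        return self.edges(edges_df, 'src', 'dst')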
graphistry/pygraphistry.py
Outdated
@staticmethod
def spanner_query(query: str, params: Dict[str, Any] = {}) -> Plottable:
    # TODO(tcook): add pydocs
    return Plotter().spanner_query(query, params)
yeah this is where i think we want 2 entrypoints:
- pygraphistry with a static method
- plotterbase with a chained method
in both cases, the spanner_query gets a self: Plottable (rough sketch below)
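A sketch of that two-entrypoint pattern as it might look (class layout, import paths, and delegation details are illustrative):

    from typing import Any, Dict, Optional
    from graphistry.Plottable import Plottable

    class PyGraphistry:
        @staticmethod
        def spanner_query(query: str, params: Optional[Dict[str, Any]] = None) -> Plottable:
            # Static entrypoint: start from a fresh Plotter, then delegate to the chained method
            from graphistry.plotter import Plotter
            return Plotter().spanner_query(query, params)

    class PlotterBase:
        def spanner_query(self: Plottable, query: str,
                          params: Optional[Dict[str, Any]] = None) -> Plottable:
            # Chained entrypoint: self is the already-configured Plottable
            ...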
I think it's coded and working as expected now wrt this, can you please confirm this is what you were expecting?
see comments wrt ipynb
@lmeyerov - I think everything is passing, can you re-review pls?
Add Google Spanner GQL integration and demo notebooks showing how to connect to a Google Spanner graph database and visualize the results.