add support for google spanner graph #622
base: master
Conversation
graphistry/pygraphistry.py
Outdated
Returns:
    PlotterBase: A PlotterBase instance configured with SpannerGraph.
"""
return Plotter().spannergraph(project_id, instance_id, database_id)
re:interface --
Connection
i wonder if we want something similar to the bolt connectors, where we can do streamlined auth + passing in a client
CONN_CFG = { ... }
g1 = graphistry.spanner_connect(**CONN_CFG)
g2 = g1.spanner_query("...")
native_client = google.spanner.client(...)
g1 = graphistry.spanner_connect(native_client)
g2 = g1.spanner_query("....")
Return-typed query methods
Separately, a pattern I'm finding to be type-friendly with remote-mode gfql has been separating return-shape-typed methods:
g2 = g1.spanner_g("get a graph")
df = g1.spanner_df("get a table")
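Rough signature sketch of that split (method names and import path are illustrative only):

    import pandas as pd
    from graphistry.Plottable import Plottable

    # Graph-shaped result: binds nodes/edges and returns a Plottable
    def spanner_g(self: Plottable, query: str) -> Plottable: ...

    # Table-shaped result: returns a plain DataFrame, no graph bindings
    def spanner_df(self: Plottable, query: str) -> pd.DataFrame: ...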
Also, in the neo4j & gremlin cases, when building real examples:
- hydration: I found scenarios where we'd get either node IDs or edge IDs and would want to then hydrate them against the db (see the sketch below)
- stored procedures: does GSQL have some notion of passing inputs?
- reads vs creates vs writes: any notions here? A good flow would be an example of going through uploading some data, querying it, doing some local enrichment, and then writing back the updates
Part of this makes me want to have some bigger examples to motivate helper functions on top. I suspect query_g() and query_df() can be good starts, but we need these examples to be more realistic
(The lack of bulk reads/writes such as via arrow is a bit puzzling, but that can wait?)
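One possible hydration shape, assuming a hypothetical spanner_df-style helper and illustrative column names:

    # Given bare node IDs from a prior step, pull their attributes and merge them back in
    node_ids = [1, 2, 3]
    attrs_df = g1.spanner_df("GRAPH FinGraph MATCH (p:Person) RETURN p.id AS id, p.name AS name")
    attrs_df = attrs_df[attrs_df['id'].isin(node_ids)]  # filter client-side for simplicity
    g2 = g1.nodes(g1._nodes.merge(attrs_df, on='id', how='left'), 'id')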
- I added CONF to register
> hydration: I found scenarios where we'd get either node IDs or edge IDs and would want to then hydrate them against the db

I plan to take the two additional demo datasets that were shared (transit data and FinVest) and create notebooks for those next; after that, I will work on hydration using some other datasets we have

> stored procedures: does GSQL have some notion of passing inputs?

Yes, I believe so, but I need to check the syntax: https://cloud.google.com/spanner/docs/reference/standard-sql/graph-query-statements

> reads vs creates vs writes: any notions here? A good flow would be an example of going through uploading some data, querying it, doing some local enrichment, and then writing back the updates

Agreed
Found an example for parameterized queries: graph_snippets.py. Will have to look at adding this in v2
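For reference, a minimal sketch of a parameterized Spanner Graph query through the plain google-cloud-spanner client (graph and schema follow Google's FinGraph sample; project/instance/database names are placeholders), independent of how it eventually gets wired into spanner_query:

    from google.cloud import spanner
    from google.cloud.spanner_v1 import param_types

    client = spanner.Client(project="my-gcp-project")  # placeholder project
    database = client.instance("my-instance").database("my-database")

    gql = """
    GRAPH FinGraph
    MATCH (p:Person)-[:Owns]->(a:Account)
    WHERE p.id = @person_id
    RETURN p.name AS name, a.id AS account_id
    """

    with database.snapshot() as snapshot:
        rows = snapshot.execute_sql(
            gql,
            params={"person_id": 1},
            param_types={"person_id": param_types.INT64},
        )
        for row in rows:
            print(row)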
@DataBoyTX looks like CI issues?
made a quick pass, see comments
- lint, types, e.g., missing type signatures
- public interface should be 'fluent' wrt plottable, so PlotterBase::spanner_gsql(self: Plottable, query: str..., using the passed-in plottable instead of a fresh graphistry.edges(...) (see the sketch below)
- remove stray top-level logging.setLevel
- move remaining spanner top-level import into method bodies
- rename to spanner_gql as spanner can be sql too? Ideally now, but I get it if that's hard in this PR; we can do a fast-follow next week?
- where possible, better to do nested types like List[Dict[... or List[Union[Kind1, ...
- unit tests on the json2df
- document df cols in pydocs where relevant
probably good Help Wanted item:
- infer source/dest name from passed in plottable or remote graph, vs current hard-coding
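A sketch of the fluent shape mentioned above (signature, helper, and column names are illustrative only, following the spanner_gql rename suggestion):

    from typing import Any, Dict, Optional
    import pandas as pd
    from graphistry.Plottable import Plottable

    def spanner_gql(self: Plottable, query: str,
                    params: Optional[Dict[str, Any]] = None) -> Plottable:
        edges_df: pd.DataFrame = run_query_to_df(query, params)  # hypothetical helper
        # Reuse the already-configured plottable rather than a fresh graphistry.edges(...);
        # 'src'/'dst' are hard-coded here only for illustration (cf. the Help Wanted item)
        return self.edges(edges_df, 'src', 'dst')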
graphistry/pygraphistry.py
Outdated
@staticmethod
def spanner_query(query: str, params: Dict[str, Any] = {}) -> Plottable:
    # TODO(tcook): add pydocs
    return Plotter().spanner_query(query, params)
yeah this is where i think we want 2 entrypoints:
- pygraphistry with a static method
- plotterbase with a chained method
in both cases, the spanner_query gets a self: Plottable (rough sketch below)
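A sketch of that two-entrypoint pattern as it might look (class layout, import paths, and delegation details are illustrative):

    from typing import Any, Dict, Optional
    from graphistry.Plottable import Plottable

    class PyGraphistry:
        @staticmethod
        def spanner_query(query: str, params: Optional[Dict[str, Any]] = None) -> Plottable:
            # Static entrypoint: start from a fresh Plotter, then delegate to the chained method
            from graphistry.plotter import Plotter
            return Plotter().spanner_query(query, params)

    class PlotterBase:
        def spanner_query(self: Plottable, query: str,
                          params: Optional[Dict[str, Any]] = None) -> Plottable:
            # Chained entrypoint: self is the already-configured Plottable
            ...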
I think it's coded and working as expected now wrt this, can you please confirm this is what you were expecting?
see comments wrt ipynb
@lmeyerov - I think everything is passing, can you re-review pls?
Add Google Spanner GQL integration and demo notebooks showing how to connect to a Google Spanner graph database and visualize the results.