Data synth #560
base: master
for _ in range(num_calls):
    if len(non_affiliated_people) > 1:
        caller, callee = non_affiliated_people.sample(n=2, replace=False)['phone_number'].values
        self.add_call_log(call_logs_df, caller, callee, start_date)
it can be nice to determine some overlapping social networks ahead of time, vs. random connections
check out igraph's SBM (stochastic block model) & forest fire generators, which can dictate who connects to whom for N samples
e.g., given N users and an intent for E relationships, I think you can get [(a, b), (x, y), ...]
from one of igraph's generators, and then turn those into call-log generator calls
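A minimal sketch of that idea, under stated assumptions: igraph's `Graph.SBM` is the library route the comment points to, but the block below uses a plain-Python stochastic block model as a stand-in so it runs without extra dependencies. The names `sample_sbm_edges` and `edges_to_calls`, and the `p_in` / `p_out` parameters, are hypothetical and not part of this PR.

```python
import random
from itertools import combinations


def sample_sbm_edges(block_sizes, p_in, p_out, seed=None):
    """Sample undirected edges from a simple stochastic block model.

    Nodes are numbered 0..N-1 and assigned to blocks in order.
    Pairs in the same block connect with probability p_in,
    pairs in different blocks with probability p_out.
    """
    rng = random.Random(seed)
    block_of = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    edges = []
    for a, b in combinations(range(len(block_of)), 2):
        p = p_in if block_of[a] == block_of[b] else p_out
        if rng.random() < p:
            edges.append((a, b))
    return edges


def edges_to_calls(edges, phone_numbers, calls_per_edge=1, seed=None):
    """Turn each edge into (caller, callee) phone-number pairs.

    The call direction is picked at random for every call.
    """
    rng = random.Random(seed)
    calls = []
    for a, b in edges:
        for _ in range(calls_per_edge):
            caller, callee = (a, b) if rng.random() < 0.5 else (b, a)
            calls.append((phone_numbers[caller], phone_numbers[callee]))
    return calls
```

Each returned pair could then be passed to something like `add_call_log` in place of the random two-person sample, so the call graph inherits the community structure instead of looking uniformly random.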
Good idea, will look into this.
leader_calls = int(num_affiliated_calls * leader_call_percentage)
gang_calls = num_affiliated_calls - leader_calls

# Generate intra-gang calls
neato
i think this is similar to above... maybe for each gang:
- generate a general social network within the gang
- inject the leader or hierarchy or cell structure you want
- add some random overlap between the gang, other gangs, and unaffiliated
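The three steps above can be sketched roughly as follows. This is only an illustration: the function names, the star-shaped leader structure, and the `p_member` probability are assumptions, not anything in the PR.

```python
import random
from itertools import combinations


def build_gang_edges(members, leader, p_member=0.3, seed=None):
    """Build undirected edges inside one gang.

    Step 1: a general random social network among members.
    Step 2: inject a leader by linking the leader to every member
            (a star; a hierarchy or cell structure would go here instead).
    """
    rng = random.Random(seed)
    edges = set()
    for a, b in combinations(members, 2):
        if rng.random() < p_member:
            edges.add(tuple(sorted((a, b))))
    for m in members:
        if m != leader:
            edges.add(tuple(sorted((leader, m))))
    return edges


def add_overlap(groups, n_extra, seed=None):
    """Step 3: sample up to n_extra edges between members of different groups.

    `groups` maps a group name (a gang, or the unaffiliated pool) to its
    members. Duplicate picks collapse, so fewer than n_extra may come back.
    """
    rng = random.Random(seed)
    names = list(groups)
    extra = set()
    for _ in range(n_extra):
        g1, g2 = rng.sample(names, 2)  # two distinct groups
        a, b = rng.choice(groups[g1]), rng.choice(groups[g2])
        extra.add(tuple(sorted((a, b))))
    return extra
```

The union of the per-gang edges and the overlap edges gives one edge list that the call-log generator can walk.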
(+ comment on burners)
Yeah, I was thinking about how to do this wrt the intra-gang call logs; for example, only a leader would likely call another gang's leaders.
return call_logs_df

def generate_affiliated_call_logs(
is this modeling phenomena like burner phones vs regular? i'm not sure how that'd look
Oh no, this is gang-affiliated call logs versus regular ones. Burners would be interesting, but I'm not sure how to model that. Maybe similar profiles: someone calls numbers x, y, z on their main phone, then realizes he slipped up, switches to a burner, and calls the same numbers?
yeah i think you'd want the distributions to mirror
so like your simulation secretly tracks who has what burner when, and has burners call one another most of the time, with the occasional slip-up of calling a main
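One way to read this, as a hedged sketch: keep the same person-to-person contact pairs (so the burner distribution mirrors the main one) and only decide which numbers appear in the log. The function name, the `slip_prob` parameter, and the two lookup dicts are hypothetical.

```python
import random


def choose_call_phones(caller, callee, main_phone, burner_phone,
                       slip_prob=0.05, rng=random):
    """Return (from_number, to_number) for one call between two burner holders.

    main_phone and burner_phone map a person id to that person's number.
    The caller always dials from their burner. Most calls go to the
    callee's burner; with probability slip_prob the caller slips up
    and dials the callee's main phone instead.
    """
    from_number = burner_phone[caller]
    if rng.random() < slip_prob:
        to_number = main_phone[callee]
    else:
        to_number = burner_phone[callee]
    return from_number, to_number
```

Because the simulation holds `burner_phone` privately, the generated logs only show the numbers, and the rare slip-up rows are what link a burner back to a known main phone.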
df = pd.DataFrame(records)
return df

def generate_call_logs(
should there be a notion of tracked phone numbers?
- most folks slooowly rotate the main phone
- some folks quickly rotate their burners, which are primarily used for calling other burners? something like that
(and burners are / aren't associated with people, e.g., sometimes known, sometimes not?)
Yeah, I was thinking maybe I'd do something similar to the whereabouts, where a date is tied to the person at that address.
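A rough sketch of what a dated phone record could look like, by analogy with the whereabouts records. The `PhoneAssignment` class, its fields, and `assignment_on` are illustrative names only, and the optional `person_id` is an assumption meant to cover burners whose holder is not known.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class PhoneAssignment:
    phone_number: str
    start: date
    end: Optional[date] = None        # None means still active
    person_id: Optional[str] = None   # None means the holder is unknown
    is_burner: bool = False


def assignment_on(assignments: List[PhoneAssignment], phone_number: str,
                  day: date) -> Optional[PhoneAssignment]:
    """Return the assignment covering phone_number on the given day, if any."""
    for a in assignments:
        if (a.phone_number == phone_number
                and a.start <= day
                and (a.end is None or day <= a.end)):
            return a
    return None
```

With records like these, a slowly rotated main phone is a long interval, a burner is a short one, and a call log row can be resolved to a person only when the matching interval has a known holder.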
yeah, similar to the social network, i think either:
- prepick a distribution/network/etc ahead of time, and follow that
- do a per-person/community timeline 'simulation'
i think we want to guarantee that burners are active only for a period of time before retiring, so i can imagine some sort of simulation:
    ACTION_CALL_BURNER = 0
    ACTION_CALL_MAIN = 1
    ACTION_DESTROY_BURNER = 2
    ACTION_NEW_BURNER = 3
    ACTION_REPLACE_PHONE = 4

    person_to_burners : Dict[PersonId, List[BurnerId]]
    person_to_main_phone : Dict[PersonId, PhoneId]

    for tick:
        person = pick(person_to_burners.keys())
        switch random():
            case ACTION_CALL_BURNER:
                ...
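A runnable take on the pseudocode above, offered as a sketch rather than a proposed implementation. Several details are assumptions: the action weights, the `max_burner_age` cutoff used to guarantee retirement, reading `CALL_MAIN` as a main-to-main call, and retiring the oldest burner on `DESTROY_BURNER`.

```python
import random
from enum import Enum
from itertools import count


class Action(Enum):
    CALL_BURNER = 0
    CALL_MAIN = 1
    DESTROY_BURNER = 2
    NEW_BURNER = 3
    REPLACE_PHONE = 4


def simulate(people, ticks, weights=(50, 30, 5, 10, 5),
             max_burner_age=20, seed=None):
    """Run a per-tick simulation; needs at least two people.

    Returns (calls, born): calls is a list of (tick, from_number, to_number),
    born maps each burner id to the tick it was created.
    """
    rng = random.Random(seed)
    ids = count(1)
    person_to_main_phone = {p: f"main-{next(ids)}" for p in people}
    person_to_burners = {p: [] for p in people}
    born = {}
    calls = []
    actions = list(Action)  # same order as the weights tuple
    for tick in range(ticks):
        # Guarantee: drop every burner that has reached max_burner_age.
        for p in people:
            person_to_burners[p] = [
                b for b in person_to_burners[p] if tick - born[b] < max_burner_age
            ]
        person = rng.choice(people)
        other = rng.choice([p for p in people if p != person])
        action = rng.choices(actions, weights=weights, k=1)[0]
        mine, theirs = person_to_burners[person], person_to_burners[other]
        if action is Action.CALL_BURNER:
            if mine and theirs:  # skipped when either side has no burner
                calls.append((tick, rng.choice(mine), rng.choice(theirs)))
        elif action is Action.CALL_MAIN:
            calls.append((tick, person_to_main_phone[person],
                          person_to_main_phone[other]))
        elif action is Action.DESTROY_BURNER:
            if mine:
                mine.pop(0)  # retire the oldest burner early
        elif action is Action.NEW_BURNER:
            burner = f"burner-{next(ids)}"
            born[burner] = tick
            mine.append(burner)
        elif action is Action.REPLACE_PHONE:
            person_to_main_phone[person] = f"main-{next(ids)}"
    return calls, born
```

For example, `calls, born = simulate(["a", "b", "c"], ticks=500, seed=0)` yields a log in which no burner appears more than `max_burner_age` ticks after it was created, while main numbers change only occasionally.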
a bit hard to see without viz etc but:
- see comment on injecting social networks, vs picking random pairs that'll come out funny looking wrt graph
- i didn't quite follow the split between people, entities (addresses, phones, ...), and records (call logs, criminal, ....) linking them together, maybe easier to see with viz
Yeah, the next step was to figure this part out.
separated the profile generation out into a separate module that is now a faker factory
initial PR for people data synthesis