
Part 1 Introduction

What is CO-CONNECT Tools?

carrot.cdm is a software package. The main component is the `carrot` module, which contains Python classes and tools for:

  • Common Data Model
  • Health Data Elements (person, condition_occurrence...)
  • I/O
  • Command Line Interface
  • Various Tools


What are the Health Data Elements?

These are classes that control and handle the building of CDM tables, such as the person table.

from carrot.cdm.objects import Person
Person
carrot.cdm.objects.versions.v5_3_1.person.Person
person = Person()
person.get_field_names()
['person_id',
 'gender_concept_id',
 'year_of_birth',
 'month_of_birth',
 'day_of_birth',
 'birth_datetime',
 'race_concept_id',
 'ethnicity_concept_id',
 'location_id',
 'provider_id',
 'care_site_id',
 'person_source_value',
 'gender_source_value',
 'gender_source_concept_id',
 'race_source_value',
 'race_source_concept_id',
 'ethnicity_source_value',
 'ethnicity_source_concept_id']
person.get_field_dtypes()
{'person_id': 'Integer',
 'gender_concept_id': 'Integer',
 'year_of_birth': 'Integer',
 'month_of_birth': 'Integer',
 'day_of_birth': 'Integer',
 'birth_datetime': 'Timestamp',
 'race_concept_id': 'Integer',
 'ethnicity_concept_id': 'Integer',
 'location_id': 'Integer',
 'provider_id': 'Integer',
 'care_site_id': 'Integer',
 'person_source_value': 'Text50',
 'gender_source_value': 'Text50',
 'gender_source_concept_id': 'Integer',
 'race_source_value': 'Text50',
 'race_source_concept_id': 'Integer',
 'ethnicity_source_value': 'Text50',
 'ethnicity_source_concept_id': 'Integer'}
import pandas as pd
import numpy as np
def build_person(self):
    # Fill three of the person fields with n rows of dummy data
    n = 10
    self.person_id.series = pd.Series(range(n))
    # 8507 = Male, 8532 = Female (gender concept IDs)
    self.gender_concept_id.series = pd.Series(np.random.choice([8507, 8532], size=n))
    self.birth_datetime.series = pd.Series(np.random.choice(['1970-01-01', '1990-01-01'], size=n))

person.define = build_person
person.get_df(force_rebuild=True)
2022-06-17 14:47:43 - Person - INFO - Automatically formatting data columns.
2022-06-17 14:47:43 - Person - INFO - created df (0x10e2485e0)[0x10e19ee50]

person_id gender_concept_id year_of_birth month_of_birth day_of_birth birth_datetime race_concept_id ethnicity_concept_id location_id provider_id care_site_id person_source_value gender_source_value gender_source_concept_id race_source_value race_source_concept_id ethnicity_source_value ethnicity_source_concept_id
0 0 8532 1970 1 1 1970-01-01 00:00:00.000000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 1 8507 1990 1 1 1990-01-01 00:00:00.000000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 2 8507 1970 1 1 1970-01-01 00:00:00.000000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 3 8507 1970 1 1 1970-01-01 00:00:00.000000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 4 8532 1990 1 1 1990-01-01 00:00:00.000000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 5 8532 1970 1 1 1970-01-01 00:00:00.000000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6 6 8507 1970 1 1 1970-01-01 00:00:00.000000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7 7 8507 1990 1 1 1990-01-01 00:00:00.000000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
8 8 8532 1970 1 1 1970-01-01 00:00:00.000000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9 9 8507 1970 1 1 1970-01-01 00:00:00.000000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
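In the table above, year_of_birth, month_of_birth and day_of_birth are filled in even though build_person only sets birth_datetime. The following is a minimal sketch of how such columns could be derived from a date string, using only the standard library. The helper name `derive_birth_parts` is hypothetical and is not part of carrot; how the package performs this step internally is not shown in this document.

```python
from datetime import datetime


def derive_birth_parts(birth_datetime: str) -> dict:
    """Split an ISO date string into the separate birth columns (hypothetical helper)."""
    parsed = datetime.strptime(birth_datetime, "%Y-%m-%d")
    return {
        "year_of_birth": parsed.year,
        "month_of_birth": parsed.month,
        "day_of_birth": parsed.day,
        # Same timestamp layout as the birth_datetime column in the table above
        "birth_datetime": parsed.strftime("%Y-%m-%d %H:%M:%S.%f"),
    }


row = derive_birth_parts("1970-01-01")
print(row)
```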

What do you mean by I/O?

Various helper classes for data collections that control the input and output of data.

from carrot.io import LocalDataCollection, SqlDataCollection, BCLinkDataCollection
LocalDataCollection, SqlDataCollection, BCLinkDataCollection
(carrot.io.plugins.local.LocalDataCollection,
 carrot.io.plugins.sql.SqlDataCollection,
 carrot.io.plugins.bclink.BCLinkDataCollection)

A LocalDataCollection can be used to load local CSV files. Here nrows=10 and chunksize=5, so the data is returned in chunks of five rows:

local = LocalDataCollection({'Demographics.csv':'../data/part1/Demographics.csv'},nrows=10,chunksize=5)
local
2022-06-17 14:47:43 - LocalDataCollection - INFO - DataCollection Object Created
2022-06-17 14:47:43 - LocalDataCollection - INFO - Using a chunksize of '5' nrows
2022-06-17 14:47:43 - LocalDataCollection - INFO - Registering  Demographics.csv [<carrot.io.common.DataBrick object at 0x10a500370>]

<carrot.io.plugins.local.LocalDataCollection at 0x10a500b80>
local['Demographics.csv']
2022-06-17 14:47:43 - LocalDataCollection - INFO - Retrieving initial dataframe for 'Demographics.csv' for the first time

ID Age Sex
0 pk1 57.0 Male
1 pk2 68.0 Female
2 pk3 78.0 Female
3 pk4 51.0 Female
4 pk5 51.0 Male
local.next()
2022-06-17 14:47:43 - LocalDataCollection - INFO - Getting next chunk of data
2022-06-17 14:47:44 - LocalDataCollection - INFO - Getting the next chunk of size '5' for 'Demographics.csv'
2022-06-17 14:47:44 - LocalDataCollection - INFO - --> Got 5 rows

local['Demographics.csv']
ID Age Sex
5 pk6 64.0 Male
6 pk7 76.0 Female
7 pk8 60.0 Male
8 pk9 92.0 Female
9 pk10 58.0 Male
local.reset()
2022-06-17 14:47:44 - LocalDataCollection - INFO - resetting used bricks
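The chunked behaviour seen above (five rows, then the next five after calling next()) can be illustrated without carrot. This is a sketch using the standard library only; `iter_chunks` is a hypothetical helper and the way LocalDataCollection reads files internally is an assumption. The rows are the ten shown in the outputs above.

```python
import csv
import io
from itertools import islice

# The ten rows shown in the Demographics.csv outputs above.
CSV_TEXT = """ID,Age,Sex
pk1,57.0,Male
pk2,68.0,Female
pk3,78.0,Female
pk4,51.0,Female
pk5,51.0,Male
pk6,64.0,Male
pk7,76.0,Female
pk8,60.0,Male
pk9,92.0,Female
pk10,58.0,Male
"""


def iter_chunks(handle, chunksize):
    """Yield lists of at most `chunksize` rows until the input is exhausted."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunksize))
        if not chunk:
            return
        yield chunk


chunks = list(iter_chunks(io.StringIO(CSV_TEXT), chunksize=5))
first_ids = [row["ID"] for row in chunks[0]]
second_ids = [row["ID"] for row in chunks[1]]
print(first_ids)
print(second_ids)
```

Reading in chunks keeps memory use bounded by the chunk size rather than by the size of the whole file.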

A BCLinkDataCollection is used to interact with BCLink, for either input or output:

bclink = BCLinkDataCollection({'dry_run':True,'tables':{'person':'ds1000','observation':'ds10002'}},
                              output_folder='cache')
2022-06-17 14:47:44 - BCLinkDataCollection - INFO - setup bclink collection
2022-06-17 14:47:44 - BCLinkHelpers - NOTICE - bc_sqlselect --user=bclink --query=SELECT EXISTS (SELECT 1 FROM information_schema.tables WHERE table_name = 'ds1000' ) bclink
2022-06-17 14:47:44 - BCLinkHelpers - INFO - ds1000 (person) already exists --> all good
2022-06-17 14:47:44 - BCLinkHelpers - NOTICE - bc_sqlselect --user=bclink --query=SELECT EXISTS (SELECT 1 FROM information_schema.tables WHERE table_name = 'ds10002' ) bclink
2022-06-17 14:47:44 - BCLinkHelpers - INFO - ds10002 (observation) already exists --> all good
2022-06-17 14:47:44 - BCLinkHelpers - NOTICE - bc_sqlselect --user=bclink --query=SELECT count(*) FROM ds1000 bclink
2022-06-17 14:47:44 - BCLinkHelpers - NOTICE - bc_sqlselect --user=bclink --query=SELECT count(*) FROM ds10002 bclink
2022-06-17 14:47:44 - BCLinkDataCollection - INFO - DataCollection Object Created

bclink.bclink_helpers.get_table_map()
{'person': 'ds1000', 'observation': 'ds10002'}
bclink.bclink_helpers.check_table_exists('person')
2022-06-17 14:47:44 - BCLinkHelpers - NOTICE - bc_sqlselect --user=bclink --query=SELECT EXISTS (SELECT 1 FROM information_schema.tables WHERE table_name = 'person' ) bclink

True

For example, create an indexing map by retrieving the last index of each table currently in BCLink:

bclink.load_indexing()
2022-06-17 14:47:44 - BCLinkHelpers - NOTICE - bc_sqlselect --user=bclink --query=SELECT count(*) FROM ds1000 bclink
2022-06-17 14:47:44 - BCLinkHelpers - NOTICE - bc_sqlselect --user=bclink --query=SELECT column_name FROM INFORMATION_SCHEMA. COLUMNS WHERE table_name = 'ds1000' LIMIT 1  bclink
2022-06-17 14:47:44 - BCLinkHelpers - NOTICE - bc_sqlselect --user=bclink --query=SELECT person_id FROM ds1000 ORDER BY -person_id LIMIT 1;  bclink
2022-06-17 14:47:44 - BCLinkHelpers - NOTICE - bc_sqlselect --user=bclink --query=SELECT count(*) FROM ds10002 bclink
2022-06-17 14:47:44 - BCLinkHelpers - NOTICE - bc_sqlselect --user=bclink --query=SELECT column_name FROM INFORMATION_SCHEMA. COLUMNS WHERE table_name = 'ds10002' LIMIT 1  bclink
2022-06-17 14:47:44 - BCLinkHelpers - NOTICE - bc_sqlselect --user=bclink --query=SELECT person_id FROM ds10002 ORDER BY -person_id LIMIT 1;  bclink

{}
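The log above shows a query of the form SELECT person_id FROM ds1000 ORDER BY -person_id LIMIT 1, which fetches the largest existing index. The sketch below runs the same shape of query against an in-memory SQLite table standing in for BCLink. The rows are made up for illustration, `last_index` is a hypothetical helper, and the layout of the resulting `indexing` dictionary (next free index per table) is an assumption, not the documented return value of load_indexing.

```python
import sqlite3

# In-memory stand-in for a BCLink table; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ds1000 (person_id INTEGER)")
conn.executemany("INSERT INTO ds1000 VALUES (?)", [(1,), (2,), (5,)])


def last_index(connection, table, column):
    """Return the largest value of `column` in `table`, or None if the table is empty."""
    # Identifiers cannot be bound as SQL parameters, so only pass trusted names here.
    query = f"SELECT {column} FROM {table} ORDER BY -{column} LIMIT 1"
    row = connection.execute(query).fetchone()
    return row[0] if row else None


last = last_index(conn, "ds1000", "person_id")
# Assumption: new rows would be numbered starting after the last existing index.
indexing = {"person": 0 if last is None else last + 1}
print(indexing)
```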

What are the tools?

Lots of different features, mostly helper functions used throughout the code.

For example, loading a JSON rules file and displaying it as a graph:

from IPython.display import SVG, display
from carrot.tools import load_json,make_dag

rules = load_json('../data/rules.json')

def show_svg():
    return display(SVG(make_dag(rules['cdm'])))
show_svg()
[Figure: graph produced by make_dag with two clusters, Source and Common Data Model. The CDM tables person, observation, condition_occurrence and drug_exposure link through their fields to the source columns of Demographics.csv, Serology.csv, Hospital_Visit.csv, Symptoms.csv, GP_Records.csv and Vaccinations.csv.]
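The graph drawn by make_dag connects each CDM table to its fields, each field to a source column, and each source column to its source file. As a rough sketch of that structure, the function below collects those edges from a rules dictionary. `rules_to_edges` is a hypothetical helper, not the implementation of make_dag, and the example rules are a trimmed-down version of the person rules used in this notebook.

```python
def rules_to_edges(cdm_rules):
    """Collect (parent, child) edges: table -> field -> source column -> source file."""
    edges = set()
    for table, objects in cdm_rules.items():
        for fields in objects.values():
            for field, rule in fields.items():
                target = f"{table}_{field}"
                source = f"{rule['source_table']}_{rule['source_field']}"
                edges.add((table, target))
                edges.add((target, source))
                edges.add((source, rule["source_table"]))
    return edges


# Trimmed-down person rules, for illustration only.
cdm_rules = {
    "person": {
        "MALE 3025": {
            "person_id": {"source_table": "Demographics.csv", "source_field": "ID"},
            "gender_concept_id": {"source_table": "Demographics.csv", "source_field": "Sex"},
        }
    }
}

edges = rules_to_edges(cdm_rules)
for parent, child in sorted(edges):
    print(f"{parent} -> {child}")
```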
from carrot.tools import remove_missing_sources_from_rules
filtered_rules = remove_missing_sources_from_rules(rules,local.keys())
filtered_rules
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed Antibody 3027 from rules because it was not loaded
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed H/O: heart failure 3043 from rules because it was not loaded
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed 2019-nCoV 3044 from rules because it was not loaded
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed Cancer 3045 from rules because it was not loaded
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed cdm table 'observation' from rules
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed Headache 3028 from rules because it was not loaded
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed Fatigue 3029 from rules because it was not loaded
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed Dizziness 3030 from rules because it was not loaded
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed Cough 3031 from rules because it was not loaded
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed Fever 3032 from rules because it was not loaded
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed Muscle pain 3033 from rules because it was not loaded
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed Pneumonia 3042 from rules because it was not loaded
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed Mental health problem 3046 from rules because it was not loaded
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed Mental disorder 3047 from rules because it was not loaded
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed Type 2 diabetes mellitus 3048 from rules because it was not loaded
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed Ischemic heart disease 3049 from rules because it was not loaded
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed Hypertensive disorder 3050 from rules because it was not loaded
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed cdm table 'condition_occurrence' from rules
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed COVID-19 vaccine 3034 from rules because it was not loaded
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed COVID-19 vaccine 3035 from rules because it was not loaded
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed COVID-19 vaccine 3036 from rules because it was not loaded
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed SARS-CoV-2 (COVID-19) vaccine, mRNA-1273 0.2 MG/ML Injectable Suspension 3040 from rules because it was not loaded
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed SARS-CoV-2 (COVID-19) vaccine, mRNA-BNT162b2 0.1 MG/ML Injectable Suspension 3041 from rules because it was not loaded
2022-06-17 14:47:45 - remove_missing_sources_from_rules - WARNING - removed cdm table 'drug_exposure' from rules

{'metadata': {'date_created': '2022-02-12T12:22:48.465257',
  'dataset': 'FAILED: ExampleV4'},
 'cdm': {'person': {'MALE 3025': {'birth_datetime': {'source_table': 'Demographics.csv',
     'source_field': 'Age',
     'operations': ['get_datetime_from_age']},
    'gender_concept_id': {'source_table': 'Demographics.csv',
     'source_field': 'Sex',
     'term_mapping': {'Male': 8507}},
    'gender_source_concept_id': {'source_table': 'Demographics.csv',
     'source_field': 'Sex',
     'term_mapping': {'Male': 8507}},
    'gender_source_value': {'source_table': 'Demographics.csv',
     'source_field': 'Sex'},
    'person_id': {'source_table': 'Demographics.csv', 'source_field': 'ID'}},
   'FEMALE 3026': {'birth_datetime': {'source_table': 'Demographics.csv',
     'source_field': 'Age',
     'operations': ['get_datetime_from_age']},
    'gender_concept_id': {'source_table': 'Demographics.csv',
     'source_field': 'Sex',
     'term_mapping': {'Female': 8532}},
    'gender_source_concept_id': {'source_table': 'Demographics.csv',
     'source_field': 'Sex',
     'term_mapping': {'Female': 8532}},
    'gender_source_value': {'source_table': 'Demographics.csv',
     'source_field': 'Sex'},
    'person_id': {'source_table': 'Demographics.csv', 'source_field': 'ID'}}}}}
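The warnings above show objects being dropped when their source file was not loaded, and whole CDM tables being dropped once no objects remain. A minimal sketch of that filtering logic follows. `filter_rules` is a hypothetical stand-in and may differ from what remove_missing_sources_from_rules actually does; the example rules are heavily trimmed, and the Serology.csv mapping for the observation object is taken from the graph above.

```python
import copy


def filter_rules(rules, loaded_sources):
    """Return a copy of `rules` keeping only objects whose source tables are all loaded."""
    loaded = set(loaded_sources)
    filtered = copy.deepcopy(rules)
    for table_name, objects in list(filtered["cdm"].items()):
        for object_name, fields in list(objects.items()):
            sources = {field["source_table"] for field in fields.values()}
            if not sources <= loaded:
                del objects[object_name]  # a required input file is missing
        if not objects:
            del filtered["cdm"][table_name]  # nothing left for this CDM table
    return filtered


# Heavily trimmed rules, for illustration only.
example_rules = {
    "cdm": {
        "person": {
            "MALE 3025": {
                "person_id": {"source_table": "Demographics.csv", "source_field": "ID"},
            }
        },
        "observation": {
            "Antibody 3027": {
                "person_id": {"source_table": "Serology.csv", "source_field": "ID"},
            }
        },
    }
}

kept = filter_rules(example_rules, ["Demographics.csv"])
print(list(kept["cdm"]))
```

Working on a deep copy leaves the original rules untouched, so they can be filtered again against a different set of inputs.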

What is the CommonDataModel?

The Python class that controls everything when building a common data model.

from carrot.cdm import CommonDataModel
CommonDataModel
carrot.cdm.model.CommonDataModel
cdm = CommonDataModel.from_rules(filtered_rules,inputs=local,outputs=bclink)
2022-06-17 14:47:45 - CommonDataModel - INFO - CommonDataModel (5.3.1) created with co-connect-tools version 0.0.0
2022-06-17 14:47:45 - CommonDataModel - INFO - Running with an DataCollection object
2022-06-17 14:47:45 - CommonDataModel - INFO - Turning on automatic cdm column filling
2022-06-17 14:47:45 - BCLinkHelpers - WARNING - No table for getting existing person ids (person_ids) has been defined
2022-06-17 14:47:45 - BCLinkHelpers - NOTICE - bc_sqlselect --user=bclink --query=SELECT count(*) FROM ds1000 bclink
2022-06-17 14:47:45 - BCLinkHelpers - NOTICE - bc_sqlselect --user=bclink --query=SELECT column_name FROM INFORMATION_SCHEMA. COLUMNS WHERE table_name = 'ds1000' LIMIT 1  bclink
2022-06-17 14:47:45 - BCLinkHelpers - NOTICE - bc_sqlselect --user=bclink --query=SELECT person_id FROM ds1000 ORDER BY -person_id LIMIT 1;  bclink
2022-06-17 14:47:45 - BCLinkHelpers - NOTICE - bc_sqlselect --user=bclink --query=SELECT count(*) FROM ds10002 bclink
2022-06-17 14:47:45 - BCLinkHelpers - NOTICE - bc_sqlselect --user=bclink --query=SELECT column_name FROM INFORMATION_SCHEMA. COLUMNS WHERE table_name = 'ds10002' LIMIT 1  bclink
2022-06-17 14:47:45 - BCLinkHelpers - NOTICE - bc_sqlselect --user=bclink --query=SELECT person_id FROM ds10002 ORDER BY -person_id LIMIT 1;  bclink
2022-06-17 14:47:45 - CommonDataModel - INFO - Added MALE 3025 of type person
2022-06-17 14:47:45 - CommonDataModel - INFO - Added FEMALE 3026 of type person

cdm.process()
2022-06-17 14:47:45 - CommonDataModel - INFO - Starting processing in order: ['person']
2022-06-17 14:47:45 - CommonDataModel - INFO - Number of objects to process for each table...
{
      "person": 2
}
2022-06-17 14:47:45 - CommonDataModel - INFO - for person: found 2 objects
2022-06-17 14:47:45 - CommonDataModel - INFO - working on person
2022-06-17 14:47:45 - CommonDataModel - INFO - starting on MALE 3025
2022-06-17 14:47:45 - Person - INFO - Called apply_rules
2022-06-17 14:47:45 - LocalDataCollection - INFO - Retrieving initial dataframe for 'Demographics.csv' for the first time
2022-06-17 14:47:45 - Person - INFO - Mapped birth_datetime
2022-06-17 14:47:45 - Person - INFO - Mapped gender_concept_id
2022-06-17 14:47:45 - Person - INFO - Mapped gender_source_concept_id
2022-06-17 14:47:45 - Person - INFO - Mapped gender_source_value
2022-06-17 14:47:45 - Person - INFO - Mapped person_id
2022-06-17 14:47:45 - Person - WARNING - Requiring non-null values in gender_concept_id removed 3 rows, leaving 2 rows.
2022-06-17 14:47:45 - Person - INFO - Automatically formatting data columns.
2022-06-17 14:47:45 - Person - INFO - created df (0x10e65aee0)[MALE_3025]
2022-06-17 14:47:45 - CommonDataModel - INFO - finished MALE 3025 (0x10e65aee0) ... 1/2 completed, 2 rows
2022-06-17 14:47:45 - BCLinkDataCollection - INFO - saving person_ids to cache/person_ids.csv
2022-06-17 14:47:45 - BCLinkDataCollection - INFO - finished save to file
2022-06-17 14:47:45 - BCLinkHelpers - ERROR - table person_ids unknown in dict_keys(['person', 'observation'])
2022-06-17 14:47:45 - CommonDataModel - INFO - starting on FEMALE 3026
2022-06-17 14:47:45 - Person - INFO - Called apply_rules
2022-06-17 14:47:45 - Person - INFO - Mapped birth_datetime
2022-06-17 14:47:45 - Person - INFO - Mapped gender_concept_id
2022-06-17 14:47:45 - Person - INFO - Mapped gender_source_concept_id
2022-06-17 14:47:45 - Person - INFO - Mapped gender_source_value
2022-06-17 14:47:45 - Person - INFO - Mapped person_id
2022-06-17 14:47:45 - Person - WARNING - Requiring non-null values in gender_concept_id removed 2 rows, leaving 3 rows.
2022-06-17 14:47:45 - Person - INFO - Automatically formatting data columns.
2022-06-17 14:47:45 - Person - INFO - created df (0x10e65ef70)[FEMALE_3026]
2022-06-17 14:47:45 - CommonDataModel - INFO - finished FEMALE 3026 (0x10e65ef70) ... 2/2 completed, 3 rows
2022-06-17 14:47:45 - BCLinkDataCollection - INFO - updating person_ids in cache/person_ids.csv
2022-06-17 14:47:45 - BCLinkDataCollection - INFO - finished save to file
2022-06-17 14:47:45 - BCLinkHelpers - ERROR - table person_ids unknown in dict_keys(['person', 'observation'])
2022-06-17 14:47:45 - CommonDataModel - INFO - saving dataframe (0x10e650d30) to <carrot.io.plugins.bclink.BCLinkDataCollection object at 0x10e2995e0>
2022-06-17 14:47:45 - BCLinkDataCollection - INFO - saving person to cache/person.csv
2022-06-17 14:47:45 - BCLinkDataCollection - INFO - finished save to file
2022-06-17 14:47:45 - BCLinkHelpers - NOTICE - dataset_tool --load --table=ds1000 --user=data --data_file=cache/person.csv --support --bcqueue bclink
2022-06-17 14:47:45 - BCLinkHelpers - NOTICE - datasettool2 list-updates --dataset=ds1000 --user=data --database=bclink
2022-06-17 14:47:45 - CommonDataModel - INFO - finalised person on iteration 0 producing 5 rows from 2 tables
2022-06-17 14:47:45 - LocalDataCollection - INFO - Getting next chunk of data
2022-06-17 14:47:45 - LocalDataCollection - INFO - Getting the next chunk of size '5' for 'Demographics.csv'
2022-06-17 14:47:45 - LocalDataCollection - INFO - --> Got 5 rows
2022-06-17 14:47:45 - CommonDataModel - INFO - for person: found 2 objects
2022-06-17 14:47:45 - CommonDataModel - INFO - working on person
2022-06-17 14:47:45 - CommonDataModel - INFO - starting on MALE 3025
2022-06-17 14:47:45 - CommonDataModel - INFO - finished MALE 3025 (0x10e65aee0) ... 1/2 completed, 2 rows
2022-06-17 14:47:45 - BCLinkDataCollection - INFO - updating person_ids in cache/person_ids.csv
2022-06-17 14:47:45 - BCLinkDataCollection - INFO - finished save to file
2022-06-17 14:47:45 - BCLinkHelpers - ERROR - table person_ids unknown in dict_keys(['person', 'observation'])
2022-06-17 14:47:45 - CommonDataModel - INFO - starting on FEMALE 3026
2022-06-17 14:47:46 - CommonDataModel - INFO - finished FEMALE 3026 (0x10e65ef70) ... 2/2 completed, 3 rows
2022-06-17 14:47:46 - BCLinkDataCollection - INFO - updating person_ids in cache/person_ids.csv
2022-06-17 14:47:46 - BCLinkDataCollection - INFO - finished save to file
2022-06-17 14:47:46 - BCLinkHelpers - ERROR - table person_ids unknown in dict_keys(['person', 'observation'])
2022-06-17 14:47:46 - CommonDataModel - INFO - saving dataframe (0x10e653100) to <carrot.io.plugins.bclink.BCLinkDataCollection object at 0x10e2995e0>
2022-06-17 14:47:46 - BCLinkDataCollection - INFO - updating person in cache/person.csv
2022-06-17 14:47:46 - BCLinkDataCollection - INFO - finished save to file
2022-06-17 14:47:46 - BCLinkHelpers - NOTICE - dataset_tool --load --table=ds1000 --user=data --data_file=cache/person.csv --support --bcqueue bclink
2022-06-17 14:47:46 - BCLinkHelpers - NOTICE - datasettool2 list-updates --dataset=ds1000 --user=data --database=bclink
2022-06-17 14:47:46 - CommonDataModel - INFO - finalised person on iteration 1 producing 5 rows from 2 tables
2022-06-17 14:47:46 - LocalDataCollection - INFO - Getting next chunk of data
2022-06-17 14:47:46 - LocalDataCollection - INFO - Getting the next chunk of size '5' for 'Demographics.csv'
2022-06-17 14:47:46 - LocalDataCollection - INFO - --> Got 0 rows
2022-06-17 14:47:46 - LocalDataCollection - INFO - All input files for this object have now been used.
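The warnings in the log ("removed 3 rows, leaving 2 rows" for MALE 3025 and "removed 2 rows, leaving 3 rows" for FEMALE 3026) follow from the term_mapping rules: values without a mapping become null and the row is then dropped. The sketch below reproduces those counts on the first chunk of Demographics.csv using plain Python. `apply_term_mapping` is a hypothetical helper and is not how carrot implements the step.

```python
# First chunk of Demographics.csv as shown earlier: (ID, Sex).
rows = [
    ("pk1", "Male"),
    ("pk2", "Female"),
    ("pk3", "Female"),
    ("pk4", "Female"),
    ("pk5", "Male"),
]


def apply_term_mapping(rows, term_mapping):
    """Map source values to concept IDs and drop rows that have no mapping."""
    mapped = [(person, term_mapping.get(sex)) for person, sex in rows]
    kept = [(person, concept) for person, concept in mapped if concept is not None]
    return kept, len(mapped) - len(kept)


male_rows, male_removed = apply_term_mapping(rows, {"Male": 8507})
female_rows, female_removed = apply_term_mapping(rows, {"Female": 8532})
print(male_removed, len(male_rows))      # 3 2
print(female_removed, len(female_rows))  # 2 3
```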

cdm['person'].dropna(axis=1)
gender_concept_id year_of_birth month_of_birth day_of_birth birth_datetime gender_source_value gender_source_concept_id
person_id
6 8507 1963 7 16 1963-07-16 00:00:00.000000 Male 8507
7 8507 1969 7 14 1969-07-14 00:00:00.000000 Male 8507
8 8532 1952 7 18 1952-07-18 00:00:00.000000 Female 8532
9 8532 1942 7 21 1942-07-21 00:00:00.000000 Female 8532
10 8532 1969 7 14 1969-07-14 00:00:00.000000 Female 8532