Sep 14

Working directory setup

In the directory:

/usr/lib/bcos/OMOP-test-data/calum_tests_14Sep

I have the TSV file created by the ETL-Tool, person.tsv, which I want to test uploading.
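Before uploading, a quick look at the file's shape can catch obvious problems. The following is a minimal sketch using only standard shell tools. The helper name tsv_summary is made up for illustration, and it assumes the first line of the file is a tab-separated header, which these notes do not confirm for person.tsv.

```shell
#!/bin/sh
# tsv_summary FILE: print the number of header columns and the number of data rows.
# Hypothetical helper; assumes line 1 of FILE is a tab-separated header.
tsv_summary() {
    file=$1
    # Count the tab-separated fields on the first line.
    cols=$(head -n 1 "$file" | awk -F '\t' '{ print NF }')
    # Count every line after the first as a data row (works without a trailing newline).
    rows=$(awk 'NR > 1 { n++ } END { print n + 0 }' "$file")
    echo "columns=$cols rows=$rows"
}

# Example (run from the working directory): tsv_summary person.tsv
```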

Clean the database

datasettool2 delete-all-rows ds100394 --database=bclink

Command-line options

dataset_tool

The help gives the following information:

dataset_tool --load --table=ds100123 --user=<test_user>
    --data_file=ds100123.txt [or --data_file_list=my_filelist.txt]
    [--extra='...'] [--no-backup] [--support]
    [--move-original-file] [--grab-original-file] [--immediate-write]
    [--force-prepare-data] [--job-parameters=<extra-job-params-file>]
    [--bcqueue] [--bcqueue-res-path='job-result-path'] [--niceness=10]
    [--device=XX] [--subj_filter=xx] [--samples_table=xx] database

Some additional information is also given:

Arguments for load mode:
  --table=ds100123
     Must contain the name of the destination table.
  --no-backup
     Don't do a input file backup. Product dataset data needs no backup.
  --support
     Uses supp-dataload-batch queue instead of dataload-batch, 
     or supp-dataupd2-batch queue instead of dataupd2-batch.
  --immediate-write
     Avoid job queue altogether: calls simpleupd directly.
  --user=test_user
     Must contain the SQL user, as whom the data will be inserted into the data set.
  --data_file=ds100123.txt
     Must contain the data file in textdb format, to be loaded into the database.
  --data_file_list=one_line_per_uploadfile.txt Upload multiple files with one submit.
  --extra=... Add extra parameters into multiple file upload (--data_file_list) supplied. 
  --update_type=direct or --update_type=incremental:      incremental mode skips row updates with missing alleles.
  --move-original-file
     Moves the original file in --data_file=<file> into /data/var/lib/bcos/tmp/ to be used on load operation.
  --grab-original-file
     Uses the file <file> in place, so it will not move the file for the following upload job.
  --subj_filter=[--no,--skip,--strict]
     '--no' means no samples filter,
     '--skip' means pass mapped rows thrue, others are discarded,
     '--strict' means if one unmappable row is found, don't do the upload.
  --action_mode=[load,insert,?] (please verify the meaning first if you use this).
  --device=UPLOAD_DEVICE
  --samples_table=samples means that skip / strict upload is done against this samples table.
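To relate the synopsis to the case in these notes, here is a sketch that only builds and prints a load command and does not run it. The helper build_load_cmd is hypothetical. The flag names come from the help text above, and the values (ds100394, data, bclink, person.tsv) are borrowed from elsewhere in these notes. This combination has not been tested, and whether person.tsv qualifies as "textdb format" is an open assumption.

```shell
#!/bin/sh
# Build (but do not run) a dataset_tool load command from the documented options.
# Hypothetical helper; the resulting command line is untested.
build_load_cmd() {
    table=$1      # destination table, e.g. ds100394
    user=$2       # SQL user the data is inserted as
    datafile=$3   # data file (the help says it must be in textdb format)
    database=$4   # database name, the final positional argument in the synopsis
    echo "dataset_tool --load --table=$table --user=$user --data_file=$datafile $database"
}

# Print the command for inspection before deciding whether to run it.
build_load_cmd ds100394 data "$(pwd)/person.tsv" bclink
```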

datasettool2 load

datasettool2 load --dataset=<DATASET> [--datafile=<FILE> | --source-dataset=<SOURCE_DATASET>]
                  [--submission-id] [--queue] [--resfolder=PATH] [--force-prepare-data]
                  [--timing] [--hierarchical-timing] [--sqltimeout=<SEC>]
                  [--database=<DATABASE>] [--user=<USER>] [--developer=<DEVELOPER>]
                  [--pooledconnection] [--job-id=<JOB>] [--job=<JOB>]

--resfolder=<PATH> : appears to have no effect. All the jobs end up in /data/var/lib/bcos/download/<user> regardless. Being able to specify the job output path would be useful.

datasettool2 load2

datasettool2 load2 --dataset=<DATASET> --datafile=<FILE> [--force-prepare-data]
                   [--timing] [--hierarchical-timing] [--sqltimeout=<SEC>]
                   [--database=<DATABASE>] [--user=<USER>] [--developer=<DEVELOPER>]
                   [--pooledconnection] [--job-id=<JOB>] [--job=<JOB>]

Insert into the database

Using datasettool2 load2

[bcos_srv@link-test-dt calum_tests_14Sep]$ datasettool2 load2 --dataset=ds100394 --datafile=`pwd`/person.tsv --database=bclink --user=data 
Upload started: dataset=PERSON_V2 (ds100394), file=/usr/lib/bcos/OMOP-test-data/calum_tests_14Sep/person.tsv, id=9 , resultfolder=datasettool2_uploads/9
Upload done: dataset=PERSON_V2 (ds100394), file=/usr/lib/bcos/OMOP-test-data/calum_tests_14Sep/person.tsv, id=9 , resultfolder=datasettool2_uploads/9
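The upload id and result folder printed in the "Upload done" line are what you need to locate the job output afterwards. A minimal sketch for pulling them out of such a line in a script follows. The helper upload_field is made up for illustration, and it assumes the status line keeps the key=value layout shown above.

```shell
#!/bin/sh
# upload_field LINE KEY: extract the value of KEY=... from an "Upload ..." status line.
# Hypothetical helper; relies on the key being preceded by a space or comma.
upload_field() {
    printf '%s\n' "$1" | sed -n "s/.*[ ,]$2=\([^ ,]*\).*/\1/p"
}

# The status line copied from the run above.
line='Upload done: dataset=PERSON_V2 (ds100394), file=/usr/lib/bcos/OMOP-test-data/calum_tests_14Sep/person.tsv, id=9 , resultfolder=datasettool2_uploads/9'

upload_field "$line" id            # prints 9
upload_field "$line" resultfolder  # prints datasettool2_uploads/9
```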

By using --user=data I can see the job via the GUI too.

These jobs run on the BCLink queue, and the output results folder can be listed from the shell:

$ ls -ltr /data/var/lib/bcos/download/data/datasettool2_uploads/9
total 8
-rw-rw---- 1 bcos_srv bcos_srv 103 Sep 14 17:13 invalid_values.dat
-rw-rw---- 1 bcos_srv bcos_srv 755 Sep 14 17:13 cover.10126
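A small helper can make checking a result folder repeatable. This sketch assumes that a non-empty invalid_values.dat means the loader rejected some values. That is a guess from the file name, not something confirmed in these notes. The function name show_invalid is hypothetical.

```shell
#!/bin/sh
# show_invalid DIR: print invalid_values.dat from a result folder if it exists and is non-empty.
# Assumption: a non-empty file indicates values the loader rejected.
show_invalid() {
    f="$1/invalid_values.dat"
    if [ -s "$f" ]; then
        echo "invalid values reported in $f:"
        cat "$f"
    else
        echo "no invalid values file (or it is empty) in $1"
    fi
}

# Example: show_invalid /data/var/lib/bcos/download/data/datasettool2_uploads/9
```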

Using datasettool2 load