LFS2010

LFS Post data processing:


 


 

Data base from the first scan is backed up two places – at email address of mychitterman@gmail.com

Also in the //statssrv/spiceislfo/i/lfs – The backup follows the process – with 3 folders.


 

The data was then analyzed for the main table (household data) looking for duplicates, missing values and errors with Parish, ED and Household, Link ID.


 

We manually went through each questionnaire, paper base; we placed a check on the questionnaires into the database with a field "Validated".


 

We then realize that there was a duplicate, bad LINK ID – in one case, 25 questionnaires had the LINK ID 1222.

We had to print new questionnaires and re-write the values of the old questionnaires.


 

We then also identified duplicates – and removed them manually if the LINK ID and the paper questionnaire were used to validate it.


 

At the end of the viewing of the main table we got 918 records.


 

At this point the database was also backed up – using the above methods.


 

MOVING ON:

It was advised that we needed to put all individuals data into one table – presently the database has about 27 tables. The main table is named imi the other tables are named sec and their page numbers according to the


 


 


Details on the advice are as follows.

Create a query of each table – create an index field eg. KEY and use this key field in each table to build a relationship. This relationship will be mapping each record from on table to it corresponding record of another table.


 

CHALLENGES with KEY FIELD and individual data


 

The first table sec was realized to have more than 7344 records. The value of 7344 was expected because we had 918 households, each having 8 person no. 918 x 8 = 7344

It should be expected that each individual table need to reflect the 7344.

Some tables had more than what was expected and some had less.


 

The processes used below are what was taken to inspect each table and rectified the problem.


 

8 persons on each questionnaire mean that we should have 918 x 8 = 7344 in every table for persons.


 

Taking table 7 sec3_pg4 as the example – we create a query of this table and create a KEY FIELD.

This is the KEY field.


 

SELECT DISTINCT sec3_pg7.*, [LINK] & [pno7] AS [KEY]

FROM sec3_pg7;


 

NOTE: KEY = LINK & PersonNo.


 

We did not get the expected 7344 but 7314 person records; this alerts us that some data is missing.


 


 

Method of troubleshooting


 

  1. Copy the data to excel, this will be a copy of the query of the table sec3_pg4, I named this query appropriately eg. Section3 Page 4.
  2. This query is better than copying the table itself, since the duplicates of records are not present.
  3. When we paste this table in excel.
  4. Run a pivot table to see the data for the LINK ID counts. ( So this is LINKID, count of Link ID)
  5. It should show that all links should occur 8 times. Eg. 21—8
  6. So every field that shows more than 8 or less than 8 has a problem.


 

We then go to the database and clean these values. We do manual inputting of values where they are missing.

We check for duplicates by reassessing the paper questionnaire, if the link ID is in deep correct and the paper questionnaire is matching the data, we choose to remove one of the duplicates – again duplicates of data may have been cause because of disjointed communications with the scanning process. It was notes that some questionnaires came in batches – which was reported as being completed, they were scanned, then a few week after 1 or 2 more questionnaires may have been added, the scanner managers not know which was which, scanned over these questionnaires.


 


 

TABLE 1.0 This is a list of LINK ID that occurred in this sec3_page7.

1119    5

1133    1

1307    9

1331    9

1799    9

1805    9

1887    1

2003    4

2047    1

2131    1

579    2

669    1


 

These will be occurring in most, if not all of the tables. The same LINK ID are expected to have a problem


 

EXPLAINATION OF ISSUE

The list of value above displays some values. The example of 1119 displays a 5, this means that 3 other person records are missing from the table.

When one will look into the database they would see this:


 

Figure 5.0



 


 

This questionnaire is now found from its batch.

We trace this questionnaire back in the main household table; in this case we go to table "imi"

We find the record for this questionnaire LINK ID and the ED and Parish Number.

We quickly pull this batch, find the questionnaire, and we hold onto it for it will be used in all other table for verification of the data.

Figure 6.0



 

We proceed to fix this record in Figure 5.0 by using the evidence from the paper questionnaire. In this we look to get the personNo matching the response for the pages shown. And we input the 01, 02, 03, 04, 05, and we also input 06, 07, 08 for the empty values with matching LINKID of 1119


 

In the database we search for the LINK ID eg: 1119 - we see the problem is with the personNo. So we have to manually put this into the database

  • we put 01
  • then 02 not 2 or 1 or 3
  • We put 01 etc.
  • we ensure that we have 8 of these values with the link ID

We repeater the entire process for all the records shown in Table 1.0

STEP 3

We go to our query and we now follow the query pattern as detailed for the tables _MainJohnPage 1

Take a copy of the last mainJoin and develop it to reflect this new table.

This query should return the expected value 7344.


 

STEP 4

Now we can place this data into the MAINJOIN query, MainJoin is the query for the entire record set for the individuals; it should always should 7344 for the new sheet.

It is the query that will be a compilation of the 27 or so tables.


 

STEP 5:

Step 5 will be developed on advice of Mr. Edwin St. Catharine.


 

MISSING DATA – Export problems.

I have seen some tables where there are "missing data". Table 9 has fewer than expected records even when we use the described process. The data was short. It means that data was missing from this table, maybe it was not exported to the database during the export process.

Having the knowledge of all the LINK ID that are to be expected – which one can take from the MAIN table ("imi") – a comparison was done against the LINK ID of this table 9 (Sec 2- pages 9). I copied both LINK ID from main table and LINK ID from Table 9 into excel. I created a named table, and performed a vlookup against the table. This produced a list of all Missing LINKS ID that should be in table 9.

FIXING MISSING DATA –manual input

To fix the data one will have to take the paper questionnaires and find the respective pages. Manually Input the data.

Comments