2.3.2.1 Incomplete

No matter how "big" your "big data" is, it probably doesn't have the information you want.

Most big data sources are incomplete, in the sense that they don't have the information you will want for your research. This is a common feature of data that were created for purposes other than research. Many social scientists have already had the experience of dealing with incompleteness, such as an existing survey that didn't ask the question they wanted. Unfortunately, the problems of incompleteness tend to be more extreme in big data. In my experience, big data tends to be missing three types of information useful for social research: demographics, behavior on other platforms, and data to operationalize theoretical constructs.

All three of these forms of incompleteness are illustrated in a study by Gueorgi Kossinets and Duncan Watts (2006) about the evolution of the social network at a university. Kossinets and Watts started with the email logs from the university, which had precise information about who sent emails to whom at what time (the researchers did not have access to the content of the emails). These email records sound like an amazing dataset, but despite their size and granularity they are fundamentally incomplete. For example, the email logs do not include data about the demographic characteristics of the students, such as gender and age. Further, the email logs do not include information about communication through other media, such as phone calls, text messages, or face-to-face conversations. Finally, the email logs do not directly include information about relationships, the theoretical construct in many existing theories. Later in the chapter, when I talk about research strategies, you'll see how Kossinets and Watts solved these problems.

Of the three kinds of incompleteness, the problem of incomplete data to operationalize theoretical constructs is the hardest to solve, and in my experience, it is often accidentally overlooked by data scientists. Roughly, theoretical constructs are abstract ideas that social scientists study, but, unfortunately, these constructs cannot always be unambiguously defined and measured. For example, let's imagine trying to empirically test the apparently simple claim that people who are more intelligent earn more money. In order to test this claim you would need to measure "intelligence." But what is intelligence? For example, Gardner (2011) argued that there are actually eight different forms of intelligence. And are there procedures that could accurately measure any of these forms of intelligence? Despite enormous amounts of work by psychologists, these questions still don't have unambiguous answers. Thus, even a relatively simple claim (people who are more intelligent earn more money) can be hard to assess empirically because it can be hard to operationalize theoretical constructs in data. Other examples of theoretical constructs that are important but hard to operationalize include "norms," "social capital," and "democracy." Social scientists call the match between theoretical constructs and data construct validity (Cronbach and Meehl 1955). And, as this list of constructs suggests, construct validity is a problem that social scientists have struggled with for a very long time, even when working with data that were collected for the purpose of research. When working with data collected for purposes other than research, the problems of construct validity are even more challenging (Lazer 2015).

When you are reading a research paper, one quick and useful way to assess concerns about construct validity is to take the main claim in the paper, which is usually expressed in terms of constructs, and re-express it in terms of the data used. For example, consider two hypothetical studies that claim to show that more intelligent people earn more money:

  • Study 1: people who score well on the Raven Progressive Matrices Test, a well-studied test of analytic intelligence (Carpenter, Just, and Shell 1990), have higher reported incomes on their tax returns
  • Study 2: people on Twitter who used longer words are more likely to mention luxury brands

In both cases, researchers could assert that they have shown that more intelligent people earn more money. But in the first study the theoretical constructs are well operationalized by the data, and in the second they are not. Further, as this example illustrates, more data does not automatically solve problems with construct validity. You should doubt the results of Study 2 whether it involved a million tweets, a billion tweets, or a trillion tweets. For researchers not familiar with the idea of construct validity, Table 2.2 provides some examples of studies that have operationalized theoretical constructs using digital trace data.

Table 2.2: Examples of digital traces that were used as measures of more abstract theoretical concepts. Social scientists call this match construct validity, and it is a major challenge with using big data sources for social research (Lazer 2015).
Digital trace | Theoretical construct | Citation
email logs from a university (meta-data only) | Social relationships | Kossinets and Watts (2006), Kossinets and Watts (2009), De Choudhury et al. (2010)
social media posts on Weibo | Civic engagement | Zhang (2016)
email logs from a firm (meta-data and complete text) | Cultural fit in an organization | Goldberg et al. (2015)

Even though the problem of incomplete data for operationalizing theoretical constructs is pretty hard to solve, there are three common solutions to the problem of incomplete demographic information and incomplete information on behavior on other platforms. The first is to actually collect the data you need. I'll tell you about an example of that in Chapter 3, when I tell you about surveys. Unfortunately, this kind of data collection is not always possible. The second main solution is to do what data scientists call user-attribute inference and what social scientists call imputation. In this approach, researchers use the information that they have on some people to infer attributes of other people. The third possible solution, the one used by Kossinets and Watts, is to combine multiple data sources. This process is sometimes called merging or record linkage. My favorite metaphor for this process was proposed in the very first paragraph of the very first paper ever written on record linkage (Dunn 1946):

"Each person in the world creates a Book of Life. This Book starts with birth and ends with death. Its pages are made up of records of the principal events in life. Record linkage is the name given to the process of assembling the pages of this book into a volume."

This passage was written in 1946, and at that time, people were thinking that the Book of Life could include major life events like birth, marriage, divorce, and death. However, now that so much information about people is recorded, the Book of Life could be an incredibly detailed portrait, if those different pages (i.e., our digital traces) can be bound together. This Book of Life could be a great resource for researchers. But the Book of Life could also be called a database of ruin (Ohm 2010), which could be used for all kinds of unethical purposes, as I will describe later, when I discuss the sensitive nature of the information collected by big data sources, and in Chapter 6 (Ethics).
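
To make the second and third solutions described earlier in this section a little more concrete, here is a minimal sketch in Python. It is not the procedure used by Kossinets and Watts or by any study cited here. The data, the identifiers, the field names, and both helper functions (link_records and impute_attribute) are hypothetical and invented for illustration. The sketch assumes that two data sources share an exact identifier, which real record linkage often cannot rely on, and it imputes a missing attribute with a deliberately crude rule: the most common value among a person's email contacts.

```python
from collections import Counter

# Hypothetical email log metadata: (sender_id, recipient_id, timestamp).
email_log = [
    ("u1", "u2", "2004-09-01T10:00"),
    ("u1", "u3", "2004-09-01T11:30"),
    ("u2", "u3", "2004-09-02T09:15"),
    ("u4", "u1", "2004-09-03T14:45"),
]

# Hypothetical administrative records keyed by the same identifier.
# "u4" has no record, so its attributes are still missing after linkage.
registrar = {
    "u1": {"year": "first", "department": "sociology"},
    "u2": {"year": "first", "department": "sociology"},
    "u3": {"year": "second", "department": "physics"},
}


def link_records(log, records):
    """Record linkage by exact match on a shared key (an assumption)."""
    linked = []
    for sender, recipient, sent_at in log:
        linked.append({
            "sender": sender,
            "recipient": recipient,
            "sent_at": sent_at,
            # None when the sender does not appear in the second source.
            "sender_attrs": records.get(sender),
        })
    return linked


def impute_attribute(user, attribute, log, records):
    """Guess a missing attribute from the user's email contacts.

    Crude baseline: return the most common value among contacts that
    have a record, or None if no contact has one.
    """
    contacts = set()
    for sender, recipient, _ in log:
        if sender == user:
            contacts.add(recipient)
        elif recipient == user:
            contacts.add(sender)
    # Sort so that ties are broken the same way on every run.
    values = [records[c][attribute] for c in sorted(contacts) if c in records]
    if not values:
        return None
    return Counter(values).most_common(1)[0][0]


linked = link_records(email_log, registrar)
unlinked = [row["sender"] for row in linked if row["sender_attrs"] is None]
print(unlinked)  # ['u4']
print(impute_attribute("u4", "department", email_log, registrar))  # sociology
```

The linkage step shows how pages from two sources can be bound together, and the unlinked sender shows that merging alone can still leave gaps. The imputed value is only a guess based on other people's data, so it should be treated with the same caution about construct validity discussed above.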