2.4.3.2 Matching

Matching creates fair comparisons by pruning away cases.

Fair comparisons can come from either randomized controlled experiments or natural experiments. But there are many situations where you can't run an ideal experiment and nature has not provided a natural experiment. In these settings, a good way to create a fair comparison is matching. In matching, the researcher looks through non-experimental data to create pairs of people who are similar except that one has received the treatment and one has not. In the process of matching, researchers are actually also pruning; that is, discarding cases where there is no obvious comparison. Thus, this method would be more accurately called matching-and-pruning, but I'll stick with the traditional term: matching.
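To make the matching-and-pruning idea concrete, here is a minimal Python sketch of exact matching. It assumes each record is a dict with a boolean `treated` flag plus some covariate fields; the function name `match_and_prune`, the record layout, and the toy data are all hypothetical, chosen only for illustration. Records are grouped by their covariate values; groups containing both treated and untreated units are kept, and every other record is pruned.

```python
from collections import defaultdict


def match_and_prune(records, keys):
    """Exact matching on the covariates named in `keys` (hypothetical helper).

    Returns (matched, pruned): `matched` maps each covariate combination to the
    records sharing it, kept only if the group has at least one treated and one
    untreated record; `pruned` lists every discarded record.
    """
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[k] for k in keys)].append(record)

    matched, pruned = {}, []
    for key, members in groups.items():
        statuses = {m["treated"] for m in members}
        if statuses == {True, False}:
            matched[key] = members  # a usable comparison exists
        else:
            pruned.extend(members)  # no obvious comparison: prune
    return matched, pruned


# Invented toy data: only the (30, "north") group has both kinds of units.
people = [
    {"id": 1, "age": 30, "region": "north", "treated": True},
    {"id": 2, "age": 30, "region": "north", "treated": False},
    {"id": 3, "age": 45, "region": "south", "treated": True},
    {"id": 4, "age": 60, "region": "east", "treated": False},
]
matched, pruned = match_and_prune(people, ["age", "region"])
print(sorted(matched))
print(sorted(p["id"] for p in pruned))
```

Exact matching is the simplest variant; it works well when covariates are few and discrete, and it makes the pruning step explicit.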

A beautiful example of the power of matching strategies with massive non-experimental data sources comes from research on consumer behavior by Liran Einav and colleagues (2015). Einav and colleagues were interested in auctions taking place on eBay, and in describing their work, I'll focus on one particular aspect: the effect of auction starting price on auction outcomes, such as the sale price or the probability of a sale.

The most naive way to answer the question about the effect of starting price on sale price would be to simply calculate the final price for auctions with different starting prices. This approach would be fine if you simply wanted to predict the sale price of a given item that had been put on eBay with a given starting price. But if your question is about the effect of starting price on market outcomes, this approach will not work, because it is not based on fair comparisons: the auctions with lower starting prices might be quite different from those with higher starting prices (e.g., they might be for different types of goods or include different types of sellers).
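A toy example with invented numbers shows how the naive comparison can mislead. Suppose, hypothetically, that more valuable items tend to be listed with high starting prices. Averaging final prices by starting-price group then mixes the effect of the starting price with the effect of item value, while comparing high and low starting prices separately within each item removes that difference. The item labels and the "high"/"low" split are assumptions for illustration only.

```python
from statistics import mean

# (item, starting-price group, final price); all numbers are invented
listings = [
    ("dvd", "low", 10.0), ("dvd", "low", 11.0), ("dvd", "high", 12.0),
    ("driver", "low", 95.0), ("driver", "high", 100.0), ("driver", "high", 102.0),
]


def naive_difference(rows):
    """High-minus-low mean final price, ignoring which item was sold."""
    high = [p for _, g, p in rows if g == "high"]
    low = [p for _, g, p in rows if g == "low"]
    return mean(high) - mean(low)


def within_item_difference(rows):
    """Average of high-minus-low differences computed separately per item."""
    diffs = []
    for item in sorted({i for i, _, _ in rows}):
        high = [p for i, g, p in rows if i == item and g == "high"]
        low = [p for i, g, p in rows if i == item and g == "low"]
        diffs.append(mean(high) - mean(low))
    return mean(diffs)


print(round(naive_difference(listings), 2))
print(round(within_item_difference(listings), 2))
```

Here the naive gap (about 32.67) is driven mostly by the fact that the expensive driver dominates the high-start group; the within-item gap is 3.75.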

If you are already worried about making fair comparisons, you might skip the naive approach and instead consider running a field experiment in which you would sell a specific item (say, a golf club) with a fixed set of auction parameters (say, free shipping and the auction open for two weeks) but with randomly assigned starting prices. By comparing the resulting market outcomes, this field experiment would offer a very clear measurement of the effect of starting price on sale price. But this measurement would apply only to one particular product and one set of auction parameters. The results might be different, for example, for different types of products. Without strong theory, it is difficult to extrapolate from this single experiment to the full range of possible experiments that could have been run. Further, field experiments are sufficiently expensive that it would be infeasible to run enough of them to cover the entire parameter space of products and auction types.

In contrast to the naive approach and the experimental approach, Einav and colleagues took a third approach: matching. The main trick in their strategy is to discover things similar to field experiments that have already happened on eBay. For example, Figure 2.6 shows some of the 31 listings for exactly the same golf club (a Taylormade Burner 09 Driver) being sold by exactly the same seller ("budgetgolfer"). However, these listings have slightly different characteristics. Eleven of them offer the driver for a fixed price of $124.99, while the other 20 are auctions with different end dates. The listings also have different shipping fees, either $7.99 or $9.99. In other words, it is as if "budgetgolfer" is running experiments for the researchers.

The listings of the Taylormade Burner 09 Driver being sold by "budgetgolfer" are one example of a matched set of listings, where the exact same item is being sold by the exact same seller, but each time with slightly different characteristics. Within the massive logs of eBay there are literally hundreds of thousands of matched sets involving millions of listings. Thus, rather than comparing the final price for all auctions within a given starting price, Einav and colleagues made comparisons within matched sets. In order to combine results from the comparisons within these hundreds of thousands of matched sets, Einav and colleagues re-expressed the starting price and final price in terms of the reference value of each item (e.g., its average sale price). For example, if the Taylormade Burner 09 Driver had a reference value of $100 (based on its sales), then a starting price of $10 would be expressed as 0.1 and a final price of $120 as 1.2.
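The re-expression step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: each matched set is given as a list of hypothetical `(start_price, final_price)` pairs, and the reference value is taken to be the mean final price within that set (the text gives the average sale price only as an example; the hypothetical `pool_matched_sets` helper is not the authors' code). Dividing by the reference value puts listings from different matched sets on a common scale so they can be pooled.

```python
from statistics import mean


def pool_matched_sets(matched_sets):
    """Re-express each listing's prices relative to its own item's reference
    value so that listings from different matched sets can be pooled.

    `matched_sets` maps an item label to a list of (start_price, final_price)
    pairs. The reference value is ASSUMED to be the mean final price in the set.
    """
    pooled = []
    for item, listings in matched_sets.items():
        ref = mean(final for _, final in listings)
        for start, final in listings:
            pooled.append((item, start / ref, final / ref))
    return pooled


sets = {  # invented numbers; the driver's reference value works out to $100
    "Taylormade Burner 09 Driver": [(10.0, 80.0), (50.0, 120.0), (90.0, 100.0)],
    "DVD box set": [(1.0, 9.0), (5.0, 11.0)],
}
for row in pool_matched_sets(sets):
    print(row)
```

As in the worked example in the text, the driver's $10 starting price becomes 0.1 and its $120 final price becomes 1.2.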

Figure 2.6: An example of a matched set. This is the exact same golf club (a Taylormade Burner 09 Driver) being sold by the exact same person ("budgetgolfer"), but some of these sales were performed under different conditions (e.g., different starting prices). Figure taken from Einav et al. (2015).


Recall that Einav and colleagues were interested in the effect of starting price on auction outcomes. First, using linear regression, they estimated that higher starting prices decrease the probability of a sale, and that higher starting prices increase the final sale price, conditional on a sale occurring. By themselves, these estimates, which are averaged over all products and assume a linear relationship between starting price and final outcome, are not all that interesting. But Einav and colleagues also used the massive size of their data to estimate a variety of more subtle findings. First, they made these estimates separately for items of different prices and without using linear regression. They found that while the relationship between starting price and probability of a sale is linear, the relationship between starting price and sale price is clearly nonlinear (Figure 2.7). In particular, for starting prices between 0.05 and 0.85, the starting price has very little impact on sale price, a finding that was completely missed by the analysis that assumed a linear relationship.
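One simple way to look at a relationship without imposing linearity is to bin the normalized starting price and average the normalized sale price within each bin. The Python sketch below illustrates that general idea only; it is not the estimator Einav and colleagues used, and the hypothetical `binned_means` helper, the bin edges (borrowed from the 0.05 and 0.85 thresholds in the text), and the data points are all assumptions. A real analysis would use many finer bins rather than bins chosen to match a known finding.

```python
from bisect import bisect_right
from statistics import mean


def binned_means(pairs, edges):
    """Average y within bins of x defined by sorted `edges`.

    Bin i covers edges[i] <= x < edges[i + 1]; points outside every bin are
    ignored. Returns a list of (bin_lower, bin_upper, mean_y, count).
    """
    buckets = [[] for _ in range(len(edges) - 1)]
    for x, y in pairs:
        i = bisect_right(edges, x) - 1
        if 0 <= i < len(buckets):
            buckets[i].append(y)
    return [
        (edges[i], edges[i + 1], mean(ys), len(ys))
        for i, ys in enumerate(buckets)
        if ys  # skip empty bins
    ]


# (normalized starting price, normalized sale price); invented numbers
pairs = [(0.02, 0.90), (0.04, 0.92), (0.3, 1.00), (0.6, 1.00),
         (0.8, 1.02), (0.9, 1.15), (1.0, 1.25)]
edges = [0.0, 0.05, 0.85, 1.5]
for lo, hi, m, n in binned_means(pairs, edges):
    print(f"[{lo}, {hi}): mean sale price {m:.3f} (n={n})")
```

Because each bin gets its own average, a flat middle region and steeper ends can both show up, which a single straight-line fit would smooth over.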

Figure 2.7: Relationship between auction starting price and probability of a sale (left panel) and sale price (right panel). There is roughly a linear relationship between starting price and probability of sale, but there is a nonlinear relationship between starting price and sale price; for starting prices between 0.05 and 0.85, the starting price has very little impact on sale price. In both cases, the relationships are basically independent of item value. These graphs reproduce Figures 4a and 4b of Einav et al. (2015).


Second, rather than averaging over all items, Einav and colleagues also used the massive scale of their data to estimate the impact of starting price for 23 different categories of items (e.g., pet supplies, electronics, and sports memorabilia) (Figure 2.8). These estimates show that for more distinctive items, such as memorabilia, starting price has a smaller effect on the probability of a sale and a larger effect on the final sale price. Further, for more commodified items, such as DVDs and video, the starting price has almost no impact on the final price. In other words, an average that combines results from 23 different categories of items hides important information about the differences between these items.
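The Python sketch below shows how a pooled estimate can hide category differences. It fits a least-squares slope of normalized sale price on normalized starting price, once for all rows and once per category. Using a simple linear slope is a simplification for illustration. The hypothetical helpers `ols_slope` and `slopes_by_category` are not from the study, and the data are invented; only the category names echo the text.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Least-squares slope of y on x (with an intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def slopes_by_category(rows):
    """rows: (category, normalized start price, normalized sale price).

    Returns the pooled slope and a dict of per-category slopes.
    """
    by_cat = defaultdict(lambda: ([], []))
    for cat, x, y in rows:
        by_cat[cat][0].append(x)
        by_cat[cat][1].append(y)
    per_cat = {c: ols_slope(xs, ys) for c, (xs, ys) in by_cat.items()}
    pooled = ols_slope([r[1] for r in rows], [r[2] for r in rows])
    return pooled, per_cat


rows = [  # invented numbers
    ("memorabilia", 0.2, 0.90), ("memorabilia", 0.5, 1.05), ("memorabilia", 0.8, 1.20),
    ("DVDs", 0.2, 1.00), ("DVDs", 0.5, 1.00), ("DVDs", 0.8, 1.00),
]
pooled, per_cat = slopes_by_category(rows)
print(round(pooled, 3), {c: round(s, 3) for c, s in per_cat.items()})
```

In this toy data the pooled slope (0.25) describes neither category: memorabilia has a slope of 0.5 and DVDs a slope of 0.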

Figure 2.8: Estimates from each category of items separately; the solid dot is the estimate for all categories pooled together (Einav et al. 2015, Table 11). These estimates show that for more distinctive items, such as memorabilia, the starting price has a smaller effect on the probability of a sale (x-axis) and a larger effect on the final sale price (y-axis).


Even if you are not particularly interested in auctions on eBay, you have to admire the way that Figures 2.7 and 2.8 offer a richer understanding of eBay than simple linear regression estimates that assume linear relationships and combine many different categories of items. These more subtle estimates illustrate the power of matching in massive data: they would have been impossible without an enormous number of field experiments, which would have been prohibitively expensive.

Of course, we should have less confidence in the results of any particular matching study than we would in the results of a comparable experiment. When assessing the results from any matching study, there are two important concerns. First, we have to remember that we can only ensure fair comparisons on things that were used for matching. In their main results, Einav and colleagues did exact matching on four characteristics: seller ID number, item category, item title, and subtitle. If the items were different in ways that were not used for matching, that could create an unfair comparison. For example, if "budgetgolfer" lowered prices for the Taylormade Burner 09 Driver in the winter (when golf clubs are less popular), then it could appear that lower starting prices lead to lower final prices, when in fact this would be an artifact of seasonal variation in demand. In general, the best approach to this problem seems to be trying many different kinds of matching. For example, Einav and colleagues repeated their analysis with matched sets that included items on sale within one year, within one month, and contemporaneously. Making the time window tighter decreases the number of matched sets but reduces concerns about seasonal variation. Fortunately, they found that the results were unchanged by these changes in matching criteria. In the matching literature, this type of concern is usually expressed in terms of observables and unobservables, but the key idea is really that researchers are only creating fair comparisons on the features used in matching.
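The robustness check described above, varying the time window, can be sketched as an extra component in the exact-matching key. In this minimal Python sketch, listings are hypothetical dicts; the field names, the `make_listing` and `matched_sets` helpers, and the rule that a matched set needs at least two listings are assumptions for illustration. "Contemporaneously" is approximated here as "same calendar day", which is also an assumption.

```python
from collections import defaultdict
from datetime import date

# The four characteristics matched exactly in the main results
EXACT_KEYS = ("seller_id", "category", "title", "subtitle")

# How the listing date is coarsened for each window (ASSUMED definitions;
# "contemporaneous" is approximated as the same calendar day).
WINDOWS = {
    "year": lambda d: d.year,
    "month": lambda d: (d.year, d.month),
    "day": lambda d: d,
}


def matched_sets(listings, window):
    """Group listings that agree exactly on EXACT_KEYS and fall in the same
    time bucket; keep only groups with at least two listings."""
    bucket = WINDOWS[window]
    groups = defaultdict(list)
    for lst in listings:
        key = tuple(lst[k] for k in EXACT_KEYS) + (bucket(lst["date"]),)
        groups[key].append(lst)
    return {k: v for k, v in groups.items() if len(v) >= 2}


def make_listing(**extra):
    """Hypothetical listing for one seller's golf club, plus extra fields."""
    row = {"seller_id": "budgetgolfer", "category": "golf",
           "title": "Taylormade Burner 09 Driver", "subtitle": ""}
    row.update(extra)
    return row


listings = [  # invented dates and starting prices
    make_listing(date=date(2009, 1, 5), start=10.0),
    make_listing(date=date(2009, 1, 5), start=60.0),
    make_listing(date=date(2009, 1, 20), start=90.0),
    make_listing(date=date(2009, 7, 3), start=40.0),
]
for w in WINDOWS:
    sets = matched_sets(listings, w)
    print(w, "sets:", len(sets), "listings kept:", sum(len(v) for v in sets.values()))
```

In this toy data, tightening the window from a year to a month to a day keeps 4, then 3, then 2 listings. A July sale can no longer be compared with January sales, which is exactly the seasonal comparison the tighter window is meant to rule out.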

The second major concern when interpreting matching results is that they apply only to matched data; they do not apply to the cases that could not be matched. For example, by limiting their research to items that had multiple listings, Einav and colleagues are focusing on professional and semi-professional sellers. Thus, when interpreting these comparisons, we must remember that they apply only to this subset of eBay.

Matching is a powerful strategy for finding fair comparisons in large datasets. To many social scientists, matching feels second-best to experiments, but that is a belief that should be revised, slightly. Matching in massive data might be better than a small number of field experiments when (1) heterogeneity in effects is important and (2) there are good observables for matching. Table 2.4 provides some other examples of how matching can be used with big data sources.

Table 2.4: Examples of studies that use matching to find fair comparisons within digital traces.
Substantive focus | Big data source | Reference
Effect of shootings on police violence | Stop-and-frisk records | Legewie (2016)
Effect of September 11, 2001 on families and neighbors | Voting records and donation records | Hersh (2013)
Social contagion | Communication and product adoption data | Aral, Muchnik, and Sundararajan (2009)

In conclusion, naive approaches to estimating causal effects from non-experimental data are dangerous. However, strategies for making causal estimates lie along a continuum from strongest to weakest, and researchers can discover fair comparisons within non-experimental data. The growth of always-on, big data systems increases our ability to effectively use two existing methods: natural experiments and matching.