2.3.2.3 Non-representative

Two sources of non-representativeness are different populations and different usage patterns.

Big data tend to be systematically biased in two main ways. This need not cause problems for every kind of analysis, but for some analyses it can be a critical flaw.

A first source of systematic bias is that the people captured are typically neither a complete universe of all people nor a random sample from any specific population. For example, Americans on Twitter are not a random sample of Americans (Hargittai 2015). A second source of systematic bias is that many big data systems capture actions, and some people contribute many more actions than others. For example, some people on Twitter contribute hundreds of times more tweets than others. As a result, the events on a specific platform can reflect certain subgroups even more heavily than the platform's user base itself does.
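The second source of bias can be made concrete with a small sketch. The data below are entirely hypothetical (four invented accounts, two invented subgroups, invented post counts), and the function names are illustrative only. The sketch simply compares a subgroup's share of accounts with its share of events.

```python
# Hypothetical toy data: (user_id, subgroup, number_of_posts).
# None of these numbers come from a real platform.
users = [
    ("u1", "A", 500),  # one very active account in subgroup A
    ("u2", "B", 5),
    ("u3", "B", 3),
    ("u4", "B", 2),
]


def share_of_users(users, group):
    """Fraction of accounts that belong to `group`."""
    return sum(1 for _, g, _ in users if g == group) / len(users)


def share_of_events(users, group):
    """Fraction of all posts written by accounts in `group`."""
    total_posts = sum(n for _, _, n in users)
    return sum(n for _, g, n in users if g == group) / total_posts


user_share_a = share_of_users(users, "A")
event_share_a = share_of_events(users, "A")

print(f"Subgroup A: {user_share_a:.0%} of accounts, {event_share_a:.0%} of posts")
```

In this toy case subgroup A is a quarter of the accounts but produces nearly all of the posts, so an analysis that counts posts would describe subgroup A far more than it describes the platform's users as a whole.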

Normally researchers want to know a lot about the data that they have. But, given the non-representative nature of big data, it is helpful to also flip your thinking. You also need to know a lot about the data that you don't have. This is especially true when the data that you don't have are systematically different from the data that you do have. For example, if you have the call records from a mobile phone company in a developing country, you should think not just about the people in your dataset, but also about the people who might be too poor to own a mobile phone. Further, in Chapter 3, we'll learn how weighting can enable researchers to make better estimates from non-representative data.
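As a rough preview of the idea behind weighting, the sketch below uses post-stratification, one common weighting technique; it is an assumption that this matches the approach taken in Chapter 3. The sample, the group labels, and the population shares are all hypothetical, and `poststratified_mean` is an illustrative name, not a library function.

```python
def poststratified_mean(sample, population_shares):
    """Weight each group's sample mean by its (assumed known) population share.

    sample: list of (group, value) pairs.
    population_shares: dict mapping group -> share of the target population.
    """
    sums, counts = {}, {}
    for group, value in sample:
        sums[group] = sums.get(group, 0.0) + value
        counts[group] = counts.get(group, 0) + 1

    # Weighting cannot recover groups that never appear in the data.
    missing = set(population_shares) - set(counts)
    if missing:
        raise ValueError(f"no observations for groups: {sorted(missing)}")

    return sum(
        share * sums[group] / counts[group]
        for group, share in population_shares.items()
    )


# Hypothetical sample that over-represents the "young" group.
sample = [("young", 1), ("young", 1), ("young", 0), ("old", 0)]
# Hypothetical population shares, assumed known from an outside source.
population_shares = {"young": 0.5, "old": 0.5}

raw_mean = sum(value for _, value in sample) / len(sample)
weighted_mean = poststratified_mean(sample, population_shares)

print(f"raw mean: {raw_mean:.3f}, weighted mean: {weighted_mean:.3f}")
```

The sketch raises an error when a population group has no observations at all. That mirrors the point about data you don't have: reweighting can rebalance groups that are under-represented, but it cannot speak for people who are entirely absent from the dataset, such as those without a mobile phone.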