
Policy Research Working Paper 8735 Estimation of Poverty in Somalia Using Innovative Methodologies Utz Pape Philip Wollburg


Poverty and Equity Global Practice
February 2019
Produced by the Research Support Team
The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development
issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the
names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those
of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and
its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.
Abstract
Somalia is highly data-deprived, leaving policy makers
to operate in a statistical vacuum. To overcome this challenge,
the World Bank implemented wave 2 of the Somali
High Frequency Survey to better understand livelihoods
and vulnerabilities and, especially, to estimate national
poverty indicators. The specific context of insecurity and
lack of statistical infrastructure in Somalia posed several
challenges for implementing a household survey and measuring
poverty. This paper outlines how these challenges
were overcome in wave 2 of the Somali High Frequency
Survey through methodological and technological adaptations
in four areas. First, in the absence of a recent census,
no exhaustive lists of census enumeration areas along with
population estimates existed, creating challenges to derive
a probability-based representative sample. Therefore, geospatial
techniques and high-resolution imagery were
used to model the spatial population distribution, build
a probability-based population sampling frame, and generate
enumeration areas to overcome the lack of a recent
population census. Second, although some areas remained
completely inaccessible due to insecurity, even most accessible
areas held potential risks to the safety of field staff and
survey respondents, so that time spent in these areas had
to be minimized. To address security concerns, the survey
adapted logistical arrangements, sampling strategy using
micro-listing, and questionnaire design to limit time on
the ground based on the Rapid Consumption Methodology.
Third, poverty in completely inaccessible areas had to
be estimated by other means. Therefore, the Somali High
Frequency Survey relies on correlates derived from satellite
imagery and other geo-spatial data to estimate poverty in
such areas. Finally, the nonstationary nature of the nomadic
population required special sampling strategies.
This paper is a product of the Poverty and Equity Global Practice. It is part of a larger effort by the World Bank to provide
open access to its research and make a contribution to development policy discussions around the world. Policy Research
Working Papers are also posted on the Web at http://www.worldbank.org/research. The authors may be contacted at
upape@worldbank.org.
Estimation of Poverty in Somalia Using Innovative
Methodologies
Utz Pape and Philip Wollburg1
Keywords: Consumption Measurement, Poverty, Questionnaire Design
JEL: C83, D63, I32
1 Authors in alphabetical order. Corresponding author: Utz Pape (upape@worldbank.org). The findings,
interpretations and conclusions expressed in this paper are entirely those of the authors, and do not necessarily
represent the views of the World Bank, its Executive Directors, or the governments of the countries they
represent. Gonzalo Nunez contributed to the survey design and poverty analysis and provided inputs to this
manuscript. The authors would like to thank Kristen Himelein and Wendy Karamba for discussions. In addition, the
authors thank Véronique Lefebvre, Sarchil Qader, Amy Ninneman, Dana Thomson and Tom Bird from Flowminder
and WorldPop for designing the population sampling frame using quadtrees and producing fieldwork maps, and
for modelling and imputing poverty from spatial data, in collaboration with the authors.
1. Introduction and related literature
Somalia gained independence in 1960. The collapse of Siad Barre’s post‐independence regime in 1991 led
to civil war between local power factions and dismantled the central state completely. Between 1995 and
2000, regional administrations emerged across the country, as security improved and economic
development accelerated.2 The formation of the Transitional Federal Government in 2004 and of its
successor, the Federal Government of Somalia, in 2012 marked the return of a significant central state
institution. After peaceful elections in 2016, a new government was formed in 2017 committed to embark
on a development trajectory (World Bank, 2017).
Though Somalia remains one of the world’s poorest countries (World Bank, 2016a, 2015), a vibrant but
largely informal private sector sprouted in the absence of government, drove growth in the Somali
economy, and took on the provision of services. Several economic activities including
telecommunications, money transfer businesses, livestock exports, and localized electricity services grew
well during this period (World Bank, 2017). Large‐scale out‐migration of skilled Somalis who sent back
part of their earnings made diaspora remittances essential to the Somali economy, equivalent to between
23 and 38 percent of GDP and outweighing both international aid flows and foreign direct investment
(World Bank, 2015).
Despite improvements in political stability, Somalia remains fragile. Parts of southern Somalia are
inaccessible due to the presence of Al‐Shabaab, which has also repeatedly carried out terrorist attacks, and
violent clashes between various power factions continue to occur throughout the territory.3 In addition
to conflict, the cyclical El Niño phenomenon caused severe droughts in 1991/92, 2011/12, and 2016/17
which exacerbated preexisting vulnerabilities in the Somali population. Both conflict and drought have led
to large‐scale internal displacement (World Bank, 2018a). The recent 2016/17 drought led to the
displacement of approximately one million Somalis, adding to an existing population of internally
displaced persons of 1.1 million (UNHCR, 2018).
As is typical for fragile states, Somalia is highly data‐deprived, leaving policy makers to operate in a
statistical vacuum (Beegle et al., 2016). Specifically, years of civil war and ongoing conflict have eroded
Somalia’s statistical infrastructure and capacity, leading to the lack of key macro‐ and micro‐economic
indicators, including the poverty rate (Hoogeveen and Nguyen, 2017). The government conducted and
published the last full population census in 1975, while the Somalia Socioeconomic Survey of 2002 was the
last country‐wide household survey (UNFPA, 2014). Most recent existing data sources are local FSNAU
and FAO food and nutrition surveys, while organizations operating within Somalia implemented a range
of smaller surveys. In 2014, UNFPA implemented the first nationwide Population Estimation Survey (PESS)
in preparation for a national census, finding the total population to be 12.3 million, of which 42 percent
are urban, 23 percent rural, 26 percent nomadic, and 9 percent are internally displaced (UNFPA, 2014).
Funded by the World Bank, Somaliland carried out a household budget survey (SLHS) in 2013, which
generated much‐needed indicators, including poverty estimates, but the sample was not representative
especially for the rural population and did not cover the nomadic and displaced populations. The World
Bank conducted the first wave of the Somali High Frequency Survey (SHFS) in the spring of 2016,
representative of the accessible urban, rural, and IDP population in 9 of 18 pre‐war regions as well as
Mogadishu, providing a baseline data set for monitoring poverty and contributing to other key statistical
indicators. However, in addition to large inaccessible areas, the sample excluded the nomadic population and
households in insecure areas. Furthermore, the rural sampling frame had to be derived ad hoc, with only
limited representativeness. Wave 2 of the SHFS, implemented in December of 2017, significantly
expanded coverage to urban and rural areas in central and southern Somalia and included the nomadic
population for the first time, while a newly derived sampling frame enhanced overall representativeness.
2 Somaliland self‐declared independence in 1991.
3 See the Armed Conflict Location and Events Database (ACLED), Somalia, for a disaggregated overview.
The specific context of insecurity and lack of statistical infrastructure in Somalia posed a number of
challenges for implementing a household survey and measuring poverty. First, in the absence of a recent
census, no exhaustive lists of census enumeration areas along with population estimates existed, creating
challenges to derive a probability‐based representative sample. Second, while some areas remained
completely inaccessible due to insecurity, even most accessible areas held potential risks to the safety of
field staff and survey respondents, so that time spent in these areas had to be minimized. Third, poverty
in completely inaccessible areas had to be estimated by other means. Finally, the non‐stationary nature
of the nomadic population required special sampling strategies. This paper outlines how these challenges
were overcome in wave 2 of the SHFS through methodological and technological adaptations in four
areas: sampling strategy, survey design, fieldwork implementation, and poverty measurement. In line
with the challenges outlined above, this paper contributes to several themes in the literature on poverty
measurement and data collection in the context of conflict and fragility, involving hard‐to‐survey
populations.
First, geospatial techniques and high‐resolution imagery were used in the SHFS to model the spatial
population distribution, build a probability‐based population sampling frame, and generate enumeration
areas in an effort to overcome the lack of a recent population census (section 2). The SHFS sampling
strategy bears resemblance to the strategy proposed by Muñoz and Langeraar (2013), which relies on
satellite imagery and grid cells to build a sampling frame in Myanmar. Wardrop et al. (2018) review various
efforts to produce spatially disaggregated population estimates based on satellite imagery, in contexts
where census data are absent or inaccurate. Barry and Rüther (2005) and Turkstra and Raithelhuber
(2004) employ satellite imagery to study informal urban settlements in South Africa and Kenya,
respectively, while Aminipouri et al. (2009) estimate various slum populations in Dar‐es‐Salaam, Tanzania.
Himelein et al. (2016) compare the viability of various satellite and area‐based sampling methods in
second‐stage sample selection in Mogadishu, Somalia.
Second, risks to the safety of field staff required spending as little time in enumeration areas as possible.
One strategy to address this issue is to call or message respondents on their mobile phones and not visit
dangerous areas at all. A growing body of literature explores the use of mobile technology in this context
(e.g. Demobynes and Sofia, 2016; Dillon, 2012; Firchow and Mac Ginty, 2016). However, administration
of necessary consumption modules to estimate poverty is not feasible via phone surveys.
To address security concerns, the SHFS adapted logistical arrangements, sampling strategy, and
questionnaire design to limit time on the ground. In logistical arrangements, a detailed and timely security
assessment ensured that the enumeration areas to‐be‐visited were safe on the day of fieldwork. The
fieldwork protocol was designed such that teams would spend as little time as possible in any given region
and draw little attention, ensuring enumerator and respondent safety (section 3.2). Concerning sampling
strategy, it was not feasible to conduct a full listing of all households in an enumeration area, as this was
too time‐intensive and may have raised suspicion. Instead, a micro‐listing approach was used, which
required enumeration areas to be segmented into smaller enumeration blocks using satellite imagery.
Enumeration blocks are small enough for enumerators to list and select households immediately before
conducting the interview (section 2.2). Himelein et al. (2016) compare this methodology with other
second‐stage sampling strategies designed for use in fragile and time‐sensitive settings.
Complete food and nonfood consumption modules result in an overall questionnaire length that is
prohibitive in areas with high insecurity. The length of consumption modules can be reduced by removing
rarely consumed items from the module or by combining categories of items (e.g. vegetables) and asking
about aggregates rather than individual items. Beegle et al. (2012) and Olson Lanjouw and Lanjouw (2001)
provide evidence that both approaches lead to an underestimation of consumption and hence an
overestimation of poverty. Fujii and Van der Weide (2013) propose an alternative approach which could
be adapted for use in fragile settings, by assigning a full consumption module to households in areas
without a binding security and time constraint, with only the covariates of consumption administered to
households in insecure areas. Consumption and poverty could then be imputed based on those covariates.
This approach, however, potentially leads to biases as the assignment of the two different modules
depends on security and is not necessarily random. Instead, the Rapid Consumption Methodology (Pape
and Mistiaen, 2018) was used to significantly reduce the length of the survey’s consumption modules.
The Rapid Consumption Methodology used in the SHFS relies on a set of core consumption items
administered to all households. The remaining items are algorithmically partitioned into optional
modules distributed systematically across households, with multiple imputation techniques used to
impute total consumption and poverty. Pape and Mistiaen (2018) show that this design yields reliable
poverty estimates (section 4).
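The module design described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that optional items are ranked by a consumption share and dealt round‐robin into modules, and that modules rotate across households in listing order. The actual partitioning algorithm of Pape and Mistiaen (2018) is not reproduced here, the item names and shares are hypothetical, and the imputation step is omitted.

```python
from collections import Counter


def partition_items(optional_items, n_modules):
    """Deal optional items into modules round-robin, highest share first.

    optional_items maps item name -> (hypothetical) consumption share.
    This balancing rule is an assumption made for illustration only.
    """
    ranked = sorted(optional_items.items(), key=lambda kv: kv[1], reverse=True)
    modules = [[] for _ in range(n_modules)]
    for position, (item, _share) in enumerate(ranked):
        modules[position % n_modules].append(item)
    return modules


def assign_modules(household_ids, n_modules):
    """Rotate optional modules systematically across households."""
    return {hh: index % n_modules for index, hh in enumerate(household_ids)}


# Hypothetical optional items with made-up consumption shares.
optional_items = {"rice": 0.20, "sugar": 0.15, "tea": 0.10, "onions": 0.08,
                  "milk": 0.07, "pasta": 0.06, "oil": 0.05, "salt": 0.02}

modules = partition_items(optional_items, n_modules=4)
assignment = assign_modules(range(1, 13), n_modules=4)  # 12 households in one EA
households_per_module = Counter(assignment.values())
```

With 12 households per enumeration area and four optional modules, the rotation gives each optional module three households in the EA.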
Third, the SHFS relies on correlates derived from satellite imagery and other geo‐spatial data to estimate
poverty in areas that remained completely inaccessible as a result mainly of insecurity. A growing field of
research is dedicated to predicting a range of outcomes based on a diverse set of such data sources. Early
applications use night‐time lights data to predict economic activity. These data are particularly successful
at predicting GDP at the country‐level (Henderson et al., 2012; Pinkovskiy and Sala‐i‐Martin, 2016), but
appear less well‐suited for measuring income or for capturing variation in welfare at a highly
disaggregated level (Engstrom et al., 2017; Mellander et al., 2015). More recently, applying deep learning
techniques to daytime imagery in order to classify objects such as roof types, roads, tree coverage,
and crops has led to advances in measuring welfare at more disaggregated levels (Krizhevsky et al., 2012).
Jean et al. (2016) use a convolutional neural network based on daytime satellite features to predict per
capita consumption at the level of the enumeration area from living standards measurement surveys.
Their model is successful in predicting consumption and explains 46 percent of variation on average across
four countries and out‐of‐sample. Engstrom et al. (2017) provide a recent overview of the state of the
literature and use high‐resolution satellite features to estimate poverty at the village‐level. In the SHFS,
estimating poverty in inaccessible areas relied on a linear model with the objective of creating reliable
and transparent poverty measures (section 5).
The remainder of this paper proceeds as follows. Section 2 discusses the sampling strategy. Section 3
provides an overview of the data collection process. Section 4 describes the derivation of the consumption
aggregate, including the Rapid Consumption Methodology. Section 5 describes the imputation of poverty
in inaccessible areas, and section 6 gives an overview of poverty in Somalia.
2. Sampling strategy
Wave 2 of the SHFS employed a multi‐stage stratified random sample, ensuring a sample representative
of all sub‐populations of interest, while optimally balancing cost and precision of estimates. Strata were
defined along two dimensions – administrative location (pre‐war regions and emerging states) and
population type (urban areas, rural settlements, IDP settlements, and nomadic population), leading to a
total of 57 strata (Table A.2). Sub‐populations in the urban centers of Mogadishu, Baidoa, and Kismaayo,
in fisheries livelihood zones in coastal areas (Figure A.1), and IDP host communities were of particular
interest and therefore deliberately oversampled.
The total planned sample size was 6,384 interviews, allowing for high‐precision consumption estimates
with less than 10 percent relative standard errors for key sub‐populations and overall. The sample was
allocated across strata following optimal (Neyman) allocation, minimizing the global sampling error of the
consumption estimates (Neyman, 1934). Optimal allocation is given by
$$ n_h = n \cdot \frac{(N_h / N)\, S_h}{\sum_{k=1}^{H} (N_k / N)\, S_k} \qquad (1) $$
where $n_h$ is the sample size in stratum h, n is the total sample size, H is the total number of strata, $N_h$ is
the total population of stratum h, N is the total overall population, and $S_h$ is the standard deviation in
stratum h. Hence, the number of households to be interviewed per stratum is mainly determined by the
variability of consumption within the stratum ($S_h$). $S_h$ was derived from the results of the SHFS Wave 1.
The population size only matters for practical purposes in very small strata below 10,000 households. In
the absence of a recent population census, the population of each stratum was derived from UNFPA’s
2014 Population Estimation Survey (PESS), which contains detailed estimates for each population type
and administrative unit of interest.
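As a worked illustration of Neyman allocation, the sketch below distributes a total sample across strata in proportion to the product of each stratum's population and its standard deviation of consumption. The stratum values are hypothetical, and the survey's additional minimum‐interview requirements and rounding are not applied here.

```python
def neyman_allocation(total_n, pop_sizes, std_devs):
    """Allocate total_n interviews across strata in proportion to N_h * S_h."""
    products = [n_h * s_h for n_h, s_h in zip(pop_sizes, std_devs)]
    denominator = sum(products)
    return [total_n * p / denominator for p in products]


# Hypothetical strata: population sizes and standard deviations of consumption.
pop_sizes = [50_000, 30_000, 20_000]
std_devs = [2.0, 1.0, 3.0]

allocation = neyman_allocation(1_000, pop_sizes, std_devs)
```

In this example the smallest stratum receives more interviews than the middle one because its consumption is more variable, which is the behavior the allocation rule is designed to produce.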
The optimal allocation of interviews was subject to the following requirements:
(i) 500 expected interviews in IDP settlements and 500 in nomadic populations;
(ii) At least 600 interviews expected per administrative unit;
(iii) Oversampled populations with
‐ Mogadishu (urban): 900 interviews, including IDPs;
‐ Kismaayo (urban) and Baidoa (urban): at least 500 interviews each;
‐ Coastal fisheries livelihood zones: at least 300 interviews;
‐ IDP host communities: 500 expected interviews.
Households are clustered into enumeration areas (EAs), with 12 interviews expected for each selected EA.
A larger number of households per enumeration area would only marginally benefit the statistical
estimation of indicators because of potential homogeneity among households in geographic proximity. A
smaller number of households would result in fewer than 3 observations for each of the four optional
modules capturing household consumption based on the Rapid Consumption Methodology, and thus
affect the reliability of poverty estimates (see section 4).
The sampling design addressed the challenging security situation on the ground in two ways. First, a
security assessment was conducted to exclude areas too dangerous for field teams to visit. Second, a
micro‐listing approach was used in second‐stage sample selection to allow field teams to spend limited
time on the ground. Replacement of sampling units during fieldwork followed a transparent and
predefined replacement schedule, which was necessary to correctly calculate sampling weights (see
Appendix on Replacement of sampling units).
2.1. Incorporating inaccessibility into the sampling frame
A geo‐spatial access map depicting accessibility (Figure 1) was created through key informant interviews
with security experts and regional fieldwork coordinators based in the field. Publicly available information
and incident reports provided by a local security company were used as auxiliary inputs. Finally, the
information in the access map was triangulated with security analysts from a security NGO and private
security company.
Figure 1: Security assessment access map
Source: Authors’ own calculations.
Note: Red color indicates inaccessibility, green color indicates accessibility. Circles represent urban centers.
The security assessment led to the complete exclusion of pre‐war region Middle Juba. Several other pre‐war
regions in south and central Somalia were only partially accessible (Table 1). The security situation
differed substantially between different cities, with some completely inaccessible and some at least
partially accessible even though they were located in insecure regions. The sampled IDP and nomadic
populations fell within safe areas. Survey estimates for these populations were thus considered to be
representative.
Table 1: Accessibility rates by pre‐war region (percentage of population in accessible areas).
Pre‐war region      Urban areas   Rural areas
Awdal               100%          94%
Bakool              35%           21%
Banadir             87%           96%
Bari                99%           92%
Bay                 86%           46%
Galgaduud           88%           50%
Gedo                100%          43%
Hiraan              44%           28%
Lower Juba          92%           9%
Lower Shabelle      28%           33%
Middle Juba         0%            0%
Middle Shabelle     98%           77%
Mudug               100%          76%
Nugaal              100%          100%
Sanaag              100%          100%
Sool                89%           98%
Togdheer            100%          98%
Woqooyi Galbeed     100%          96%
Overall             89%           48%
Source: Authors’ calculations
Low accessibility in south and central Somalia motivated the imputation of poverty in inaccessible areas
using geo‐spatial information (section 5). The accessibility map was incorporated into the sampling frame
to draw EAs only from accessible areas. The resulting sample was thus representative of the entire Somali
population within secure areas.
2.2. Sampling frame and sample selection
The sampling frame for wave 2 of the SHFS is the exhaustive list of sampling units for every stage in the
multi‐stage selection process (denominated according to the stage of selection, i.e. primary sampling units
(PSUs) in the first stage, secondary sampling units (SSUs) in the second stage, and so on) employed in the
survey’s sampling strategy. Sampling units are listed separately by stratum. Each sampling unit must have
information concerning the population residing in it to allow for selection proportional to size (United
Nations Statistical Division, 2005). In the absence of a recent population census, no readily useable
enumeration areas and population estimates existed. To overcome these challenges the SHFS drew from
a variety of data sources and GIS techniques to create a population sampling frame, strata boundaries,
and a comprehensive list of enumeration areas.
Strata boundaries
In line with stratification at the intersection of administrative regions and population type, the following
GIS data sets were combined to spatially demarcate strata boundaries:
(i) Pre‐war region boundaries;
(ii) IDP settlement boundaries;
(iii) Urban area boundaries;
(iv) Rural settlement boundaries;
(v) Security assessment access map.
Pre‐war region boundaries are available as shapefiles from UNDP. The boundaries of urban areas were
defined by the urban enumeration areas previously used in UNFPA’s Population Estimation Survey 2014
(PESS). Boundaries of IDP settlements were provided by UNHCR’s Shelter Cluster and PESS. The IDP strata
boundaries were subtracted from the urban and rural strata to prevent duplicate sampling. The remaining
areas outside of the urban and IDP strata were considered as rural strata. Areas determined too dangerous
through the security assessment were removed from the sampling frame.
Population sampling frame
In urban and rural strata, population estimates were derived from the 2015 WorldPop data set, detailed
in Linard et al. (2010). This data set uses a combination of data sources and methods, including satellite
imagery, to derive highly spatially disaggregated population estimates. First, the starting point is the 2005
population estimates at the district level from the UN Office for the Coordination of Humanitarian Affairs
(OCHA) for 74 districts. Second, an Africover GIS data set depicting 22 landcover classes was combined
with 2005 Landsat satellite imagery depicting settlement outlines. Third, settlement point location data,
based on the efforts of various NGOs and UN agencies, supplied more than 11,000 settlement points along
with some population estimates, covering urban and rural areas as well as IDP settlements. To achieve higher
spatial resolution, the OCHA estimates were disaggregated using the information contained in the
settlement points data and the landcover class data. The result is a gridded population data set at
100m‐by‐100m spatial resolution. For each 100m‐by‐100m cell, the data set contains a population estimate.
Aggregating these cell estimates within each primary sampling unit (PSU) in urban and rural strata gives the
PSU population estimate that was later used for sample selection proportional to size. Due to
inadequacies of the population density map for the purpose of creating a sampling frame, a set of
corrections had to be made to this data set. In the original WorldPop layer, the population values were
not always distributed smoothly. For instance, a village might have only one pixel with a high population
number, creating a sharp contrast, although its coverage area is larger and the transition from sparse to
dense population is more gradual. Hence, a Gaussian smoothing kernel with a standard
deviation of 500m was applied. This distributed higher values smoothly in areas surrounding a high‐density
pixel while approximately preserving the total population count in the area (Figure 2).
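The effect of the smoothing step can be sketched in a few lines of Python. This is an illustrative implementation, not the one used by Flowminder / WorldPop: it assumes a truncated Gaussian kernel (three standard deviations) and renormalizes the weights over in‐bounds cells so that the total is preserved exactly, whereas the text above only states that the total is approximately preserved. The grid values are hypothetical.

```python
import math


def gaussian_smooth(grid, sigma_cells, radius=None):
    """Spread each cell's population over its neighbours with Gaussian weights.

    Weights are renormalised over in-bounds cells, so the grid total is kept
    (an assumption of this sketch, not a documented property of the source).
    """
    rows, cols = len(grid), len(grid[0])
    if radius is None:
        radius = int(3 * sigma_cells)  # truncate the kernel at three sigma
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            pop = grid[r][c]
            if pop == 0:
                continue
            targets = []
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        w = math.exp(-(dr * dr + dc * dc) / (2 * sigma_cells ** 2))
                        targets.append((rr, cc, w))
            total_w = sum(w for _, _, w in targets)
            for rr, cc, w in targets:
                out[rr][cc] += pop * w / total_w
    return out


# Hypothetical 21 x 21 grid of 100 m cells with one isolated high-population pixel.
grid = [[0.0] * 21 for _ in range(21)]
grid[10][10] = 1000.0

# A 500 m standard deviation corresponds to 5 cells at 100 m resolution.
smoothed = gaussian_smooth(grid, sigma_cells=5)
```

After smoothing, the isolated pixel remains the local maximum, but most of its population is spread over the surrounding cells.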
Figure 2: Gaussian smoothing of the WorldPop population density layer.
Source: Flowminder / WorldPop.
In IDP strata, population estimates for each PSU were based on the PESS IDP population estimates for
each pre‐war region.
Primary Sampling Units (PSUs) and first‐stage sample selection
PSUs were generated using a variety of techniques depending on the population type. The primary
sampling unit (PSU) in urban as well as rural strata was the enumeration area (EA). The boundaries for
urban EAs were derived from the enumeration areas used in UNFPA’s 2014 Population Estimation Survey
(PESS). Overlaps with IDP settlements were removed. The EAs thus obtained were combined with the
corresponding population estimates from the 100m‐by‐100m WorldPop data set to form the sampling
frame for urban strata in which each PSU has a positive and known probability of selection. Where a stratum
boundary cut through a grid cell in the WorldPop data set, the grid cell was split and the population
estimates were re‐calculated, weighted by geographical area. In rural strata – defined as those permanently
settled areas outside of urban areas and IDP settlements – no list of enumeration areas comparable to
the PESS EAs exists. The entire area of rural strata assessed as secure was therefore divided into rectangular grid
cells of different sizes using a quadtree algorithm. The approach splits an area into successively smaller
quadrants by checking whether the content of each split is greater or less than a prescribed
value. In this case, the population map was used as the unit of measure, and the area was split successively until
each square had a population below the target of 3,500. This approach also allowed the
definition of each grid cell by a set of combined parameters, specifically geographic extent and
population size (Figure 3; see Minasny et al., 2007). Thus, each cell has a maximum estimated population
size of 3,500 and a maximum geographical area of 3 km x 3 km, to keep enumeration areas manageable
in size for field teams.
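The quadtree subdivision can be sketched as a short recursive function. This is a simplified illustration operating on a small hypothetical population grid, not the implementation used for the survey: a block of cells is split into quadrants until its population falls below the target and its side length is within the size limit, or until it is a single cell.

```python
def block_population(pop, r0, c0, r1, c1):
    """Sum the population of grid cells in rows r0..r1-1 and columns c0..c1-1."""
    return sum(sum(row[c0:c1]) for row in pop[r0:r1])


def quadtree_split(pop, r0, c0, r1, c1, max_pop, max_side):
    """Return leaf blocks (r0, c0, r1, c1, population) of a quadtree subdivision."""
    total = block_population(pop, r0, c0, r1, c1)
    height, width = r1 - r0, c1 - c0
    small_enough = max(height, width) <= max_side
    if (total < max_pop and small_enough) or (height <= 1 and width <= 1):
        return [(r0, c0, r1, c1, total)]
    r_mid = r0 + max(1, height // 2)
    c_mid = c0 + max(1, width // 2)
    leaves = []
    for a, b in ((r0, r_mid), (r_mid, r1)):
        for c, d in ((c0, c_mid), (c_mid, c1)):
            if a < b and c < d:  # skip empty quadrants of thin blocks
                leaves.extend(quadtree_split(pop, a, c, b, d, max_pop, max_side))
    return leaves


# Hypothetical 8 x 8 grid: sparse population with one dense 2 x 2 corner.
pop = [[100] * 8 for _ in range(8)]
for r in range(2):
    for c in range(2):
        pop[r][c] = 2000

# Target population of 3,500; 3 km side limit = 30 cells at 100 m resolution.
leaves = quadtree_split(pop, 0, 0, 8, 8, max_pop=3500, max_side=30)
```

Dense areas end up covered by many small cells and sparse areas by a few large ones, which is what makes the resulting enumeration areas comparable in population.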
Figure 3: Quadtree grids
Source: Flowminder / WorldPop.
For IDP strata, primary sampling units were IDP settlements as defined by UNHCR’s Shelter Cluster. PSU
boundaries, which, given the choice of PSU, are equivalent to IDP settlement boundaries, were derived from
UNHCR’s GIS shapefiles. In several cases information on settlement boundaries was missing. In these
cases, the missing information was drawn from PESS IDP enumeration areas (Table A.1). PESS population
data, available at the pre‐war region level, were used to obtain population estimates for each IDP
settlement. To match the pre‐war region IDP population to each IDP settlement in the sampling frame,
the following protocol was applied: Whenever there was exactly one IDP settlement per pre‐war region,
the PESS IDP population for that pre‐war region was used as the population estimate for the respective
settlement, thus taking the settlement population as representative of the IDP population in its pre‐war
region. In cases where there was more than one settlement per pre‐war region, the PESS IDP population
was assigned to each settlement proportional to the geographical area each settlement covers relative to
the total area of all settlements in the given pre‐war region.
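The allocation protocol for IDP settlements amounts to a simple proportional split, sketched below in Python. Settlement names, areas and the regional population figure are hypothetical.

```python
def allocate_idp_population(region_population, settlement_areas_km2):
    """Split a region's IDP population across its settlements by area share."""
    total_area = sum(settlement_areas_km2.values())
    return {
        name: region_population * area / total_area
        for name, area in settlement_areas_km2.items()
    }


# One settlement in the region: it receives the full regional IDP population.
single = allocate_idp_population(40_000, {"Settlement A": 1.5})

# Several settlements: population is assigned in proportion to area covered.
several = allocate_idp_population(
    40_000, {"Settlement A": 1.0, "Settlement B": 3.0}
)
```

The single‐settlement case falls out of the same formula, since the settlement's area share is then equal to one.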
Across all strata, PSUs were selected using a systematic random sampling approach with selection
probability proportional to size (PPS), where size is given by the estimated population in each PSU. In PPS
sampling, PSUs are selected into the sample based on their size so that large PSUs have a greater chance
of being part of the final sample. In urban and rural areas, the EA served as the primary sampling unit.
In IDP strata, PPS sampling is applied at the IDP settlement level, determining how many enumeration
areas are to be selected in each settlement. PSUs were drawn separately for each stratum, with at least
20 percent additional PSUs selected to serve as replacements in case one of the main PSUs needed to be
replaced (see section 2).
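Systematic PPS selection can be sketched as follows. This is a generic textbook version under stated assumptions (a random start and a fixed sampling interval along the cumulated size measure), not the survey's own code, and the PSU sizes are hypothetical. A PSU larger than the sampling interval can be hit more than once.

```python
import random


def systematic_pps(sizes, n_select, rng):
    """Select n_select PSU indices with probability proportional to size.

    Uses a random start and a fixed interval along the cumulated sizes.
    Indices of very large PSUs may appear more than once.
    """
    total = sum(sizes)
    step = total / n_select
    start = rng.random() * step  # random start in [0, step)
    points = [start + k * step for k in range(n_select)]
    selected = []
    cumulative = 0
    next_point = 0
    for index, size in enumerate(sizes):
        cumulative += size
        while next_point < n_select and points[next_point] < cumulative:
            selected.append(index)
            next_point += 1
    return selected


# Hypothetical estimated populations of five PSUs in one stratum.
sizes = [500, 6000, 300, 1200, 2000]
selected = systematic_pps(sizes, n_select=4, rng=random.Random(42))
```

Here the sampling interval is 2,500, so the PSU with an estimated population of 6,000 is selected at least twice regardless of the random start.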
Secondary sampling units (SSUs) and second‐stage sample selection
Even in areas deemed accessible per the security assessment, it was critical to the safety of field staff and
respondents that teams would spend as little time as possible in each EA. Himelein et al. (2016) discuss
and compare several second‐stage sample selection strategies for use in contexts such as this one. In wave
2 of the SHFS, a micro‐listing approach is used in second‐ and final‐stage sample selection. In micro‐listing,
enumeration areas are divided into smaller enumeration blocks. Rather than performing a time‐consuming
full listing of all households in the EA, enumerators list only households in one enumeration
block, then select the household to be interviewed, and immediately conduct the interview, greatly
reducing the time required in the EA.
Enumeration blocks were generated through different means, depending on the population type. In urban
and rural strata, the EAs selected in the first stage were manually segmented into enumeration blocks
(EBs) using satellite imagery from Google Earth or Bing, counting the number of structures visible in each.
Enumeration blocks served as secondary sampling units (SSUs) in the sampling design. Enumeration blocks
were created as per the following general criteria:
• Each selected EA would be comprehensively covered by enumeration blocks.
• Each EA would be delineated into 12 enumeration blocks, expecting one interview per block.
• Each enumeration block would contain at least 1 and at most 12 structures.
• Enumeration blocks in the same EA should have roughly the same number of visible structures.
• Blocks would be drawn to take account of natural boundaries.
• Each block should have a central point from which all structures in the block can be seen.
The general criteria for block delineation allow for several special cases:
(i) If any PSU contained fewer than 12 structures, it would not be possible to delineate 12 blocks
of the same size.
(ii) If any PSU contained more than 150 structures, more than 12 blocks were delineated,
following the above criteria.
(iii) Given the design features of the sample, a fraction of PSUs was selected more than once. This
occurred in two instances: First, given the nature of the first‐stage sample selection with PPS,
very large PSUs were selected twice or three times. This was especially likely in strata with a
relatively short list of PSUs and a relatively large number of required interviews in the
stratum. Second, as outlined in the previous section, PSUs were selected more than once if
they formed part of one of the oversamples. The number of required interviews and
consequently the required number of enumeration blocks was scaled up proportionately in
these cases. For instance, if a PSU was selected twice, 12*2=24 interviews and blocks were
required, and if a PSU was selected three times, 12*3=36 interviews and blocks were
required. All other criteria for block delineation remained in place (Figure 4).
Figure 4: Example of EA delineated into blocks
Source: Flowminder / WorldPop.
Enumeration blocks were selected with equal probability. In the general case of 12 blocks per
enumeration area, every block was selected, as 12 interviews per EA were required (and equivalently
for PSUs with 24 or 36 required interviews in special case (iii)). In PSUs where more than 12 (or 24, or 36)
blocks had been delineated due to the high number of visible structures (special case (ii)), 12
(or 24, or 36) blocks were drawn at random with equal probability. In
PSUs with fewer than 12 (or 24, or 36) visible structures (special case (i)), two selection mechanisms were
possible: First, if field teams found that there were indeed fewer than 12 structures in the PSU (as the
satellite imagery suggested), all structures were interviewed. Second, when field teams found that the
number of structures was higher than the satellite imagery suggested, enumerators counted the number
of structures and randomly selected 12 (or 24, or 36) households to be interviewed with equal probability.
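The block selection rule described above can be sketched in a few lines of Python. This is only an illustration: the function name, the block identifiers, and the fixed seed are assumptions made for the example, not part of the survey's actual tooling.

```python
import random

def select_blocks(block_ids, required, rng):
    """Select `required` enumeration blocks with equal probability.

    If the EA has no more blocks than required, every block is taken;
    otherwise a random sample without replacement is drawn.
    """
    if len(block_ids) <= required:
        return list(block_ids)
    return sorted(rng.sample(block_ids, required))

rng = random.Random(2017)  # fixed seed, only to make the sketch reproducible
general_case = select_blocks(list(range(1, 13)), 12, rng)        # all 12 blocks
large_psu = select_blocks(list(range(1, 16)), 12, rng)           # 12 of 15 blocks
selected_twice = select_blocks(list(range(1, 25)), 12 * 2, rng)  # special case (iii)
```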
A similar second‐stage sampling strategy was employed for IDP strata. Each IDP settlement was
segmented manually into enumeration blocks with approximately 10 structures each. Where sensible, 12
enumeration blocks were combined into one enumeration area. In some cases, however, IDP settlements
consisted of geographically dispersed pockets within urban areas, each far away from the next. To keep
enumerator travel time in check, facilitate supervision, and ensure safety, the construction of IDP EAs
followed these geographical contingencies to some extent. Hence, some EAs were created to contain
more than 12 blocks and others contained fewer than 12.
Several of the most recent IDP settlement boundaries provided by UNHCR were a few years old, while the
recent drought caused perturbations to the size, composition, and localization of the IDP population. Thus,
each selected IDP enumeration area was inspected to ensure that it was still inhabited by displaced
communities. This led to several IDP EAs being dropped and replaced by backup EAs. Enumeration areas
served as secondary sampling units and were selected with probability proportional to size, with size given
by the number of blocks per EA. The required number of EAs in each IDP settlement was fixed through
first‐stage sample selection. Then, where there were more or fewer than 12 blocks per IDP EA, blocks
were selected with equal probability.
Final‐stage sample selection: Households
Except for the special cases discussed in the previous sections, enumerators were expected to interview
one household per block in all selected blocks within the enumeration area. The household was selected
randomly with equal probability in two stages, following the micro‐listing protocol: From a central point
in the block, the enumerator listed all residential structures within the current block into the tablet. The
enumerator’s tablet then randomly selected a residential structure for the enumerator to visit. At the
structure, the enumerator recorded the number of households residing in the structure, and the tablet
again randomly selected a household to be interviewed.
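A minimal sketch of the two random draws in the micro‐listing protocol follows. It assumes, for simplicity, that the number of households in every listed structure is known up front, whereas in the field this count was recorded only at the selected structure. The function name and the returned within‐block probability are illustrative additions, not the tablet software's actual logic.

```python
import random

def select_household(structures, rng):
    """structures: dict mapping structure id -> number of households (>= 1).

    Draw 1 picks a listed residential structure with equal probability;
    draw 2 picks a household within that structure with equal probability.
    """
    structure = rng.choice(sorted(structures))
    household = rng.randint(1, structures[structure])
    # Within-block selection probability implied by the two draws.
    prob = (1 / len(structures)) * (1 / structures[structure])
    return structure, household, prob

listed = {"s1": 1, "s2": 3, "s3": 2}  # hypothetical block with three structures
structure, household, prob = select_household(listed, random.Random(7))
```

Because the second draw depends on how many households share a structure, the implied probability differs across structures, which is one reason to record it for the calculation of sampling weights.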
Oversamples
For Baidoa, Kismaayo, and fisheries areas, a second‐stage oversampling strategy was used. In second‐stage
oversampling, PSUs selected in the first stage and falling into the specified urban centers or coastal
areas were selected again to reach the minimum sample size for each oversample. Through this process,
PSUs in Kismaayo were selected twice, and PSUs in Baidoa and in fisheries areas were selected a total of
three times. Fisheries livelihood zones in coastal areas were defined by FEWSNET and FSNAU (Figure A.1,
zones SO7 and SO8). For the host communities oversample, all urban enumeration areas adjacent to IDP
settlements were pre‐selected as a separate sampling frame. The resulting list was stratified implicitly by
pre‐war region. 42 enumeration areas were selected with probability proportional to size to reach the
desired oversample.
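Selection with probability proportional to size from an implicitly stratified list is often implemented as a systematic draw along the sorted frame. The sketch below assumes that approach; the text does not state which PPS procedure was used, and the unit names and sizes are invented for illustration.

```python
import random

def pps_systematic(units, n, rng):
    """Systematic PPS draw.

    units: list of (unit_id, size) in frame order (e.g. sorted by region,
    which gives the implicit stratification). A unit whose size exceeds the
    sampling interval can be selected more than once.
    """
    total = sum(size for _, size in units)
    step = total / n
    start = rng.random() * step
    points = [start + k * step for k in range(n)]
    selected, cumulative, idx = [], 0.0, 0
    for unit_id, size in units:
        cumulative += size
        while idx < n and points[idx] < cumulative:
            selected.append(unit_id)
            idx += 1
    return selected

frame = [("a", 10), ("b", 50), ("c", 10), ("d", 30)]  # hypothetical EAs and sizes
sample = pps_systematic(frame, 4, random.Random(42))
```

In this toy frame, unit "b" spans two sampling intervals and is therefore always selected twice, mirroring the repeated selection of very large PSUs discussed earlier.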
2.3. Sampling of the nomadic population
Nomadic households, who make up around a quarter of the Somali population according to UNFPA’s
Population Estimation Survey (PESS) of 2014, are inherently difficult to sample because, by definition, they
have no permanent place of residence (Kalsbeek, 1986; Soumare et al., 2007). Himelein et al. (2014) use
a random geographic cluster sample approach, in which points are randomly selected from a map and all
nomadic households within a radius around the point are interviewed. The SHFS followed a different
approach. The strategy for sampling nomadic households relied on lists of water points used by nomadic
households to water their livestock, which served as the primary sampling units. UNFPA’s 2014 PESS took
a similar approach to estimate the nomadic population (UNFPA, 2014). The SHFS project deployed 200
purpose‐designed tracking devices to nomadic households who gave consent, which track their
movements for two years. This will improve the understanding of the patterns of movement of the
nomadic population in Somalia, which will facilitate sampling this population in the future.
Nomadic sampling frame
Nomadic strata were defined at the federated member state level,
with the population count for each stratum provided by PESS. The list of water points was divided up by
stratum. The list was put together from a combination of two sources: first, the list of water points used
in PESS; second, a regularly updated list of water points kept by the UN Food and Agriculture Organization
(FAO). Given this combination of sources, the resulting list of water points used as sampling frame was
viewed to be close to or completely exhaustive. The list contained the GPS location and information on
type of water point (Berkad, Borehole, Dam, Dug Well, Spring, Other). Other water point characteristics
such as the number of households using the water point and the predominant type of cattle watered were
available only for an incomplete subset of water points. The list was stratified implicitly by pre‐war region
(each federated member state encompasses several pre‐war regions) and type of water point.
First‐stage sample selection
Water points from this list served as primary sampling units. In the absence of reliable estimates of the
population size of water points, 42 water points were selected in the first stage with equal probability,
with 12 interviews to be conducted at each selected water point. A further challenge in sampling nomadic
households peculiar to the timing of SHFS wave 2 was the ongoing drought, which led to many water points
having run dry. Therefore, a series of Key Informant Interviews (KIIs) and Focus Group Discussions (FGDs)
in each federated member state verified whether each selected water point was currently frequented by
nomadic households. In case a selected water point was not currently frequented by nomadic households,
it was replaced.
Selection of nomadic households at water points
Selection of nomadic households to interview relied on a listing process at each water point whose aim
was to compile an exhaustive list of all nomadic households at the water point. However, the total number
of nomadic households at a given water point is not static as nomadic households are not resident at
water points, but only stay there for a limited time, and arrive and leave at various times during the day.
It was determined in KIIs that nomadic households need to spend a very minimum of two hours at a given
water point to water their cattle and that cattle watering would occur during daylight hours. To allow for
a complete listing, daylight hours were segmented into two‐hour time slots, during each of which
enumeration team leaders completed a full listing of all nomadic households at the water point at that
time. As not all persons present at water points were members of nomadic households, but may instead
be from close‐by rural settlements, the listing form contained a number of questions identifying nomadic
households. The form also asked for informed consent to be interviewed. Upon completing a two‐hour
listing period, up to three households were randomly selected from the list of consenting nomadic
households gathered during this time slot. Interviews were then scheduled with the selected households
at a time and place convenient for the household respondent.4 Based on this sampling design, sampling
weights were calculated after the completion of data collection (see Appendix Sampling weights for the
derivation).
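The selection rule for one two‐hour listing period can be sketched as follows. The record layout (keys `id`, `nomadic`, `consent`) and the function name are assumptions made for this example; the listing form itself is not reproduced in the paper.

```python
import random

def select_from_slot(listed, rng, max_interviews=3):
    """Pick up to `max_interviews` households from one two-hour listing.

    listed: list of dicts with keys 'id', 'nomadic', 'consent'.
    Only consenting nomadic households are eligible. If no more than
    `max_interviews` are eligible, all of them are interviewed.
    """
    eligible = [h["id"] for h in listed if h["nomadic"] and h["consent"]]
    if len(eligible) <= max_interviews:
        return eligible
    return sorted(rng.sample(eligible, max_interviews))

slot = [
    {"id": 1, "nomadic": True, "consent": True},
    {"id": 2, "nomadic": False, "consent": True},   # from a nearby rural settlement
    {"id": 3, "nomadic": True, "consent": False},
    {"id": 4, "nomadic": True, "consent": True},
    {"id": 5, "nomadic": True, "consent": True},
    {"id": 6, "nomadic": True, "consent": True},
]
chosen = select_from_slot(slot, random.Random(3))
```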
3. Data collection
Wave 2 of the Somali High Frequency Survey was implemented using computer assisted personal
interviewing (CAPI), whereby enumerators were equipped with tablet computers which contained the
survey questionnaire (section 3.1) and would upload completed interviews to the project’s Survey
Solutions cloud servers daily. The choice of CAPI was guided, on the one hand, by the finding that this
technology greatly reduces the number of errors relative to pen‐and‐paper interviewing (e.g. Caeyers et
al., 2012). On the other hand, this technology was essential to the near real‐time monitoring of data
collection (section 3.2) and quality control (section 3.3), which were deemed necessary in the Somali
context where insecurity and remoteness make close supervision challenging and follow‐up visits costly.
4 Additional rules applied in special cases: (i) If no nomadic households were found or arrived at a given water point,
enumeration teams remained at the water point for three days. If no nomadic household had arrived by then, the water
point was replaced. (ii) If nomadic households were present but arrived at very low frequencies, so that teams
struggled to reach the required number of interviews, they would stay for a maximum of 12 days. Then teams would
leave whether or not they had reached the required number of interviews. If 3 or fewer (but at least 1) nomadic
households arrived during any two‐hour listing period, all listed households were interviewed during that period.
3.1. Survey instrument
The consumption modules were the central components of the SHFS wave 2 survey questionnaire. The
questionnaire also contained other key components of a multi‐topic household survey, particularly those
relevant to the Somali context. These included an individual‐level module with information on education,
employment, and health, household characteristics, remittances, displacement, perceptions and
subjective welfare, and shocks. The questionnaire was designed in line with best practices (Deaton and
Grosh, 2000) and went through several iterations of internal and external expert revision.
The food consumption module consisted of 114 food items drawn from a list of CPI items provided by
statistical authorities. To meet the requirements of the Rapid Consumption Methodology (section 4),
items were divided in one core and four optional modules, with most commonly consumed items assigned
to the core module. The list of items was highly specific (e.g. apples, pears rather than fruits) and selected
to cover the basic food categories and adequately reflect the local diet (Smith et al., 2014; Zezza et al.,
2017). The list of food items contained various items for food away from home, accounting for both food
bought away from home and consumed at home and food consumed outside of the home. Further, to
facilitate food quantity reporting for respondents, a list of non‐standard units, along with their conversion
to kilograms, was developed for each item, with inputs from regional experts and experience from the
accompanying market price survey (Oseni et al., 2017).5 The questionnaire was designed to capture
purchased food, home production, and gifts.
The nonfood consumption module consisted of 90 items, which were assigned to core and optional
modules in the same manner as the food items. The choice of nonfood items followed the COICOP
classification system, with all relevant COICOP categories represented in the list of nonfood items.
3.2. Fieldwork and monitoring
The fieldwork strategy was designed to facilitate high‐quality data collection and safety of field teams.6
All enumerators and team leaders attended rigorous training sessions and had to sit a final exam to be
hired. Forty‐five teams were assembled for fieldwork, staffed each with one team leader, three regular
enumerators, and two reserve enumerators. The large number of teams was essential, on the one hand,
for security reasons. It allowed teams to enter and exit an area swiftly before their presence would draw
too much suspicion and endanger their safety and that of survey respondents. On the other hand, this
arrangement allowed teams to be composed of enumerators native to the areas which they covered.
5 The Market Price Survey (MPS) is a component of the SHFS. The MPS collects weekly prices of
a broad range of 91 products and services as well as exchange rates from 14 key markets across all Somali regions.
6 The Somali High Frequency Survey was implemented by Altai Consulting in coordination with the respective
statistical authorities. The team worked closely with the Directorate of National Statistics, Ministry of Planning,
Investment and Economic Development of the Federal Government of Somalia.
The survey was piloted in each region before the beginning of fieldwork. Fieldwork was monitored in near
real‐time to verify data collection progress, data quality, and enumerator performance. To implement
near real‐time monitoring, field teams uploaded interviews onto the project’s Survey Solutions server at
the end of each day. An automated pipeline of Stata code downloaded and processed the data, creating
a detailed monitoring dashboard in Microsoft Excel, which headquarters reviewed daily. The dashboard
tracked the number of submissions meeting the quality standards to be considered acceptable (see
section 3.3 for these standards), interview duration, and unit non‐response rates separately by EA,
enumerator, team, and stratum. It further assessed item non‐response by listing the number of household
members and the proportion of missing values, ‘No’, ‘Don’t know’, and ‘Refused to respond’ entries in all
modules and in several other key questions that would trigger follow‐up questions. Unusually high
proportions of missing values, ‘No’, ‘Don’t know’, or ‘Refused to respond’ entries indicated possible
enumerator shirking, as this behavior would reduce enumerators’ workload. For example, entering that a
household self‐identifies as displaced would trigger an entire module on displacement. Enumerators
returning low‐quality data or displaying suspicious behavior received warnings and follow‐up training. If the
issues could not be resolved in this way, enumerators were replaced by reserve enumerators from their
team. Overall, however, enumerator performance was high, requiring few replacements, while the unit
non‐response rate was very low at 0.16 percent among urban, rural, and IDP households, and 0.50 percent
among nomadic households.
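The shirking check described above amounts to computing, per enumerator, the share of answers that avoid follow‐up questions. The survey's own pipeline was written in Stata; the Python sketch below only illustrates the idea, and the threshold of 0.6 and all names are hypothetical.

```python
from collections import defaultdict

# Entries that avoid follow-up questions; None marks a missing value.
SUSPECT = {None, "No", "Don't know", "Refused to respond"}

def suspect_shares(answers):
    """answers: iterable of (enumerator_id, response) pairs for questions
    whose answer can trigger follow-up questions."""
    counts = defaultdict(lambda: [0, 0])  # enumerator -> [suspect, total]
    for enumerator, response in answers:
        counts[enumerator][0] += response in SUSPECT
        counts[enumerator][1] += 1
    return {e: suspect / total for e, (suspect, total) in counts.items()}

def flag(shares, threshold):
    """Enumerators whose suspect share exceeds a (hypothetical) threshold."""
    return sorted(e for e, share in shares.items() if share > threshold)

answers = [("e1", "Yes"), ("e1", "No"),
           ("e2", "No"), ("e2", None), ("e2", "Don't know"), ("e2", "Yes")]
shares = suspect_shares(answers)
flagged = flag(shares, threshold=0.6)
```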
3.3. Submissions quality standards
Each enumerator submission was subject to a set of minimum standards to ensure data quality.
Interviews were classified as valid or invalid based on the criteria listed in the following.
• Valid EAs. If the EA was not part of the final sample (i.e. it was replaced), the interview was classified
as invalid and thus excluded from the final data set.
• Valid EBs. If the EB was not part of the final sample (i.e. it was replaced), the interview was excluded.
• Duration. If the duration of the interview did not exceed the minimum threshold of 30 minutes, the
interview was excluded.
• Location. If the interview did not have GPS coordinates associated with it, the interview was
considered invalid. If the GPS coordinates fell outside a buffer zone around the EA of 50m plus the GPS
accuracy (based on the minimum latitude‐longitude formula + 50m buffer), the interview was excluded.
• Follow‐up visits. If the interview was not conducted in the first visit, the record of the first visit
must be valid except for the minimum duration, and both records must contain matching GPS
positions (within a maximum distance of 10m plus GPS precision); otherwise the interview completed in the
follow‐up visit was excluded.
• Replacement interview. If the interview was from a replacement household, the record of the original
household must be valid except for the minimum duration and the reason for no interview must also
be valid; otherwise the interview was excluded.
Beyond these criteria, the Survey Solutions CAPI platform allowed headquarters to ‘reject’ submissions on a
case‐by‐case basis and send them back to enumerators for correction whenever it found problems with a
submission.
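The first four criteria can be expressed as a simple rule check. The sketch below is illustrative only: the field names are invented, the distance to the EA boundary is assumed to be precomputed, and the follow‐up and replacement rules are left out for brevity.

```python
from dataclasses import dataclass

MIN_DURATION_MIN = 30  # interviews must exceed this duration
BUFFER_M = 50          # buffer around the EA, added to the GPS accuracy

@dataclass
class Submission:
    ea_in_sample: bool
    eb_in_sample: bool
    duration_min: float
    has_gps: bool
    metres_outside_ea: float  # 0 if inside the EA (assumed precomputed)
    gps_accuracy_m: float

def invalid_reasons(s):
    """Return the list of failed criteria; an empty list means valid."""
    reasons = []
    if not s.ea_in_sample:
        reasons.append("EA not in final sample")
    if not s.eb_in_sample:
        reasons.append("EB not in final sample")
    if s.duration_min <= MIN_DURATION_MIN:
        reasons.append("interview too short")
    if not s.has_gps:
        reasons.append("no GPS coordinates")
    elif s.metres_outside_ea > BUFFER_M + s.gps_accuracy_m:
        reasons.append("GPS outside EA buffer")
    return reasons

ok = Submission(True, True, 45, True, 0.0, 5.0)
bad = Submission(True, True, 30, True, 60.0, 5.0)
```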
4. Consumption aggregate
The main welfare measure used in this and other analyses using SHFS data is per‐capita consumption,
rather than income (Deaton and Zaidi, 2002). The SHFS collected data on realized consumption rather
than the total money spent on consumption items, as this measured actual realized welfare in a utility‐consistent
way (Ravallion, 1994). This section discusses the various adjustments made to the SHFS data
to construct the consumption aggregate using the Rapid Consumption Methodology. Section 6 discusses
the use of the international poverty line and various aggregate poverty measures to perform poverty
analysis on the SHFS data.
4.1. Cleaning of consumption data
Before deriving the consumption aggregate, the components of consumption –data on food consumption,
nonfood consumption, and durable assets– must undergo a cleaning process to correct outliers and other
mistakes (Deaton and Zaidi, 2002).
Food expenditure data are cleaned in a four‐step process. First, units for reported quantities of
consumption and purchase are corrected. Typical mistakes include recorded consumption of 100 kg of a
product (like salt) where the correct unit is grams. These mistakes are corrected using generic rules
(Table A.4). Then, an item‐specific conversion factor to kg is introduced for all units. For example, a small piece of bread
will likely have a different weight than a small piece of garlic. To avoid mistakes, enumerator trainings
focused on units and introduced a common understanding of what each unit means for each food item.
In addition, the conversion to kilograms was made explicit on the enumerators’ tablets (Table A.5). The
third step consisted of correcting issues with the exchange rate selected (Table A.6). Finally, outliers in
each component of consumption are detected using a set of cleaning rules to correct quantities and prices
(see Appendix Cleaning rules for food consumption data). The non‐food data set only contains values
without quantities and units. First, the same cleaning rules for currencies are applied (Table A.6), followed
by a set of specialized cleaning rules (see Appendix Cleaning rules for nonfood consumption data).
Likewise, for durables, the same cleaning rules for currencies are applied (Table A.6), and then a set of
durables‐specific cleaning rules (see Appendix Cleaning rules for durable assets).
4.2. Consumption aggregate using the Rapid Consumption Methodology
The nominal household consumption aggregate is the sum of four components, namely expenditures on
food items, expenditures on non‐food items, the value of the consumption flow from durable goods, and
housing (Deaton and Zaidi, 2002). Without a housing market functioning well enough to derive credible
estimates for the cost of housing, the SHFS consumption aggregate is based on the first three components:
food consumption, nonfood consumption, and consumption of durable assets.
Food and nonfood consumption in the Rapid Consumption Methodology
The SHFS used the Rapid Consumption Methodology to estimate the consumption aggregate. Pape and
Mistiaen (2018) provide a detailed and general exposition of the Rapid Consumption Methodology
including an ex‐post assessment of the methodology. The methodology is based on dividing food and
nonfood consumption items in one core and several optional modules. With each household assigned the
core module and one optional module, this methodology reduces the time spent on enumerating the
consumption modules. Deriving the consumption aggregate with this methodology is a two‐step process.
First, core and optional modules are constructed. Core items are selected based on their importance for
consumption. The remaining items are partitioned into optional modules. Optional modules are assigned
to groups of households. Second, after data collection, consumption of optional modules is imputed for
all households. Then, the resulting consumption aggregate is used to estimate poverty indicators.
Module construction
Food and non‐food consumption for household i are estimated by the sum of expenditures for the full list
of consumption items7
(2) $y_i^f = \sum_{j=1}^{m} y_{ij}^f$ and $y_i^n = \sum_{j=1}^{m} y_{ij}^n$
where $y_{ij}^f$ and $y_{ij}^n$ denote the food and non‐food consumption of item j in household i. As the estimation
for food and non‐food consumption follows the same principles, the upper indices f and n are neglected
in the remainder of this section. The list of items can be partitioned into M+1 modules each with $m_k$ items:
(3) $y_i = \sum_{k=0}^{M} y_i^{(k)}$ with $y_i^{(k)} = \sum_{j=1}^{m_k} y_{ij}^{(k)}$
For each household, only the core module $y_i^{(0)}$ and one additional optional module $y_i^{(k^*)}$ are collected.
Item assignment to the core module was designed to maximize the core module’s share of total
consumption, so that a large share of consumption would be enumerated from each household.
Important items were identified by their average food share across households from wave 1 of the SHFS.8
This strategy relies on the fact that, in Somalia, a few dozen items capture the majority of consumption.
The core modules captured 94 percent of food consumption and 79 percent of nonfood consumption,
respectively (Table 2). Optional modules were constructed such that items are orthogonal within modules
and correlated between modules, using an iterative algorithm (Pape and Mistiaen, 2018).
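The core‐module rule (take the items with the largest consumption shares) can be illustrated with a short sketch. The shares below are invented, and the partition of the remaining items into optional modules by the iterative algorithm of Pape and Mistiaen (2018) is not reproduced here.

```python
def split_core(shares, n_core):
    """Rank items by average consumption share and assign the top `n_core`
    items to the core module.

    Returns (core items, remaining items, share captured by the core).
    """
    ranked = sorted(shares, key=shares.get, reverse=True)
    core, rest = ranked[:n_core], ranked[n_core:]
    captured = sum(shares[item] for item in core)
    return core, rest, captured

# Hypothetical average shares (e.g. estimated from an earlier survey wave).
shares = {"rice": 0.30, "sugar": 0.25, "tea": 0.20, "apples": 0.15, "pears": 0.10}
core, rest, captured = split_core(shares, n_core=3)
```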
Table 2: Item partitions and consumption shares in SHFS wave 2.
            Food items                              Non‐food items
            Number of   Share    Share Wave 2       Number of   Share    Share Wave 2
            items       Wave 2   Imputed            items       Wave 2   Imputed
Core        38          94%      79%                29          79%      47%
Module 1    21          2%       8%                 14          5%       14%
Module 2    18          2%       6%                 15          6%       15%
Module 3    19          1%       5%                 16          7%       18%
Module 4    18          1%       4%                 15          5%       11%
Source: Authors’ calculations based on SHFS Wave 2.
7 The list of consumption items used in wave 2 of the SHFS is discussed in section 3.1.
8 Generally, previous consumption surveys in the same country or consumption shares of neighboring or similar
countries can be used to estimate food shares. In the worst case, a random assignment results in a larger standard
error but does not introduce a bias. The assignment of items to modules is very robust and, thus, even rough
estimates of consumption shares are sufficient to inform the assignment without requiring a baseline survey.
In fieldwork, a sufficient number of households must be assigned each optional module to obtain a reliable
total consumption estimate. In wave 2 of the SHFS, this was ensured by interviewing 12 households per
EA, allowing for the ideal partition of three households per optional module.
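With 12 interviews per EA and four optional modules, a balanced assignment gives each optional module to three households. The paper does not say how modules were assigned within an EA; the rotating assignment below is an assumption used only to show the arithmetic.

```python
from collections import Counter

def assign_modules(household_ids, n_modules=4):
    """Assign optional modules 1..n_modules in rotation (assumed scheme)."""
    return {hh: 1 + i % n_modules for i, hh in enumerate(household_ids)}

assignment = assign_modules(range(1, 13))     # 12 households in one EA
per_module = Counter(assignment.values())     # households per optional module
```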
Consumption estimation
Household consumption was then estimated using the core module, the assigned optional module, and
estimates for the remaining optional modules
(4) $\hat{y}_i = y_i^{(0)} + y_i^{(k^*)} + \sum_{k \in K^*} \hat{y}_i^{(k)}$
where $K^* := \{1, \dots, k^* - 1, k^* + 1, \dots, M\}$ denotes the set of non‐assigned optional modules.
Consumption of non‐assigned optional modules was estimated using multiple imputation techniques
taking into account the variation absorbed in the residual term (Pape and Mistiaen, 2018). Multiple
imputation was implemented using multivariate normal regression based on an EM‐akin algorithm to
iteratively estimate model parameters and missing data.9 The standard errors capture the error
distribution of the multiple imputation process. The underlying model is a welfare model relating
consumption to key household characteristics, explaining 71 percent of the variation in food consumption
and 64 percent in nonfood consumption. The model covariates were household size, share of children
in household, share of seniors in household; household head gender, employment, and education;
dwelling type, dwelling drinking water access, dwelling floor, and dwelling ownership status; household
experience of hunger; receipt of remittances; population type (urban, rural, IDP, nomadic) and a region‐population
type interaction, as well as each household’s core consumption quartile.10 Pape and Mistiaen
(2018) demonstrate that the Rapid Consumption Methodology yields reliable estimates of poverty using
an ex‐post assessment with household budget data from Hargeisa, mimicking the Rapid Consumption
Methodology by masking consumption of items that were not administered to households.11
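To make the structure of the household estimate concrete, the sketch below adds the collected core and assigned optional modules to imputed values for the non‐assigned modules, once per imputation draw. The numbers are invented, the imputation model itself is not implemented, and averaging across draws for a point estimate is an assumption of this example.

```python
from statistics import mean

def consumption_draws(core, assigned, imputed_draws):
    """Total consumption per imputation draw.

    core, assigned: collected consumption of the core and assigned modules.
    imputed_draws: one dict {module: imputed value} per imputation, covering
    only the non-assigned optional modules.
    """
    return [core + assigned + sum(draw.values()) for draw in imputed_draws]

# Household assigned optional module 1; modules 2-4 are imputed (two draws).
draws = consumption_draws(
    core=80.0,
    assigned=4.0,
    imputed_draws=[{2: 3.0, 3: 2.0, 4: 1.0}, {2: 5.0, 3: 2.0, 4: 3.0}],
)
point_estimate = mean(draws)
```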
Durable consumption flow
The consumption aggregate includes the consumption flow of durables calculated based on the user‐cost
approach. The consumption flow distributes the consumption value of the durable over multiple years.
9 Pape and Mistiaen (2018) test various other techniques for imputing total consumption, including OLS and tobit
module‐wise regression and multiple imputation chained equations, concluding that multivariate normal regression
is the preferred technique.
10 Negative imputed values are corrected by scaling all associated imputed values to an average of zero without
affecting the variance.
11 Pape and Mistiaen (2018) compare the imputation results with consumption estimates from the full consumption
modules of the 2013 Somaliland Household Survey. The authors present the performance of the estimation
techniques in terms of the relative bias (mean of the error distribution) and the relative standard error. The
methodology generally does not perform well at the household level (HH) but improves considerably already at the
enumeration area level (EA) where the average of 12 households is estimated. At the national aggregation level, the
Rapid Consumption methodology slightly over‐estimates consumption by 0.3 percent. Assessing the three standard
poverty measures (Foster et al., 1984) including poverty headcount (FGT0), poverty depth (FGT1) and poverty
severity (FGT2), the simulation results show that the Rapid Consumption methodology retrieves estimates within
1.5 percent of the reference measure (Figure A.5). Generally, the estimates are robust as suggested by the low
standard errors (Figure A.6). Simulations were also run for the complete data set from the Somaliland 2012
household budget survey producing comparable results.
The user‐cost principle defines the consumption flow of an item as the difference between the value of selling the asset at
the beginning of the year and the value of selling it at the end of the year, as this is the opportunity cost to the household of keeping the
item. The opportunity cost thus comprises the difference in the sales price and the interest earnings forgone by not selling the
asset at the beginning of the year.
If the durable item is sold at the beginning of the year, the household would receive the market price $p_t$
for the item and the interest on the revenue for one year. With $i_t$ denoting the interest rate, the value of
the item thus is $p_t(1 + i_t)$. If the item is sold at the end of the year, the household will receive the
depreciated value of the item while considering inflation. With $\pi_t$ being the inflation rate during the year
t, the household would obtain $p_t(1 + \pi_t)(1 - \delta)$ with the annual physical or technological depreciation
rate denoted as $\delta$ assumed constant over time.12 The difference between these two values is the cost that
the household is willing to pay for using the durable good for one year. Hence, the consumption flow is:
(5) $y_t = p_t(1 + i_t) - p_t(1 + \pi_t)(1 - \delta)$
By assuming that $\delta \times \pi_t \cong 0$, the equation simplifies to
(6) $y_t = p_t(r_t + \delta)$
where $r_t = i_t - \pi_t$ is the real market interest rate in period t. Therefore, the consumption flow of an item can be
estimated by the current market value $p_t$, the current real interest rate $r_t$, and the depreciation rate $\delta$.
Assuming an average annual inflation rate $\pi$, the depreciation rate $\delta$ can be estimated utilizing its
relationship to the market price13, with $p_{t-k}$ the price at purchase k years earlier:
(7) $p_t = p_{t-k}(1 + \pi)^k(1 - \delta)^k$
The equation can be solved for $\delta$ obtaining:
(8) $\delta = 1 - \left(\frac{p_t}{p_{t-k}}\right)^{1/k} \frac{1}{1 + \pi}$
Based on this equation, item‐specific median depreciation rates are estimated assuming an inflation rate
of 0.5 percent, a nominal interest rate of 2.0 percent and, thus, a real interest rate of 1.5 percent (Table
A.8).
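A small worked example may help to see how the depreciation rate and the consumption flow fit together. It assumes the standard user‐cost relationships, namely price_now = price_purchase × (1 + inflation)^k × (1 − delta)^k and flow = price_now × (real interest + delta), with the inflation and real interest rates stated above. The item and its prices are invented.

```python
def depreciation_rate(price_now, price_purchase, years, inflation=0.005):
    """Annual depreciation rate implied by the purchase and current price,
    assuming price_now = price_purchase * ((1 + inflation) * (1 - delta)) ** years.
    """
    return 1 - (price_now / price_purchase) ** (1 / years) / (1 + inflation)

def consumption_flow(price_now, delta, real_interest=0.015):
    """Annual consumption flow under the user-cost approach (small delta * pi)."""
    return price_now * (real_interest + delta)

# Hypothetical item bought 4 years ago for 100 and worth 50 today.
delta = depreciation_rate(price_now=50.0, price_purchase=100.0, years=4)
flow = consumption_flow(price_now=50.0, delta=delta)
```

In the survey, the depreciation rate applied to an item is the item‐specific median of such household‐level rates rather than a single household's value.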
For all households that own a durable but did not report its current value, the item‐specific
median consumption flow is used. For households that own more than one unit of a durable, the
consumption flow of the newest item is added to the item‐specific median consumption flow multiplied by
the number of remaining items (that is, excluding the newest item).14
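The rule for households owning several units of the same durable can be written as a short function. The function and argument names are illustrative assumptions; `None` stands for a missing current value of the newest item.

```python
def durable_flow(n_owned, newest_flow, median_flow):
    """Consumption flow for one durable type owned by a household.

    newest_flow: flow computed for the newest item, or None if its current
    value was not reported (the item-specific median is used instead).
    median_flow: item-specific median consumption flow, applied to every
    owned unit other than the newest.
    """
    if n_owned == 0:
        return 0.0
    first = median_flow if newest_flow is None else newest_flow
    return first + median_flow * (n_owned - 1)

three_phones = durable_flow(n_owned=3, newest_flow=10.0, median_flow=6.0)
unreported = durable_flow(n_owned=1, newest_flow=None, median_flow=6.0)
```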
12 Assuming a constant depreciation rate is equivalent to assuming a “radioactive decay” of durable goods (Deaton
and Zaidi, 2002).
13 In particular, $\pi$ solves the equation $\prod_{s=t-k+1}^{t} (1 + \pi_s) = (1 + \pi)^k$.
14 The SHFS wave 2 questionnaire provides information on a) the year of purchase and b) the purchasing price only
for the most recent durable owned by the household.
Deflators
Spatial price indices were calculated using a common food basket and spatial prices to make consumption
comparable across regions. The Laspeyres index is chosen as a deflator due to its moderate data
requirements. The deflator is calculated for each analytical stratum based on the price data collected in
wave 2 of the SHFS. The Laspeyres index (Table 3) reflects the item‐weighted relative price differences
across products. Item weights are estimated as household‐weighted average consumption share across
all households before imputation. Based on the democratic approach, consumption shares are calculated
at the household level. Core items use total household core consumption as reference while items from
optional modules use the total assigned optional module household consumption as reference. The
shares are aggregated at the national level (using household weights) and then calibrated by average
consumption per module to arrive at item‐weights summing to 1. The item‐weights are applied to the
relative differences of median item prices for each analytical stratum. Missing prices are replaced by the
item‐specific median over all households.
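A Laspeyres‐type spatial deflator of this kind is a weighted average of price relatives. The sketch below assumes that the reference price for each item is the item‐specific median over all households, which also serves as the fallback for missing stratum prices; items, weights, and prices are invented.

```python
def laspeyres(weights, stratum_prices, reference_prices):
    """Spatial Laspeyres index for one analytical stratum.

    weights: item weights summing to 1.
    stratum_prices: median item prices in the stratum (may lack items).
    reference_prices: item-specific median prices over all households
    (assumed reference; also used when a stratum price is missing).
    """
    index = 0.0
    for item, weight in weights.items():
        price = stratum_prices.get(item)
        if price is None:  # missing price: fall back to the overall median
            price = reference_prices[item]
        index += weight * price / reference_prices[item]
    return index

weights = {"rice": 0.6, "sugar": 0.4}
reference = {"rice": 1.0, "sugar": 2.0}
deflator = laspeyres(weights, {"rice": 1.1}, reference)  # sugar price missing
```

A value above 1 indicates that the stratum's prices are above the reference level, so nominal consumption there is scaled down when deflated.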
Table 3: Spatial Laspeyres index
Analytical strata                                   Food deflator
IDPs                                                0.856
Nomads                                              1.030
Banadir (Urban)                                     0.910
Nugaal (Urban)                                      1.058
Bari and Mudug (Urban)                              0.976
Woqooyi Galbeed (Urban)                             1.181
Awdal, Sanaag, Sool and Togdheer (Urban)            1.181
Hiraan, Middle Shabelle and Galgaduud (Urban)       1.119
Gedo, Lower and Middle Juba (Urban)                 0.960
Bay, Bakool and Lower Shabelle (Urban)              0.931
Bari, Mudug and Nugaal (Rural)                      0.960
Awdal, Togdheer and Woqooyi (Rural)                 0.887
Hiraan, Middle Shabelle and Galgaduud (Rural)       0.925
Bay, Bakool and Lower Shabelle (Rural)              0.945
Source: Authors’ calculations.
To obtain the US$1.90 PPP (2011) poverty line and correct for price differences over time, a price index
was created, in the absence of a national CPI, using consumption shares from the survey and prices
collected by the Market Price Survey (MPS) and by the Food Security and Nutrition Analysis Unit, Somalia
(FSNAU).15 Inflation between 2011 and December 2017 was obtained from the growth in the price index,
which was estimated in two steps. First, the price index was calculated from 2011 to February 2016 using
data from Wave 1 of the SHFS and prices from FSNAU, and then from February 2016 to December 2017
with data from Wave 2 of the SHFS and prices from the MPS.16
In the first step, consumption shares of 109 food and 68 nonfood items were aggregated according to
their Classification of Individual Consumption by Purpose (COICOP) code, and then combined with
monthly prices from FSNAU for 51 products. As a result, 32 matched COICOP codes were used to calculate
15 FSNAU collects monthly prices of commodities in 50 markets across all regions. The MPS collects weekly prices of
a broad range of products and services as well as exchange rates from 14 key markets across all Somali regions.
16 The products and services in the MPS are a close match with the food and nonfood items that form part of the
consumption module of the Somali High Frequency household survey component. The price survey is implemented
using a stringent set of quality standards.
the price index between 2011 and February 2016. In the second step, consumption shares of 114 food
and 89 nonfood items were aggregated by COICOP code, in combination with weekly price series from
the MPS for 109 products. This resulted in 49 matched COICOP codes that were then used to estimate the
price index until December 2017.
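The two-step construction can be illustrated with a short sketch. The paper does not state the index formula, so the example assumes a share-weighted arithmetic mean of price relatives, with shares renormalized over the COICOP codes that have both a consumption share and a price series, and with the two periods chained by multiplication. The function names and the example codes are hypothetical.

```python
def share_weighted_index(shares, price_relatives):
    """Share-weighted mean of price relatives over matched COICOP codes.

    shares:          {coicop_code: consumption share from the survey}
    price_relatives: {coicop_code: end-of-period price / start-of-period price}
    """
    matched = [code for code in shares if code in price_relatives]
    if not matched:
        raise ValueError("no COICOP code has both a share and a price series")
    # Renormalize shares so that the matched codes sum to 1 (assumption).
    total = sum(shares[code] for code in matched)
    return sum(shares[code] / total * price_relatives[code] for code in matched)

def chained_inflation(first_step, second_step):
    """Chain two sub-period indices, each given as (shares, price_relatives)."""
    return share_weighted_index(*first_step) * share_weighted_index(*second_step)

# Illustrative inputs only; the codes and numbers are made up.
step_2011_to_2016 = ({"01.1": 0.6, "01.2": 0.2, "03.1": 0.2},
                     {"01.1": 1.5, "01.2": 1.0})
step_2016_to_2017 = ({"01.1": 0.5, "04.5": 0.5},
                     {"01.1": 1.2, "04.5": 1.0})
cumulative = chained_inflation(step_2011_to_2016, step_2016_to_2017)
```

Chaining lets each sub-period use its own weights and price source, which mirrors the switch from Wave 1 shares and FSNAU prices to Wave 2 shares and MPS prices.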
4.3. Imputing consumption data in North‐East and Jubbaland regions
Despite methodological innovations, field team training, and a stringent security protocol (section 3.2),
some challenges with data collection persisted in certain geographic areas. These were mainly related to
human resource capacity constraints and remote monitoring to ensure the quality of the data. Specifically,
in the Jubbaland and rural North‐East regions,17 the information collected turned out to be only
representative of a very small, idiosyncratic part of the population or did not consistently meet the
survey’s high‐quality standards.
Jubbaland
The implementation of Wave 2 of the SHFS required some concessions to the local authorities in terms of
the recruitment of field teams. Some enumerators who performed sub‐optimally during training and the
pilot were recruited as agreed with local authorities in Jubbaland. Likewise, there were some constraints
to replace enumerators during the data collection if they were found to underperform. Based on internal
discussions and consultations with trusted team leaders, the SHFS team judged that this affected the
quality of the data collected in Jubbaland, particularly of the more demanding consumption modules,
compared to other regions. Furthermore, insecurity remained widespread in Jubbaland, mainly due to a
strong presence of Al‐Shabaab. The entire region of Middle Juba was excluded from wave 2 of the SHFS
due to security reasons. Likewise, large parts of Lower Juba, and to a lesser extent Gedo were also
excluded (see Table 1 for accessibility rates by pre‐war region).
In rural Jubbaland, field teams only collected data in areas that were relatively close to main cities (e.g.
within a 10‐km radius around Kismayo, Afmadow and Dhobley in Lower Juba). This was due to insecurity
and because many rural EAs considered in the sampling frame were found to be empty after reviewing
the satellite imagery. The EAs sampled for rural Jubbaland were peri‐urban areas that correspond to large
villages or small cities and thus the information was not representative of the rural population there. In
addition, data from teams surveying rural Jubbaland showed signs of inconsistency and relatively low
quality (highest percentage of invalid submissions compared to other urban and rural areas (Table 4);
largest number of flags in the cleaning process of the consumption modules (Table 6), and large
differences in the consumption of many food items relative to other rural areas (Table 7)). Interviews with
rural households from this region were therefore entirely excluded from the final data set, and poverty
was estimated from satellite imagery and other geo‐spatial data (section 5).
In urban areas, data collection lasted longer than in any other area covered due to over‐sampling.
Insecurity also made it more difficult to collect interviews and thus required more time. Team leaders
reported that these issues contributed to fatigue on the part of enumerators, presumably impacting the
quality of the data collected in urban Jubbaland.
17 Jubbaland region consists of pre‐war regions Gedo, Middle Juba, and Lower Juba (Middle Juba was completely
inaccessible). North‐East region consists of pre‐war regions Nugaal, Bari, and Mudug (Table A.12).
Table 4: Percentage of valid submissions for
urban and rural areas
Region Valid submissions (%)
Mogadishu (Urban) 99.9
North‐east Urban 99.6
North‐east Rural 100.0
North‐west Urban 99.2
North‐west Rural 100.0
Central regions Urban 99.0
Central regions Rural 97.0
Jubbaland Urban 99.3
Jubbaland Rural 94.6
South West Urban 98.6
South West Rural 98.1
Table 5: Percentage of missing values for food
items in urban and rural areas
Region Missing values (%)
Mogadishu (Urban) 54.8
North‐east Urban 58.4
North‐east Rural 61.2
North‐west Urban 58.2
North‐west Rural 61.2
Central regions Urban 57.9
Central regions Rural 58.3
Jubbaland Urban 49.8
Jubbaland Rural 49.1
South West Urban 57.5
South West Rural 56.5
Source: Authors’ calculations.
Table 6: Number of flags in the cleaning of food
items for urban and rural areas
Region Average number per household
Mogadishu (Urban) 1.0
North‐east Urban 0.8
North‐east Rural 0.8
North‐west Urban 0.9
North‐west Rural 0.8
Central regions Urban 1.8
Central regions Rural 1.1
Jubbaland Urban 2.1
Jubbaland Rural 2.6
South West Urban 1.0
South West Rural 0.9
Table 7: Items consumed by 10% more/less
households in each region relative to the
urban/rural average
Region Number of core food items
Mogadishu (Urban) 5
North‐east Urban 6
North‐east Rural 20
North‐west Urban 7
North‐west Rural 17
Central regions Urban 1
Central regions Rural 13
Jubbaland Urban 20
Jubbaland Rural 20
South West Urban 10
South West Rural 8
Source: Authors’ calculations.
In urban Jubbaland, the validity rate of submissions was in line with other regions (Table 4), but the consumption data were
flagged as outliers more often than in other regions during the review and cleaning process (Table 6).
Further, the profile of food consumption for households in urban Jubbaland differed from that in other
urban areas for 20 of 38 core food items (Table 7). These issues in the consumption modules led to
inconsistent poverty rates. Therefore, the information from the consumption modules (food, non‐food and
assets) was discarded and poverty was estimated based on sociodemographic and other household
characteristics in a multiple imputation routine.
Rural North‐East
The implementation of the survey also experienced some constraints in the recruitment of field teams in
the rural North‐East regions. Access to some areas in this region is possible only for team members
from certain clans. Thus, enumerators had to be selected and replaced based on this criterion. Some of
these candidates might not otherwise have been selected given their performance during training, the
pilot, and data collection. This was judged to have affected the quality especially of the consumption data
collected.
Moreover, the EAs sampled were spread across a vast territory and mostly in remote areas. They were far
from each other, and far from urban centers. NE teams who covered rural areas had to travel up to two
days to reach some EAs, longer than teams in any other region. Team leader reports from the field indicate
that these large distances and conditions created fatigue among enumerators. Further, direct monitoring
of field teams by supervisors was limited due to poor connectivity, and thus sending frequent and timely
feedback was more challenging than for other teams. As a result, the performance of teams did not
improve as in other regions.
Finally, the consumption profile of most core food items was different to other rural areas, including
nearby and ostensibly similar areas covered by other teams (Table 7). Hence, the consumption data (food,
non‐food and assets) were discarded, and poverty was estimated from a multiple imputation process.
Consumption imputation process
Consumption data in North‐East rural and Jubbaland urban were imputed in Stata with Multiple
Imputation (MI) techniques. The same multiple imputation process and model described to estimate the
consumption of non‐assigned optional modules from equation (4) were used to obtain the four
consumption components, and thus the total consumption expenditure for households in these regions.
The dependent variable of the model is total consumption expenditure per capita with data from North‐
East rural and Jubbaland urban set as missing and to be imputed.18 The independent variables were
chosen based on explanatory power with respect to household consumption: household size, share of
children in household, share of seniors in household; household head gender, employment, and
education; dwelling type, dwelling drinking water access, dwelling floor, and dwelling ownership;
household experience of hunger and receipt of remittances; population type (urban, rural, IDP, nomadic)
and a region‐population type interaction, as well as consumption quartiles. With an R‐Squared of 71
percent, this model had high explanatory power.
The model for imputing consumption had two caveats: first, each value or category of the right‐hand‐side
variables of the model must overlap with some non‐missing values of the dependent variable. Otherwise,
there is no basis for simulating the relationship between consumption and these explanatory variables.
This means that the region‐population type interaction variable must be modified, as North‐East rural and
Jubbaland urban are two categories of that variable without overlap with any non‐missing consumption
values. To do this, the North‐East rural category was combined with North‐East urban to form a general
North‐East category. Jubbaland urban was combined in the final specification with adjacent South‐West
urban (Table 8, column I). Various other specifications were tested in which Jubbaland urban was
combined with Central Regions urban, as an assessment of the sensitivity of the final estimates to this
choice (Table 8, columns III and IV). Second, the model contains consumption quartiles as a key right‐hand‐side
variable. Since the consumption data for North‐East rural and Jubbaland urban were inconsistent,
consumption quartiles were calculated for North‐East rural and Jubbaland urban separately, to include
this variable in the final specification. Other specifications excluding the quartile variable were assessed
as a sensitivity test as well (Table 8, columns II and IV).
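The first caveat, that every category of a right-hand-side variable needs some observed consumption values, can be checked mechanically before running the imputation. The sketch below is illustrative only: the imputation itself was run in Stata, and the field names, category labels and helper functions here are hypothetical. It lists categories without any observed outcome and then recodes them in the way the text describes.

```python
def categories_without_support(rows, key, outcome="consumption"):
    """Return the categories of `key` for which no row has an observed outcome."""
    seen, supported = set(), set()
    for row in rows:
        seen.add(row[key])
        if row[outcome] is not None:
            supported.add(row[key])
    return seen - supported

def recode(rows, key, mapping):
    """Return copies of the rows with categories of `key` merged as in `mapping`."""
    return [{**row, key: mapping.get(row[key], row[key])} for row in rows]

# Toy records: None marks consumption set to missing and to be imputed.
households = [
    {"region_type": "North-East rural", "consumption": None},
    {"region_type": "North-East urban", "consumption": 1.2},
    {"region_type": "Jubbaland urban", "consumption": None},
    {"region_type": "South-West urban", "consumption": 0.9},
]
unsupported_before = categories_without_support(households, "region_type")

# Merge each unsupported category with a category that has observed data.
merge_map = {
    "North-East rural": "North-East",
    "North-East urban": "North-East",
    "Jubbaland urban": "South-West / Jubbaland urban",
    "South-West urban": "South-West / Jubbaland urban",
}
unsupported_after = categories_without_support(
    recode(households, "region_type", merge_map), "region_type"
)
```

An empty result after recoding indicates that every remaining category overlaps with at least one non-missing consumption value.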
18 A logarithmic transformation is not feasible in this case due to its singularity at zero. As the core module was
constructed to capture maximum consumption shares, many optional modules – almost by definition – obtained
zero consumption especially among the poorer households, which have a less diversified diet.
Table 8: Multiple Imputation results.
Region Poverty rate (I) Poverty rate (II) Poverty rate (III) Poverty rate (IV)
Mogadishu (Urban) 73.67% (69.45%, 77.9%) 72.25% (67.83%, 76.64%) 73.74% (69.54%, 77.94%) 72.28% (67.81%, 76.58%)
North‐east Urban 58.78% (43.17%, 74.38%) 56.93% (40.45%, 73.68%) 59.01% (43.54%, 74.49%) 57.21% (40.44%, 73.72%)
North‐east Rural 62.46% (62.1%, 62.81%) 64.97% (64.3%, 64.88%) 63.59% (52.36%, 75.21%) 65.01% (64.3%, 64.88%)
North‐west Urban 62.71% (51.81%, 73.62%) 61.5% (50.93%, 72.24%) 62.7% (51.83%, 73.68%) 61.48% (50.97%, 72.27%)
North‐west Rural 77.3% (67.07%, 87.53%) 75.29% (64.66%, 86.04%) 76.48% (65.52%, 87.4%) 75.41% (64.75%, 86.48%)
IDP Settlements 75.62% (62.35%, 88.88%) 74.55% (61.43%, 88.1%) 75.62% (62.31%, 88.86%) 74.45% (61.4%, 88.03%)
Central regions Urban 59.18% (47.46%, 70.9%) 58.21% (46.2%, 70.24%) 59.18% (47.42%, 70.85%) 58.24% (46.25%, 70.32%)
Central regions Rural 65.06% (27.44%, 102.7%) 64.77% (27.28%, 102.6%) 65.01% (27.41%, 102.5%) 64.81% (27.28%, 102.5%)
Jubbaland Urban 53.34% (42.4%, 64.29%) 59.33% (54.81%, 63.53%) 53.85% (42.51%, 64.31%) 48.81% (44.01%, 54.32%)
South West Urban 62.72% (43.1%, 82.35%) 60.8% (40.43%, 80.96%) 62.39% (42.62%, 82.22%) 60.91% (40.57%, 80.88%)
South West Rural 74.94% (61.43%, 88.44%) 73.61% (59.25%, 88%) 75% (61.52%, 88.45%) 73.53% (59.17%, 88.02%)
Nomadic population 71.61% (63.1%, 80.12%) 70.86% (62.27%, 79.54%) 71.71% (63.18%, 80.22%) 70.87% (62.28%, 79.53%)
Source: Authors’ calculations.
Note: (I) final model used to impute consumption and poverty; (II) sensitivity test without consumption quartiles in
imputation model; (III) sensitivity test with Jubbaland urban combined with Central regions instead of South‐West;
(IV) as (III) but without consumption quartiles. 95% confidence interval in parentheses.
The results from the imputation process are stable and robust considering these different specifications.
The imputation process and these results were judged the best alternative to overcome the issues
experienced in data collection.
5. Imputing poverty in inaccessible areas using geo‐spatial data
Prevalent insecurity and conflict meant that parts of Somalia remained inaccessible for the SHFS field
teams (Table 1). In the 10 least accessible urban and rural strata, less than 50 percent of the population
could safely be reached.19 The survey poverty estimates in these regions are therefore insufficiently
representative of the regions’ entire urban and rural populations. Hence, poverty in each region was
predicted making use of correlations between geo‐spatial information and survey estimates. The resulting
poverty predictions are supplemental to survey estimates and serve as a proof‐of‐concept for using geospatial
information alongside on‐the‐ground data collection. This section describes the selection of geospatial
variables and the model used to impute poverty.
5.1. Selection of variables for poverty predictions
Spatial variables expected to predict poverty well were drawn from three types of sources. First, a custom‐derived
global database of over 300 spatial covariates from the WorldPop research group at the University
of Southampton was used (see Stevens et al., 2015).20 Second, spatial variables were computed from geo‐tagged
data from publicly available sources such as ACLED conflict data, FEWSNET food security data, and
OpenStreetMap. Third, population and population type data were drawn from a novel population density map
using recent data from OpenStreetMap, BMGF / Digital Globe spatial data, UNFPA survey and SHFS data.
19 Of these 10 strata, 4 were urban and 6 were rural. The survey of IDP and nomadic populations was not subject to
similar accessibility problems, so that survey results are considered representative for these populations.
20 WorldPop “Global high resolution population denominators”, project funded by the Bill & Melinda Gates
Foundation (OPP1134076).
From these sources, 15 variables were selected based on their correlation with survey poverty estimates
at the EA‐level. These contained information on the type of land cover (distance to bare land cover,
distance to cultivated areas),21 climate (temperature, precipitation, distance to drought‐affected areas),
population characteristics (population density, distance to urban areas), infrastructure (distance to major
roads, medical sites, schools, water sources, and waterways), conflict and insecurity (distance to conflict
incidents, distance to insecure areas), and food security (distance to food insecure areas). A detailed list
of the selected variables, their sources, preparation for analysis, illustration (Table A.8), summary statistics
(Table A.9), and linear correlations with survey poverty estimates (Table A.11) are available in the
Appendix.
5.2. Model selection
The final model to predict poverty was selected in two steps. First, a range of model types was compared
based on a five‐fold cross validation scheme.22 The data were randomly partitioned in five folds, four of
which made up the training set and one served as the validation set, ensuring that each model was trained
and validated on identical data. Models’ prediction success in the validation set determined which models
were selected, with R‐squared and root mean squared error (RMSE) as goodness‐of‐fit measures. The
models were fitted separately for each population type.23 The survey poverty estimates aggregated at the
EA‐level served as the response variable.24 Linear models yielded the best results. Second, the selection
of covariates, from the 15 spatial variables presented in Table A.8, was refined using stepwise regression
to minimize the RMSE of the linear models and maximize their predictive power. In this process, a
sequence of linear combinations of up to 15 covariates, as well as covariate interactions, was iteratively
fitted to the response variable with different starting points and criteria for selecting the covariates,
using the full data set of survey poverty estimates at the EA‐level.25 The final model for each population
type was the one with the lowest AIC and RMSE values.26 Furthermore, the residuals did not present any
patterns and, therefore, were treated as random.
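The cross-validation scheme used to compare model types can be sketched as below. This is a simplified stand-in, not the MATLAB workflow used for the paper: it fits a one-covariate least-squares line instead of the full set of candidate models, and all function names are hypothetical. The fold assignment depends only on the seed, so different models evaluated with the same seed are trained and validated on identical partitions.

```python
import random
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope

def kfold_rmse(xs, ys, k=5, seed=0):
    """Out-of-fold root mean squared error from k-fold cross-validation."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint validation sets
    squared_error = 0.0
    for validation in folds:
        held_out = set(validation)
        train = [i for i in indices if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        squared_error += sum((ys[i] - (a + b * xs[i])) ** 2 for i in validation)
    return sqrt(squared_error / len(xs))

# Toy data: an exact line, and the same line with alternating +1 / -1 noise.
toy_x = [float(i) for i in range(20)]
exact_y = [2.0 + 3.0 * x for x in toy_x]
noisy_y = [y + (1.0 if i % 2 else -1.0) for i, y in enumerate(exact_y)]
rmse_exact = kfold_rmse(toy_x, exact_y)
rmse_noisy = kfold_rmse(toy_x, noisy_y)
```

Because every observation is predicted by a model that did not see it, the resulting RMSE reflects out-of-sample fit, which is what the model comparison relies on.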
Final models for predicting urban and rural poverty
The final model for predicting poverty in urban areas contained 12 covariates and various covariate
interactions (Table 9; Figure A.2). Most variables individually, and all variables collectively, are statistically
significant in explaining variation in poverty. However, the model’s overall explanatory power is limited,
with an adjusted R‐squared of 52 percent. To check for potential issues with over‐fitting, 10 percent of
21 Variables produced by D. Kerr, H. Chamberlain and M. Bondarenko (WorldPop) in the framework of the
WorldPop “Global high resolution population denominators.”
22 Several models in each of the following categories were tested: linear models, random forest models, Support
Vector Machine models, and Gaussian Process Regressions.
23 IDP and nomadic households did not suffer from accessibility problems.
24 EA‐level poverty estimates are preferable to household‐level estimates as EAs cover larger areas and contain fewer
binary values. Thus, the model was trained and tested at the EA‐level and within EA variability was not considered.
25 The MATLAB function ‘fitlm’ was used to obtain a first model for EA‐level poverty. The MATLAB ‘step’ function,
which implements stepwise regression, was then used to select model terms, including interactions of terms. See
https://uk.mathworks.com/help/stats/fitlm.html and https://uk.mathworks.com/help/stats/linearmodel.step.html
for the MATLAB documentation of these functions. Goodall (1993) provides the basis for the ‘fitlm’ fitting algorithm
and Draper and Smith (2014) give an overview of stepwise regression on which ‘step’ is based.
26 Minimizing the Akaike information criterion (AIC) of a linear model is asymptotically equivalent to minimizing the
leave‐one‐out cross‐validation error. See Shao (1997) and Stone (1977).
the sample was randomly excluded and the model from Table 9 re‐estimated on the remainder. The process was repeated
1,000 times and Figure A.4 shows the in‐sample and out‐of‐sample R‐squared of this validation.
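The over-fitting check can be illustrated with a repeated hold-out sketch. As a simplifying assumption, a one-covariate least-squares line stands in for the full urban model, and the function names are hypothetical. Each repetition drops a random 10 percent of observations, refits on the rest, and records the R-squared on the retained and on the excluded observations.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope

def r_squared(xs, ys, a, b):
    """1 - SS_res / SS_tot; assumes the outcomes in ys are not all identical."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def repeated_holdout(xs, ys, share=0.10, repeats=1000, seed=0):
    """Return a list of (in-sample R2, out-of-sample R2), one pair per repetition."""
    rng = random.Random(seed)
    n = len(xs)
    n_out = max(2, round(share * n))
    results = []
    for _ in range(repeats):
        excluded = set(rng.sample(range(n), n_out))
        kept = [i for i in range(n) if i not in excluded]
        a, b = fit_line([xs[i] for i in kept], [ys[i] for i in kept])
        r2_in = r_squared([xs[i] for i in kept], [ys[i] for i in kept], a, b)
        r2_out = r_squared([xs[i] for i in excluded], [ys[i] for i in excluded], a, b)
        results.append((r2_in, r2_out))
    return results

# Toy data on an exact line, so both R-squared values equal 1 up to rounding.
toy_x = [float(i) for i in range(30)]
toy_y = [2.0 + 3.0 * x for x in toy_x]
toy_results = repeated_holdout(toy_x, toy_y, repeats=50)
```

A large and systematic gap between the two R-squared distributions would point to over-fitting; similar distributions suggest the fitted relationship generalizes to excluded EAs.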
Table 9: Final model to predict urban poverty.
Coefficients Coefficient estimate Standard error p‐value
(Intercept) ‐1.946 0.355 0.000
Distance to bare areas 0.165 0.028 0.000
Distance to cultivated areas 0.000 0.008 0.969
Distance to dry areas 0.001 0.000 0.017
Distance to major roads 0.084 0.026 0.002
Distance to medical sites 0.062 0.024 0.011
Distance to schools 0.083 0.028 0.003
Distance to unsafe areas 0.002 0.001 0.001
Distance to urban areas ‐0.057 0.021 0.009
Distance to water sources 0.001 0.001 0.147
Distance to waterways 0.001 0.001 0.437
Population density 0.000 0.000 0.001
Temperature 0.084 0.013 0.000
Distance to bare areas x Distance to waterways 0.000 0.000 0.000
Distance to bare areas x Temperature ‐0.006 0.001 0.000
Distance to cultivated areas x Distance to waterways ‐0.001 0.000 0.012
Distance to dry areas x Distance to schools 0.000 0.000 0.001
Distance to major roads x Distance to schools ‐0.010 0.003 0.001
Distance to major roads x Distance to unsafe areas ‐0.003 0.001 0.000
Distance to major roads x Distance to urban areas ‐0.090 0.024 0.000
Distance to medical sites x Distance to water sources ‐0.001 0.000 0.015
Distance to schools x Temperature ‐0.002 0.001 0.025
Distance to urban areas x Distance to water sources 0.002 0.000 0.000
Model statistics
Unit of observation Enumeration areas
Observations 252
Degrees of freedom 229
R‐squared 0.56
Adjusted R‐squared 0.518
Root mean squared error 19.8
F‐Statistic 13.3
Source: Flowminder / WorldPop.
The model’s relatively low predictive power is likely because the explanatory variables do not vary at a
high enough spatial frequency relative to urban poverty estimates, which can vary significantly across a
small space. Furthermore, distance explanatory variables could result in relatively smooth predictions
across space and not accurately capture small geographical clusters of low/high consumption.27 For
example, in urban settings, poverty levels may be quite different in two EAs which are only several
hundred meters apart. In contrast, the same two EAs will have very similar levels of precipitation or,
27 The same issue can arise with non‐distance variables computed within a buffer, such as precipitation,
temperature and conflict density.
depending on spatial resolution, may indeed be covered by the same precipitation data point. Predictors
such as the density of buildings or building patterns would likely improve the model.
Further, two different sets of night‐time lights data were used to improve the predictive power of the
urban model, but these turned out to be poorly correlated with survey poverty estimates and did not
improve the urban model’s predictive power.28 This failure to improve the model is likely due to the night‐time
lights data’s coarse resolution of 1 km and 500 m, respectively.
In rural areas, EAs are highly dispersed and poverty levels somewhat more spatially homogenous. Hence,
the rural model was more successful at explaining variation in poverty in rural areas, with an adjusted R‐squared
of 94 percent (Table 10).
The uncertainty from using spatial covariates as explanatory variables was not considered in the
estimation of standard errors. However, the data points used in the model were randomly selected to
ensure they were taken from places far from each other. The resulting weighted average coefficient of
variation (CV) of the estimates is 0.19 for urban districts and 0.73 for rural districts. Moreover, EAs were
randomly selected for the survey with the multi‐stage stratified process described above, which combined
with a random selection of data points to estimate the model, aims to derive a sample of EAs with different
values within the range of each explanatory variable, similar to the range from the overall EA population.
Table 10: Final model to predict rural poverty.
Coefficients Coefficient estimate Standard error p‐value
(Intercept) 2.075 0.320 0.000
Conflicts density 0.000 0.000 0.003
Distance to cultivated areas 0.040 0.008 0.000
Distance to food insecure areas 0.020 0.008 0.018
Distance to major roads ‐0.019 0.005 0.001
Distance to medical sites ‐0.026 0.004 0.000
Distance to schools 0.034 0.005 0.000
Distance to unsafe areas ‐0.009 0.004 0.027
Distance to urban areas 0.011 0.002 0.000
Distance to water sources 0.007 0.002 0.000
Distance to waterways 0.000 0.001 0.830
Precipitations 0.001 0.000 0.001
Temperature ‐0.089 0.012 0.000
Conflicts density x Distance to cultivated areas 0.000 0.000 0.005
Distance to cultivated areas x Distance to major roads 0.001 0.000 0.002
Distance to cultivated areas x Distance to medical sites ‐0.001 0.000 0.001
Distance to cultivated areas x Distance to schools ‐0.002 0.000 0.000
Distance to food insecure areas x Distance to schools 0.000 0.000 0.002
Distance to food insecure areas x Distance to urban areas 0.000 0.000 0.001
Distance to food insecure areas x Distance to water sources 0.000 0.000 0.000
Distance to food insecure areas x Temperature ‐0.001 0.000 0.000
Distance to medical sites x Distance to urban areas 0.001 0.000 0.000
Distance to unsafe areas x Distance to water sources 0.000 0.000 0.025
Distance to urban areas x Distance to water sources 0.000 0.000 0.000
28 The two data sets are from ‘Visible Infrared Imaging Radiometer Suite’ (VIIRS) and Defense Meteorological
Satellite Program (DMSP).
Distance to urban areas x Distance to waterways 0.000 0.000 0.027
Model statistics
Unit of observation Enumeration areas
Observations 92
Degrees of freedom 67
R‐squared 0.953
Adjusted R‐squared 0.937
Root mean squared error 11.2
F‐Statistic 56.9
Source: Flowminder / WorldPop.
Both the urban and the rural model were used to predict poverty at the 100m‐by‐100m pixel‐level for all
urban and inhabited rural areas. In order to derive imputed poverty estimates at the pre‐war region and
district levels, pixels were aggregated using as population weights an updated version of the WorldPop
population layer.29
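The aggregation step amounts to a population-weighted average of pixel-level predictions within each reporting unit. A minimal sketch follows, assuming each pixel is available as a (district, predicted poverty rate, population) record; this layout and the function name are hypothetical.

```python
def aggregate_poverty(pixels):
    """Population-weighted mean of predicted poverty by district.

    pixels: iterable of (district, predicted_poverty_rate, population) tuples.
    Districts with zero total population are omitted from the result.
    """
    weighted_sum, population = {}, {}
    for district, rate, pop in pixels:
        weighted_sum[district] = weighted_sum.get(district, 0.0) + rate * pop
        population[district] = population.get(district, 0.0) + pop
    return {
        district: weighted_sum[district] / population[district]
        for district in population
        if population[district] > 0
    }

# Toy pixels: district "C" has no population and is dropped.
toy_pixels = [("A", 0.8, 100), ("A", 0.4, 300), ("B", 0.5, 10), ("C", 0.9, 0)]
district_poverty = aggregate_poverty(toy_pixels)
```

Weighting by population keeps sparsely inhabited pixels from dominating the regional estimate, which matters in rural areas where many pixels hold few people.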
6. Poverty in Somalia
Poverty is a complex phenomenon that refers to the deprivation of a person, household, or community in
multiple dimensions (Deaton and Zaidi, 2002). In general, it considers whether individuals or households
have enough resources to meet their needs. Identifying the poor population or those living below a
minimum threshold is a first crucial step for evidence‐based planning aimed at alleviating poverty in any
country. Profiling the poor and vulnerable is crucial to inform policies, design targeted interventions, as
well as to monitor and evaluate the evolution of living standards and poverty reduction efforts (Baker,
2000). This section presents an overview of quantitative measures used to assess poverty and inequality
in Somalia using SHFS wave 2 data. The analysis focuses on the monetary dimensions of poverty. The
World Bank’s forthcoming Somali Poverty Assessment, and therein especially the first chapter, provides a
more detailed analysis of poverty and deprivation, including non‐monetary dimensions of deprivation.
6.1. Measuring poverty
Three components are required for poverty analysis. First, a measure of welfare. Second, a poverty line
that defines a level of welfare at which individuals are either considered poor or not poor. Third, an
aggregate poverty measure (Coudouel et al., 2002; Haughton and Khandker, 2009; Ravallion, 2008). The
measure of welfare used in this analysis, per‐capita consumption, is discussed in detail in Section 4.
Poverty line
There are two types of poverty lines: relative to the overall distribution of consumption in a country, or
anchored in an absolute level of what a household should consume to meet basic needs (Beegle et al.,
2016). Many countries define a national poverty line based on the cost of essential food items or a
minimum calorie intake in that country, along with an allowance for non‐food products. While a national
poverty line allows for a precise measure of poverty according to national standards and circumstances,
it is not comparable with other countries. Thus, absolute poverty lines are preferred to measure poverty
across countries.
This analysis uses the international poverty line which was introduced in the 1990 World Development
Report with the aim of measuring poverty consistently across countries (Ravallion et al., 2009). To be
29 The poverty estimates were obtained using an updated WorldPop population density map of Somalia with the
latest data from wave 2 of the SHFS and DigitalGlobe.
representative of poverty in the poorest countries, it was computed using data from national poverty lines
of 33 of the poorest countries. The international poverty line is expressed in terms of purchasing power
parity (PPP) rather than traditional currency exchange rates to compare both poverty and GDP across
countries (Beegle et al., 2016).30 The value of the poverty line has been revised through the years and
adjusted to reflect welfare conditions of low‐income countries. In 2008 this international line was
estimated at $1.25 per capita per day at 2005 prices. In 2015 the line was updated to its current level at a
daily value of US$ 1.90 (2011 PPP) per person (World Bank, 2016b).
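As a worked illustration of the conversion into local currency, the conversion factor reported in the footnote on the poverty line (US$1 PPP (2011) worth 10,731 SSh) gives the 2011 value of the line in Somali shillings. The symbol for cumulative inflation between 2011 and December 2017 is introduced here only for illustration; its value comes from the price index described in the Deflators section and is not restated in this sketch.

```latex
z_{2011}^{\mathrm{SSh}} = 1.90 \times 10{,}731 = 20{,}388.9 \ \text{SSh per person per day}
\qquad
z_{2017}^{\mathrm{SSh}} = z_{2011}^{\mathrm{SSh}} \times \left(1 + \pi_{2011 \to 2017}\right)
```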
Poverty and inequality measures
The poverty measure is primarily based on the three standard poverty measures following Foster, Greer,
and Thorbecke (1984). These measures are derived from the following general function:
(9) $P_{\alpha} = \frac{1}{N} \sum_{i=1}^{q} \left( \frac{z - y_i}{z} \right)^{\alpha}$
Here $y_i$ denotes the consumption of individual $i$, $N$ the total population, $q$ the poor population and $z$ the
poverty line. The poverty headcount ratio is obtained when the parameter $\alpha$ takes the value of 0, the
poverty gap and severity when this parameter is set to 1 and 2 respectively. The poverty headcount ratio
or poverty incidence is the most common poverty measure. It is the share of population in a given region
that is poor by virtue of having a total consumption lower than the poverty line. With $\alpha = 0$, the poverty
headcount ratio can be expressed as the number of poor individuals ($q$) over the total population ($N$), such
that
(10) $P_0 = \frac{q}{N}$
The poverty gap, obtained when $\alpha$ takes the value of 1, measures how far households or individuals are
from overcoming poverty, by measuring the distance poor households are from the poverty line. It
captures the difference between poor households’ current consumption and the poverty line as a
proportion of the poverty line. It can be interpreted as the minimum amount of resources that would have
to be transferred to the poor, under a perfect targeting scheme, to eradicate poverty (Deaton, 2006). This
measure is obtained by adding up all the shortfalls of the poor relative to the poverty line and dividing the
total by the population:
(11) $P_1 = \frac{1}{N} \sum_{i=1}^{q} \left( \frac{z - y_i}{z} \right)$
The poverty severity index measures the level of inequality among the poor. This measure is estimated by
squaring each poor individual’s proportional shortfall from the poverty line before averaging over the
population. It attributes a larger weight to the poorest among the poor, with the formula given by:
(12) $P_2 = \frac{1}{N} \sum_{i=1}^{q} \left( \frac{z - y_i}{z} \right)^{2}$
30 The poverty line was derived considering the regression‐based PPP estimate for Somalia, which corresponds to a
private consumption conversion factor of US$1 PPP (2011) worth 10,731 SSh.
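The three Foster-Greer-Thorbecke measures described above differ only in the exponent applied to the proportional shortfall from the poverty line, so a single function covers them. The sketch below is an illustration rather than the code used for the paper; the function name and the optional weights argument (for example, household weights times household size) are assumptions.

```python
def fgt(consumption, poverty_line, alpha, weights=None):
    """Foster-Greer-Thorbecke poverty measure.

    alpha = 0 gives the headcount ratio, 1 the poverty gap, 2 the severity index.
    An individual counts as poor if consumption is strictly below the line.
    """
    if weights is None:
        weights = [1.0] * len(consumption)
    total_weight = sum(weights)
    accumulated = 0.0
    for y, w in zip(consumption, weights):
        if y < poverty_line:
            shortfall = (poverty_line - y) / poverty_line
            accumulated += w * shortfall ** alpha
    return accumulated / total_weight

# Toy daily per-capita consumption values against a line of 2.0.
toy_consumption = [1.0, 1.5, 2.0, 4.0]
headcount = fgt(toy_consumption, 2.0, alpha=0)
gap = fgt(toy_consumption, 2.0, alpha=1)
severity = fgt(toy_consumption, 2.0, alpha=2)
```

In the toy data two of four individuals are poor, with shortfalls of 50 and 25 percent of the line; squaring those shortfalls for the severity index gives the poorer individual four times the weight of the less poor one.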
In the context of monetary poverty, equality can be defined as an equal distribution of consumption across
the population, with inequality being the departure from that equal distribution. Measures of inequality
are thus defined over the entire population, aiming instead to capture the full consumption distribution
without depending on the mean of the consumption distribution. It is important to note that measuring
inequality with consumption, instead of income, tends to underestimate inequality in the population as
consumption‐based measures do not consider savings or wealth (Beegle et al., 2016).
The Gini index or coefficient is the primary measure of inequality presented in this analysis. It ranges
between 0 and 1, such that a coefficient equal to 0 indicates perfect equality and equal to 1 complete
inequality. The Gini index is graphically represented by the Lorenz curve, a visual representation of the
distribution of consumption across the population. It plots the cumulative population distribution by
consumption percentile against the cumulative consumption distribution. The Gini index is the area
between perfect equality, as represented by the 45‐degree line, and the Lorenz curve observed from the
data, relative to the maximum area that would be attained given perfect inequality (Figure A.7). Formally,
(13) $Gini = 1 - \sum_{i=1}^{N} (F_i - F_{i-1})(L_i + L_{i-1})$

where $L_i$ denotes the cumulative proportion of the total country‐wide consumption expenditure for the
ith person and $F_i$ the cumulative proportion of the total population for the ith person. An alternative
measure of inequality presented below is the Theil index. It is part of a larger family of measures referred
to as the general entropy class (Coudouel et al., 2002), with the general formula given by:
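(14) $GE(\alpha) = \frac{1}{\alpha(\alpha - 1)} \left[ \frac{1}{N} \sum_{i=1}^{N} \left( \frac{y_i}{\bar{y}} \right)^{\alpha} - 1 \right]$

where $y_i$ denotes the total consumption for individual i, $\bar{y}$ the mean expenditure per capita and N is the
total population. The parameter $\alpha$ regulates the emphasis placed on higher or lower incomes. The Theil
index is the limiting case of this formula as $\alpha$ approaches 1. As with the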
Gini index, higher values of the Theil index represent higher levels of inequality, but unlike the Gini
coefficient, this measure is not bounded between 0 and 1. Moreover, the Theil index is sensitive to
inequality among the poor, and has the advantage of being additive across different subgroups in the
country, allowing to decompose inequality into how much of it is explained by differences within groups
and how much by differences between groups.
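The following Python sketch illustrates the two inequality measures and the additive decomposition of the Theil index into between-group and within-group components. It assumes unweighted data and strictly positive consumption values; the function names and the toy groups are hypothetical and do not reproduce the SHFS code.

```python
import math


def gini(values):
    """Gini index from the discrete Lorenz curve (trapezoid rule)."""
    ordered = sorted(values)
    n, total = len(ordered), sum(ordered)
    area_term, cum_prev, cum = 0.0, 0.0, 0.0
    for v in ordered:
        cum += v / total
        # Each person adds 1/n to the cumulative population share.
        area_term += (1.0 / n) * (cum + cum_prev)
        cum_prev = cum
    return 1.0 - area_term


def theil(values):
    """Theil index GE(1), the limit of the general entropy class as alpha -> 1."""
    mean = sum(values) / len(values)
    return sum((v / mean) * math.log(v / mean) for v in values) / len(values)


def theil_decomposition(groups):
    """Return (between, within) components of GE(1) for a dict of groups."""
    everyone = [v for g in groups.values() for v in g]
    n, mean = len(everyone), sum(everyone) / len(everyone)
    between = within = 0.0
    for g in groups.values():
        g_mean = sum(g) / len(g)
        cons_share = (len(g) / n) * g_mean / mean  # group share of consumption
        between += cons_share * math.log(g_mean / mean)
        within += cons_share * theil(g)
    return between, within


# Hypothetical toy groups.
groups = {"a": [1.0, 3.0], "b": [2.0, 6.0]}
between, within = theil_decomposition(groups)
total = theil([v for g in groups.values() for v in g])
```

In this sketch the two components sum to the overall index, which is the additivity property referred to above.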
6.2. Results
As data collection in wave 2 of the SHFS was restricted to accessible areas, survey poverty headcount
estimates are representative only of the population living in these areas (Table 1; Figure 1). The SHFS
filled this critical gap by imputing poverty based on data extracted from satellite images for inaccessible
areas. Section 5 describes the imputation methodology in detail. Survey and satellite imputation estimates
for all population types were combined to compute a poverty headcount rate representing the entire
Somali population (Table A.13; Figure 7).31 Overall, 77 percent of the Somali population lived below the
poverty line in December 2017. This poverty incidence was 26 percentage points higher than the
unweighted average of low‐income countries in Sub‐Saharan Africa (51 percent) in 2017. The country has
the third‐highest poverty rate in the region, after Burundi and South Sudan (Figure 5).32 The high poverty
incidence of Somalia is in line with its low level of gross domestic product (GDP) per capita, which
was estimated at US$450 in 2017 (Figure 5).33
Figure 5: Cross‐country comparison of poverty and GDP
Source: Authors’ calculation and World Bank Open Data.
Figure 6: Poverty incidence
Source: Authors’ calculations.
Poverty is somewhat heterogeneous between different population types and regions. Urban areas other
than Mogadishu have a lower poverty headcount rate (60 percent) than the rest of the Somali population
(Figure 6; p<0.01 vs. Mogadishu, p<0.05 vs. IDPs in settlements and nomads, and p<0.10 vs. rural areas).34
Residents of the capital, Mogadishu, are poorer than those of other urban areas (poverty headcount between
72 and 76 percent). This higher poverty rate in Mogadishu compared to other urban areas is likely the result of a
31 To derive a nation‐wide poverty rate, survey and satellite estimates were combined in the following way. For each
pre‐war region and population type, the satellite prediction was considered if the accessibility rate in wave 2 was 90
percent or less, and the survey estimate was used if accessibility exceeded this threshold.
32 The countries used for regional comparison are all the African low‐income countries as defined by the World Bank:
Benin, Burkina Faso, Burundi, Central African Republic, Chad, Comoros, Democratic Republic of Congo, Eritrea,
Ethiopia, Guinea, Guinea‐Bissau, Liberia, Madagascar, Malawi, Mali, Mozambique, Niger, Rwanda, Senegal, Sierra
Leone, South Sudan, Tanzania, Togo, Uganda, and Zimbabwe. For each country, we include the most recent available
year for each indicator.
33 For international comparisons, the poverty rate for Somalia was derived from satellite estimates. In the rest of the
section, the figures refer to survey estimates unless explicitly noted.
34 Urban areas usually benefit from agglomeration effects that result in more economic opportunities and access to
services, relative to rural areas (Lall et al., 2017).
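The selection rule in footnote 31 can be sketched in Python as follows. The 90 percent threshold comes from the footnote; the function names, the dictionary layout and the population-weighted aggregation in `national_rate` are assumptions made for illustration only, since the text does not spell out how the selected estimates were aggregated.

```python
ACCESSIBILITY_THRESHOLD = 0.90  # from footnote 31


def select_estimate(accessibility, survey, satellite):
    """Use the satellite prediction when accessibility is 90 percent or less."""
    if accessibility <= ACCESSIBILITY_THRESHOLD:
        return satellite
    return survey


def national_rate(cells):
    """Population-weighted mean of the selected estimates (assumed aggregation)."""
    total_population = sum(c["population"] for c in cells)
    weighted = sum(
        c["population"]
        * select_estimate(c["accessibility"], c["survey"], c["satellite"])
        for c in cells
    )
    return weighted / total_population


# Hypothetical cells, one per pre-war region and population type.
cells = [
    {"population": 100, "accessibility": 0.95, "survey": 0.6, "satellite": 0.8},
    {"population": 300, "accessibility": 0.50, "survey": 0.7, "satellite": 0.8},
]
rate = national_rate(cells)
```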
[Figure 5 axes: poverty incidence (% of population) against GDP per capita (US$ PPP); labeled countries: ETH, UGA, TZA, RWA, MWI, SOM, SSD.]
[Figure 6 axes: % of population for Mogadishu, other urban, rural, IDPs in settlements and nomads, with the overall average (survey estimates).]
larger concentration of the displaced population and the challenges associated with the displacement
crisis, which the 2016/17 drought recently exacerbated.35
Poverty is also heterogeneous across space. Based on estimates from satellite imputation, the highest
levels of poverty are clustered in south‐western Somalia, and several districts in northern Somalia (Figure
7).
Figure 7: Map of poverty incidence at the district‐level based on satellite imputation36
35 Banadir/Mogadishu concentrates 41 percent of IDPs in settlements and 28 percent of the overall displaced
population according to the second wave of the SHFS. The share is similar (22 percent) for the overall displaced
population with data from Protection & Return Monitoring Network of the United Nations High Commissioner for
Refugees (UNHCR).
36 The boundaries on the map show approximate borders of Somali pre‐war regions and do not necessarily reflect
official borders, nor imply the expression of any opinion on the part of the World Bank concerning the status of any
territory or the delimitation of its boundaries.
Source: Flowminder / WorldPop.
Note: The poverty incidence of each region does not include IDPs in settlements.
Figure 8: Poverty gap
Source: Authors’ calculations.
Figure 9: Poverty severity
Source: Authors’ calculations.
The average poverty gap in Somalia was estimated at 29 percent (Figure 8), implying that the average
consumption level of a poor Somali is about 71 percent of the international poverty line. Poverty was
deeper in rural areas and IDP settlements (34 percent for both), compared to Mogadishu (27 percent,
p<0.1) and other urban areas (24 percent, p<0.05). A large share of Somalis living in poverty, together
with a considerable shortfall in their consumption expenditure relative to the poverty line means that a
substantial boost in consumption would be necessary to overcome poverty. A transfer of around US$ 1.64
billion per year would lift the poor population out of poverty, assuming a perfect targeting scheme and
ignoring administrative and logistical costs.37 In line with these results, the average poverty severity index
is 15 percent, pointing to inequalities among the poor. These inequalities were concentrated in rural areas
and IDP settlements (Figure 9).
Consumption was relatively homogeneous due to the high levels of monetary deprivation shared by most
households. Hence, inequality was relatively low, with a Gini index of 34 percent in 2017. Other low‐income
countries in Sub‐Saharan Africa with similar levels of poverty tend to have higher levels of
inequality. For example, Malawi and South Sudan, which have a poverty incidence of 69 and 82 percent
respectively, have a Gini index around 12 percentage points higher than that of Somalia (Figure 10). The Gini
index is 41 percent in rural areas, 34 percent in other urban areas and 26 percent in Mogadishu (Figure
11). Donor support concentrated in urban areas due to insecurity and accessibility constraints may help
in leveling the consumption of the urban population, leading to lower levels of inequality.
Overall inequality stems largely from differences within regions and population groups, rather than from
differences between them. The Theil index indicates that between 98 and 99 percent of total inequality
is the result of inequality within groups (Table 11). Differences between households from within the same
37 Corresponds to an annual value for all the regions, including areas not covered in wave 2 of the SHFS. For these,
the same poverty incidence and gap was assumed as in regions covered by the survey.
[Figure 8 axes: poverty gap (% of poverty line) for Mogadishu, other urban, rural, IDPs in settlements and nomads, with the overall average.]
[Figure 9 axes: poverty severity index for Mogadishu, other urban, rural, IDPs in settlements and nomads, with the overall average.]
region or population group (Mogadishu, other urban, IDPs in settlements and nomads) largely explain
inequality in consumption.
Figure 10: Cross‐country comparison of poverty and inequality
Source: Authors’ calculations.
Figure 11: Inequality
Source: Authors’ calculations.
Table 11: Inequality decomposition (Theil GE(1) inequality index)
Decomposition    By population type    By region
Between group    0.002                 0.005
Within group     0.208                 0.205
Total            0.210                 0.210
Source: Authors’ calculations.
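As a quick check of the within-group shares quoted in the text, the figures in Table 11 can be divided directly. The dictionary layout below is only an illustrative way of holding the published values.

```python
# Values copied from Table 11 (Theil GE(1) decomposition).
table_11 = {
    "population type": {"between": 0.002, "within": 0.208},
    "region": {"between": 0.005, "within": 0.205},
}

# Share of total inequality attributable to differences within groups.
within_share = {
    grouping: parts["within"] / (parts["between"] + parts["within"])
    for grouping, parts in table_11.items()
}
```

The two shares are roughly 0.99 for population types and 0.98 for regions, consistent with the range of 98 to 99 percent given above.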
The consumption distributions of the different population groups are relatively similar. The largest
differences between rural and urban areas, as well as between IDPs in settlements and nomads, are found
below the poverty line (Figure 12). About 10 percent of the non‐poor population is
clustered within 20 percent of the poverty line. This population is susceptible to falling into poverty in case
of an unexpected decrease in their consumption levels.
Figure 12: Consumption distribution
Source: Authors’ calculations.
[Figure 10 axes: Gini index (0‐100) against poverty incidence (% of population); labeled countries: ETH, UGA, TZA, RWA, MWI, SOM, SSD.]
[Figure 11 axes: Gini index (0‐100) for Mogadishu, other urban, rural, IDPs in settlements and nomads, with the overall average.]
7. Conclusions
The lack of data in Somalia poses a risk to evidence‐based interventions aimed at alleviating poverty and
inequality. To mitigate this risk, the World Bank implemented Wave 2 of the Somali High Frequency Survey
to better understand the welfare conditions of the population and to estimate the incidence of poverty.
An analysis of the data set has been published as the Somali Poverty and Vulnerability Assessment (World
Bank, 2018b).
This paper contributes to several themes in the literature on poverty measurement and data collection in
the context of conflict and fragility, involving hard‐to‐survey populations. It outlines how challenges
associated with the context of insecurity and lack of statistical infrastructure in Somalia were overcome
through four methodological and technological adaptations: i) building a probability‐based population
sampling frame; ii) minimizing the time spent in the field using the Rapid Consumption Methodology; iii)
estimating poverty in completely inaccessible areas with correlates derived from satellite imagery and
other geo‐spatial data; and iv) employing a special sampling strategy for the nomadic population.
Further improvements in terms of human resource capacity should be considered to minimize disruptions
to the quality of the data, besides field team training and stringent security protocols. Also, future
applications should consider refining the model to predict poverty from satellite imagery by incorporating
predictors with higher spatial frequencies, as well as data on building footprints, which are likely to
improve the estimates. Other alternatives are thresholding some of the distance variables or applying a
sigmoid transformation to capture variations in small areas. Furthermore, the accuracy of satellite‐based
imputations should be assessed based on a reference data set, ideally in a more stable environment.
[Figure 12 panels: % of population against daily consumption expenditure per capita (US$), one panel for urban and rural households and one for IDPs in settlements and nomads, each with the poverty line (US$ 1.9 PPP) marked.]
References
Aminipouri, M., Sliuzas, R., Kuffer, M., 2009. Object‐oriented analysis of very high resolution
orthophotos for estimating the population of slum areas, case of Dar‐Es‐Salaam, Tanzania, in:
Proc. ISPRS XXXVIII Conf. pp. 1–6.
Baker, J.L., 2000. Evaluating the impact of development projects on poverty: A handbook for
practitioners. The World Bank.
Barry, M., Rüther, H., 2005. Data collection techniques for informal settlement upgrades in Cape Town,
South Africa. Urisa Journal 17, 43–52.
Beegle, K., Christiaensen, L., Dabalen, A., Gaddis, I., 2016. Poverty in a rising Africa. World Bank
Publications.
Beegle, K., De Weerdt, J., Friedman, J., Gibson, J., 2012. Methods of household consumption
measurement through surveys: Experimental results from Tanzania. Journal of Development
Economics 98, 3–18.
Caeyers, B., Chalmers, N., De Weerdt, J., 2012. Improving consumption measurement and other survey
data through CAPI: Evidence from a randomized experiment. Journal of Development Economics
98, 19–33.
Coudouel, A., Hentschel, J.S., Wodon, Q.T., 2002. Poverty measurement and analysis. A Sourcebook for
poverty reduction strategies 1, 27–74.
Deaton, A., 2006. Measuring poverty. Understanding poverty 3–15.
Deaton, A., Grosh, M., 2000. Consumption in designing household survey questionnaires for developing
countries. Designing Household Survey Questionnaires for Developing Countries: Lessons from
15.
Deaton, A., Zaidi, S., 2002. Guidelines for constructing consumption aggregates for welfare analysis.
World Bank Publications.
Demombynes, G., Sofia, K.T., 2016. What Has Driven the Decline of Infant Mortality in Kenya. Economics &
Human Biology 17–32. https://doi.org/10.1016/j.ehb.2015.11.004
Dillon, B., 2012. Using mobile phones to collect panel data in developing countries. Journal of
international development 24, 518–527.
Draper, N.R., Smith, H., 2014. Applied regression analysis. John Wiley & Sons.
Engstrom, R., Hersh, J., Newhouse, D., 2017. Poverty from space: using high‐resolution satellite imagery
for estimating economic well‐being. The World Bank.
Firchow, P., Mac Ginty, R., 2016. Including Hard‐to‐Access Populations Using Mobile Phone Surveys and
Participatory Indicators. Sociological Methods & Research 0049124117729702.
Foster, J., Greer, J., Thorbecke, E., 1984. A Class of Decomposable Poverty Measures. Econometrica:
journal of the econometric society 761–766.
Fujii, T., Van der Weide, R., 2013. Cost‐effective estimation of the population mean using prediction
estimators. The World Bank.
Goodall, C.R., 1993. 13 Computation using the QR decomposition.
Haughton, J., Khandker, S.R., 2009. Handbook on poverty + inequality. World Bank Publications.
Henderson, J.V., Storeygard, A., Weil, D.N., 2012. Measuring economic growth from outer space.
American economic review 102, 994–1028.
Himelein, K., Eckman, S., Murray, S., 2014. Sampling nomads: a new technique for remote, hard‐to‐reach,
and mobile populations. Journal of official statistics 30, 191–213.
Himelein, K., Eckman, S., Murray, S., Bauer, J., 2016. Second‐stage sampling for conflict areas: methods
and implications.
Hoogeveen, J., Nguyen, N.T.V., 2017. Statistics Reform in Africa: Aligning Incentives with Results. The
Journal of Development Studies 1–18.
Jean, N., Burke, M., Xie, M., Davis, W.M., Lobell, D.B., Ermon, S., 2016. Combining satellite imagery and
machine learning to predict poverty. Science 353, 790–794.
Kalsbeek, W.D., 1986. Nomad sampling: an analytic study of alternative design strategies, in:
Proceedings of the Section on Survey Research Methods.
Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. Imagenet classification with deep convolutional neural
networks, in: Advances in Neural Information Processing Systems. pp. 1097–1105.
Linard, C., Alegana, V.A., Noor, A.M., Snow, R.W., Tatem, A.J., 2010. A high resolution spatial population
database of Somalia for disease risk mapping. International Journal of Health Geographics 9, 45.
https://doi.org/10.1186/1476‐072X‐9‐45
Mellander, C., Lobo, J., Stolarick, K., Matheson, Z., 2015. Night‐Time Light Data: A Good Proxy Measure
for Economic Activity? PLOS ONE 10, e0139779. https://doi.org/10.1371/journal.pone.0139779
Minasny, B., McBratney, A.B., Walvoort, D.J., 2007. The variance quadtree algorithm: Use for spatial
sampling design. Computers & Geosciences 33, 383–392.
Muñoz, J., Langeraar, E., 2013. A census independent sampling strategy for a household survey in
Myanmar. Available at: bit.ly/TU94rr.
Neyman, J., 1934. On the two different aspects of the representative method: the method of stratified
sampling and the method of purposive selection. Journal of the Royal Statistical Society 97, 558–
625.
Olson Lanjouw, J., Lanjouw, P., 2001. How to compare apples and oranges: Poverty measurement based
on different definitions of consumption. Review of Income and Wealth 47, 25–42.
Oseni, G., Durazo, J., McGee, K., 2017. The Use of Non‐Standard Units for the Collection of Food
Quantity.
Pape, U.J., Mistiaen, J.A., 2018. Household expenditure and poverty measures in 60 minutes: a new
approach with results from Mogadishu.
Pinkovskiy, M., Sala‐i‐Martin, X., 2016. Lights, Camera… Income! Illuminating the national accounts‐household
surveys debate. The Quarterly Journal of Economics 131, 579–631.
Ravallion, M., 2008. Poverty lines. The New Palgrave Dictionary of Economics 2.
Ravallion, M., 1994. Poverty comparisons, Fundamentals of pure and applied economics. Harwood
Academic: Chur.
Ravallion, M., Chen, S., Sangraula, P., 2009. Dollar a day revisited. The World Bank Economic Review 23,
163–184.
Shao, J., 1997. An asymptotic theory for linear model selection. Statistica Sinica 7, 221–264.
Smith, L.C., Dupriez, O., Troubat, N., 2014. Assessment of the reliability and relevance of the food data
collected in national household consumption and expenditure surveys. International Household
Survey Network.
Soumare, B., Tempia, S., Cagnolati, V., Mohamoud, A., Van Huylenbroeck, G., Berkvens, D., 2007.
Screening for Rift Valley fever infection in northern Somalia: a GIS based survey method to
overcome the lack of sampling frame. Veterinary Microbiology 121, 249–256.
Stevens, F.R., Gaughan, A.E., Linard, C., Tatem, A.J., 2015. Disaggregating Census Data for Population
Mapping Using Random Forests with Remotely‐Sensed and Ancillary Data. PLOS ONE 10,
e0107042. https://doi.org/10.1371/journal.pone.0107042
Stone, M., 1977. An asymptotic equivalence of choice of model by cross‐validation and Akaike’s criterion.
Journal of the Royal Statistical Society. Series B 39, 44–47.
Turkstra, J., Raithelhuber, M., 2004. Urban slum monitoring, in: 24th Annual ESRI International User
Conference, 9th‐13th August. Citeseer.
UNFPA, 2014. Population Estimation Survey 2014 for the 18 pre‐war regions of Somalia.
UNHCR, 2018. Camp Coordination and Camp Management Cluster. February 2018. United Nations High
Commissioner for Refugees.
United Nations Statistical Division, 2005. Household surveys in developing and transition countries.
United Nations Publications.
Wardrop, N.A., Jochem, W.C., Bird, T.J., Chamberlain, H.R., Clarke, D., Kerr, D., Bengtsson, L., Juran, S.,
Seaman, V., Tatem, A.J., 2018. Spatially disaggregated population estimates in the absence of
national population and housing census data. PNAS 201715305.
https://doi.org/10.1073/pnas.1715305115
World Bank, 2018a. Somalia drought impact and needs assessment: synthesis report (English). World
Bank Group, Washington, D.C.
World Bank, 2018b. Somali Poverty and Vulnerability Assessment. World Bank Group, Washington, D.C.
World Bank, 2017. Somalia Economic Update, July 2017. Mobilizing Domestic Revenue to Rebuild
Somalia. (No. Edition No. 2).
World Bank, 2016a. Somali Poverty Profile 2016. Findings from Wave 1 of the Somali High Frequency
Survey. World Bank.
World Bank, 2016b. Monitoring Global Poverty: A Cover Note to the Report of the Commission on Global
Poverty Chaired by Prof. Sir Anthony B. Atkinson. Washington, DC: The World Bank.
World Bank, 2015. Somalia Economic Update, October 2015. Transition amid Risks with a Special Focus
on Intergovernmental Fiscal Relations. (No. Edition No. 1).
Zezza, A., Carletto, C., Fiedler, J.L., Gennari, P., Jolliffe, D., 2017. Food counts. Measuring food
consumption and expenditures in household consumption and expenditure surveys (HCES).
Introduction to the special issue. Food Policy 72, 1–6.
Appendix
Table A.2: Sample overview.
(a) Interviews and EAs by stratum
Strata ID  Administrative unit    Population type       Total interviews  Total EAs
1          Central Regions        IDP                   36                3
2          Galmudug               IDP                   0                 0
3          Jubaland               IDP                   84                7
4          Mogadishu              IDP                   108               9
5          North East             IDP                   192               16
6          North West             IDP                   24                2
7          South West             IDP                   24                2
8          Central Regions        nomadic               60
9          Galmudug               nomadic               36
10         Jubaland               nomadic               84
12         North East             nomadic               96
13         North West             nomadic               144
13         South West             nomadic               84
25         Hiraan                 rural                 144               12
26         Hiraan                 urban                 48                4
27         Middle Shabelle        rural                 264               22
28         Middle Shabelle        urban                 48                4
29         Galgaduud              rural                 144               12
30         Galgaduud              urban                 396               33
31         Lower Juba             urban                 804               67
32         Gedo                   rural                 108               9
33         Gedo                   urban                 48                4
34         Lower Juba             rural                 108               9
35         Middle Juba            rural                 0                 0
36         Middle Juba            urban                 0                 0
37         Banadir                urban                 792               66
38         Bari                   rural                 48                4
39         Bari                   urban                 264               22
40         Mudug                  rural                 24                2
41         Mudug                  urban                 96                8
42         Nugaal                 rural                 12                1
43         Nugaal                 urban                 36                3
44         Awdal                  rural                 24                2
45         Awdal                  urban                 36                3
46         Sanaag                 urban+rural           72                6
47         Sool                   urban+rural           24                2
48         Toghdeer               rural                 12                1
49         Toghdeer               urban                 108               9
50         Woqooyi Galbeed        rural                 36                3
51         Woqooyi Galbeed        urban                 156               13
52         Bay                    urban                 540               45
53         Bakool                 rural                 48                4
54         Bakool                 urban                 12                1
55         Bay                    rural                 180               15
56         Lower Shabelle         rural                 204               17
57         Lower Shabelle         urban                 48                4
N/A        Host community sample  urban (IDP adjacent)  504               42
Total                                                   6,384

(b) Total interviews by aggregation
Emerging state               Total interviews
Central Regions              684
Galmudug                     576
Jubaland                     1,248
Banadir                      984
North East                   840
North West                   732
South West                   1,296

Pre‐war region               Total interviews
Hiraan                       264
Middle Shabelle              420
Galgaduud                    576
Gedo                         228
Lower Juba                   996
Middle Juba                  24
Bari                         420
Mudug                        324
Nugaal                       96
Awdal                        84
Sanaag                       108
Sool                         48
Toghdeer                     192
Woqooyi Galbeed              300
Bakool                       84
Bay                          900
Lower Shabelle               312
Banadir                      984

Urban / rural / IDP / nomad  Total interviews
urban                        3,936
rural                        1,356
IDP                          468
nomad                        504

Oversampled populations      Total interviews
Fisheries                    324
Baidoa                       540
Kismaayo                     612
Mogadishu                    900
Host communities             504
Figure A.1: Fishery livelihood zones Somalia
Source: FSNAU and FEWSNET.
Table A.3: Source of IDP settlement boundaries
#   Pre‐war region    IDP name                    Sources             Year
1   Bay               Baidoa                      PESS                2016
2   Hiraan            Beletweyne                  UN Shelter Cluster  2016
3   Nugaal            Garowe                      UN Shelter Cluster  2016
4   Lower Juba        Kismayo                     UN Shelter Cluster  2016
5   Bari              Qardho                      UN Shelter Cluster  2016
6   Hiraan            Buloburto                   UN Shelter Cluster  2015
7   Hiraan            Maxaas                      UN Shelter Cluster  2015
8   Lower Juba        Afmadow, Diff and Dhobley   UN Shelter Cluster  2014
9   Togdheer          Burao                       UN Shelter Cluster  2014
10  Mudug             Gaalkacyo North             UN Shelter Cluster  2014
11  Mudug             Gaalkacyo South             PESS                2014
12  Woqooyi Galbeed   Hargeisa                    UN Shelter Cluster  2014
13  Middle Shabelle   Jowhar                      UN Shelter Cluster  2014
14  Lower Juba        Kismayo                     UN Shelter Cluster  2014
15  Gedo              Luuq                        PESS                2014
16  Lower Shabelle    Marca                       PESS                2014
17  Banadir           Mogadishu                   PESS                2014
Replacement of sampling units
Sampling units (EAs, EBs, structures, households) may need to be replaced for a variety of reasons, but
their replacement must follow a predetermined schedule that allows each interviewed household to be
assigned a sampling weight and to preserve the sample’s representativeness.
Replacement of enumeration areas (EAs)
An enumeration area (EA) was replaced only in one of the following scenarios:
(i) The EA was insecure for field teams to conduct interviews.
(ii) The EA could not be accessed for logistical reasons.
(iii) The EA did not contain any residential structures.
(iv) All residential structures in the EA were visited unsuccessfully.
Main EAs were replaced from the pool of replacement EAs drawn for the same stratum during sample
selection. All replacement EAs had a replacement rank, thus setting the order of replacement in a
replicable way. Replacement occurred both before fieldwork and during fieldwork. All selected EAs
(including replacements) were manually checked prior to fieldwork to establish whether they were empty
of structures (scenario (iii) above). If an EA was found to be empty, it was replaced with the highest‐ranked
replacement EA within its stratum. If the replacement EA was also empty, the next highest‐ranked
replacement EA was used to replace it, and so on (Table A.3). Prior to fieldwork 3 percent of selected
urban EAs and 53 percent of selected rural EAs were found to be empty and thus replaced. If an EA needed
to be replaced during fieldwork in any of the four scenarios listed above, the same schedule for
replacement applied.
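The ranked replacement schedule described above can be sketched as follows. This is a simplified illustration for a single stratum: the function name `assign_eas`, the `is_usable` callback (standing in for the four replacement scenarios) and the EA labels are hypothetical.

```python
def assign_eas(main_eas, ranked_replacements, is_usable):
    """Replace unusable EAs with the next-ranked replacement of the same stratum."""
    pool = list(ranked_replacements)  # highest-ranked replacement first
    final = []
    for ea in main_eas:
        candidate = ea
        # Keep drawing in rank order until a usable EA is found.
        while not is_usable(candidate):
            if not pool:
                raise ValueError("replacement pool exhausted for this stratum")
            candidate = pool.pop(0)
        final.append(candidate)
    return final


# Hypothetical stratum: main EA "B" and replacement "R1" are empty of structures.
unusable = {"B", "R1"}
selected = assign_eas(
    ["A", "B", "C"], ["R1", "R2", "R3"], lambda ea: ea not in unusable
)
```

Because the pool is consumed strictly in rank order, rerunning the procedure with the same inputs gives the same final list, which is what makes the replacement replicable.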
Replacements of enumeration blocks (EBs)
An entire EB was replaced in the following scenarios:
(i) The EB was insecure for field teams to conduct interviews.
(ii) The EB was empty or not comprised of inhabited dwellings (e.g. market).
(iii) All residential structures in the EB were visited unsuccessfully.
If an EB needed to be replaced in any of the three scenarios, the enumerator responsible for the EB
randomly drew a replacement EB from the list of EBs in the current enumeration area using his/her tablet.
Since, in most cases, there were exactly 12 EBs per EA and one interview had to be completed in each EB,
EB replacement led to two or more households being interviewed in the same EB.
Replacement of Households
Once the enumerator randomly selected a household, he/she made contact, trying to find a
knowledgeable person in the household (an adult of 15 years or older with good knowledge of the
household and its members). Where no knowledgeable person was currently present, enumerators
scheduled follow‐up visits before replacing the household.
Once contact was made during the first visit, it was possible to arrange a meeting at another time of the
day or following day if more convenient for the respondent. However, if no knowledgeable person was at
home and no later appointment was scheduled, the enumerator had to go back to the same household a
second and a third time. At least 5 hours separated these consecutive visits. A household was replaced in
any of the following scenarios:
(i) If the household was deemed unsafe by the enumerator, and this was confirmed by the team
leader.
(ii) If someone in the household said that no knowledgeable person was around nor would be in
the next 2 days. In this case, the household was replaced, without a second and third visit.
(iii) The household was found to be empty even after three visits to the household.
(iv) The head of household or a person 15 or above who was sufficiently knowledgeable to
respond to the survey was not available after three visits.
(v) The respondent refused to give his/her consent to continue the interview.
(vi) The interview conducted with that household was incomplete (the respondent stopped
the interview in the middle or some required fields were not filled in) without the possibility
to return to the household to complete the interview.
Sampling weights
The sampling weight of each household is the inverse of its probability of selection. Its probability of
selection is the combination of selection probabilities at each stage of sample selection, in line with SHFS
wave 2 sampling design discussed in section 2. A household’s probability of selection is the probability of
selection of the primary sampling unit in which it is located, multiplied by the probability of selection of
the secondary sampling unit in which it is located, and so on.
Urban (non‐host communities) and rural households
In urban and rural households, the EA was the primary sampling unit and the enumeration block (EB) was
the secondary sampling unit. Enumerators followed a micro‐listing protocol on the ground, in which they
first listed all the structures in the EB, selected a structure, and then listed all the households in the
selected structure. Thus, the probability of selection for urban and rural households is the following:
$P = P_1 \cdot P_2 \cdot P_3 \cdot P_4$,
where
$P_1$: Probability of selecting the EA, given by $P_1 = n_j \cdot h_i / H_j$.
$P_2$: Probability of selecting the enumeration block, given by $P_2 = b_i / B_i$.
$P_3$: Probability of selecting the structure, given by $P_3 = s_k / S_k$.
$P_4$: Probability of selecting the household, given by $P_4 = q_m / Q_m$.
$n_j$: Number of EAs selected in strata j.
$h_i$: Number of households in the sample frame for the original EA i.
$H_j$: Number of households in the sample frame in strata j.
$b_i$: Number of blocks selected in EA i.
$B_i$: Total number of blocks in EA i.
$s_k$: Number of selected structures in block k.
$S_k$: Total structures in block k.
$q_m$: Number of households selected in structure m.
$Q_m$: Total number of households in structure m.
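The multi-stage logic can be illustrated with a short Python sketch. It assumes that EAs are selected with probability proportional to their number of households in the sample frame, which is what the quantities listed above suggest; the argument names and the example numbers are hypothetical.

```python
def selection_probability(
    eas_selected, hh_in_ea, hh_in_stratum,
    blocks_selected, blocks_total,
    structures_selected, structures_total,
    hh_selected, hh_total,
):
    """Product of the stage-wise selection probabilities for one household."""
    # Stage 1: EA, assumed proportional to its households in the frame.
    p_ea = eas_selected * hh_in_ea / hh_in_stratum
    # Stages 2 to 4: block, structure and household, each selected over total.
    p_block = blocks_selected / blocks_total
    p_structure = structures_selected / structures_total
    p_household = hh_selected / hh_total
    return p_ea * p_block * p_structure * p_household


def sampling_weight(**stages):
    """Sampling weight as the inverse of the selection probability."""
    return 1.0 / selection_probability(**stages)


# Hypothetical household: 10 EAs drawn in a stratum of 12,000 households,
# EA of 120 households, all 12 blocks visited, 1 of 5 structures, 1 of 2 households.
weight = sampling_weight(
    eas_selected=10, hh_in_ea=120, hh_in_stratum=12000,
    blocks_selected=12, blocks_total=12,
    structures_selected=1, structures_total=5,
    hh_selected=1, hh_total=2,
)
```

In this example the selection probability is 0.1 x 1 x 0.2 x 0.5 = 0.01, so the household represents about 100 households in the stratum.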
Urban and host communities
Since the host community sample was drawn from a subset of urban enumeration areas, urban
households selected in the host communities sample were part of two separate sampling processes. They
thus had two positive probabilities to be selected into the final sample. To reflect this, the probability of
selection for this group is the following:

Tray (1kg) 1
Tumin (125g) 0.125
Table A.6: Summary of cleaning rules for currency.
Currency                          Condition                                  Correction
Somaliland shillings (thousands)  Price > 1,000 for food and nonfood items;  Divide by 1,000 because respondents meant
                                  price > 10,000 for durable goods           units, not thousands.
Somali shillings (thousands)      Price > 1,000 for food and nonfood items;  Divide by 1,000 because respondents meant
                                  price > 10,000 for durable goods           units, not thousands.
US$                               Price > 1,000                              Replace currency with Somali(land) shillings.
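The rules in Table A.6 amount to a small decision function, sketched below in Python. The currency codes, the `item_type` labels and the `local_currency` argument are hypothetical placeholders; the table does not say which shilling unit replaces US$, so that choice is left to the caller.

```python
THOUSANDS_CODES = {"SSh_thousands", "SlSh_thousands"}  # hypothetical codes


def clean_price(price, currency, item_type, local_currency="SSh"):
    """Apply the Table A.6 corrections and return (price, currency)."""
    if currency in THOUSANDS_CODES:
        # Durable goods get a higher limit than food and nonfood items.
        limit = 10_000 if item_type == "durable" else 1_000
        if price > limit:
            # The respondent is assumed to have reported units, not thousands.
            return price / 1_000, currency
    elif currency == "USD" and price > 1_000:
        # Implausibly high US$ price: keep the figure, relabel the currency.
        return price, local_currency
    return price, currency
```

For instance, a food price of 5,000 recorded in thousands of Somali shillings would be rescaled to 5, while the same figure for a durable good would be left unchanged.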
Cleaning rules for food consumption data
‐ Rule 1.
o Consumption quantities with missing values for items reported as consumed were
replaced with item‐specific median consumption quantities.
o Missing purchase quantities and missing prices for items consumed were replaced with
item‐specific median purchase quantity and item‐specific median purchase price.
‐ Rule 2. Records where the respondent did not know or refused to respond if the household had
consumed the item, were replaced with the mean value, including non‐consumed records.
‐ Rule 3. Records with the same value for quantity consumed or quantity purchased and price are
assumed to have a data entry error in the price or quantity and are replaced with the item‐specific
medians.
‐ Rule 4. Records that have the same value in quantity consumed and quantity purchased but
different units are assumed to have a wrong unit either for consumption or purchase. For both
quantities, the item‐specific distribution of quantities in kg is calculated to determine the
deviation of the entered figure from the median of the distribution. The unit of the quantity that
is further away from the median is corrected with the unit of the quantity closer to the median.
‐ Rule 5.
o Missing and zero prices are replaced with item‐specific medians.
o Outliers for unit prices were identified and replaced with the item‐specific median. This
includes unit prices in the top 10 percent of the overall cumulative distribution
(considering all items), and unit prices below 0.07 US$.
‐ Rule 6. The consumption value in US$ was truncated to the mean plus 3 times the standard
deviation of the cumulative distribution for each item, if the record exceeded this threshold.
All medians are estimated at the EA level if a minimum of 5 observations are available excluding previously
tagged records. If the minimum number of observations is not met, medians are estimated at the strata level
before proceeding to the survey level. In addition, medians greater than 20 kg or smaller than 0.02
kg were not considered for quantities, while medians greater than 20 US$ or smaller than 0.005 US$
were also excluded for unit prices.
Cleaning rules for nonfood consumption data
‐ Rule 1. Zero, missing prices and missing currency for purchased items are replaced with item‐specific
medians.
‐ Rule 2. Records where the respondent did not know or refused to respond if the household had
purchased the item, were replaced with the mean value, including non‐consumed records.
‐ Rule 3. Prices that are beyond a specific threshold for each recall period (Table A.7) are replaced
with item‐specific medians.
‐ Rule 4. Prices below the 1st percentile and above the 95th percentile of the cumulative distribution for
each item are replaced with item‐specific medians.
‐ Rule 5. The purchase value in US$ was truncated to the mean plus 3 times the standard deviation
of the cumulative distribution for each item, if the record exceeded this threshold.
The item‐specific medians were applied at the EA, strata and survey levels as described above.
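The truncation step that closes each rule set (mean plus 3 times the standard deviation, per item) can be illustrated with a short Python sketch. The function name is hypothetical. The use of the sample standard deviation, and the computation of the mean and standard deviation on the values before truncation, are assumptions; the text does not state either choice.

```python
from statistics import mean, stdev


def truncate_at_mean_plus_3sd(values):
    """Cap one item's values in US$ at mean + 3 * standard deviation.

    values -- values in US$ for a single item.
    The threshold is computed from the untruncated values (assumed) using
    the sample standard deviation (assumed).
    """
    if len(values) < 2:
        return list(values)  # a standard deviation needs two observations
    cap = mean(values) + 3 * stdev(values)
    # Records above the threshold are set to the threshold itself.
    return [min(v, cap) for v in values]
```

Records below the threshold are left unchanged; only the upper tail is affected.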
Table A.7: Threshold for non‐food item expenditure (US$).
Recall period Min Max
1 Week 0.05 30
1 Month 0.20 95
3 Months 0.45 200
1 Year 0.80 1,200
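A minimal Python sketch of how the thresholds in Table A.7 could be applied under Rule 3 is given below. The dictionary keys, the function name, and the treatment of values exactly equal to a threshold (kept) are assumptions for illustration.

```python
# Thresholds from Table A.7, in US$: recall period -> (min, max).
THRESHOLDS_USD = {
    "1 week": (0.05, 30.0),
    "1 month": (0.20, 95.0),
    "3 months": (0.45, 200.0),
    "1 year": (0.80, 1200.0),
}


def clean_nonfood_price(price_usd, recall_period, item_median_usd):
    """Replace a price outside its recall-period range with the item median.

    item_median_usd -- the item-specific median, computed elsewhere.
    Prices exactly at a threshold are kept (assumed).
    """
    low, high = THRESHOLDS_USD[recall_period]
    if price_usd < low or price_usd > high:
        return item_median_usd
    return price_usd
```

A weekly item reported at 0.01 US$ would thus be replaced by the item median, while one reported at 10 US$ would be kept.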
Cleaning rules for durable assets
‐ Rule 1. Vintages that are missing or greater than 10 years are replaced with item‐specific
medians.
‐ Rule 2. Current and purchase prices equal to zero are replaced with item‐specific medians.
‐ Rule 3. Records that have the same figure in current value and purchase price are incorrect. For
both, the item‐vintage‐specific distribution is calculated to determine the deviation of the
entered figure from the median. The one that is further away from that median is corrected
with the item‐year‐specific median value.
‐ Rule 4. Depreciation rates are replaced by the item‐specific medians in the following cases:
o Negative records
o Depreciation rates in the top 10 percent and vintage of one year
o Depreciation rates in the bottom 10 percent and a vintage greater than or equal to 3 years.
‐ Rule 5. Records with 100 items or more, and those that reported to own a durable good but did
not report the number were replaced with the item‐specific medians of consumption in US$.
‐ Rule 6. Consumption in the top and bottom 1 percent of the overall distribution were replaced
with item‐specific medians.
‐ Rule 7. Records where the respondent did not know or refused to say whether the household
owned the asset were replaced with the mean consumption value, including non‐consumed
records.
‐ Rule 8. The consumption value in US$ was truncated to the mean plus 3 times the standard
deviation of the cumulative distribution for each item, if the record exceeded this threshold.
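The three conditions of Rule 4 for depreciation rates can be sketched in Python as follows. The function name and the record layout are hypothetical. The decile cut points are taken from `statistics.quantiles` with its default method, and the replacement median is computed within a single item over untagged records without the EA, strata and survey fallback; both are simplifying assumptions.

```python
from statistics import median, quantiles


def clean_depreciation(records):
    """Apply a Rule 4-style cleaning to one item's depreciation rates.

    records -- list of (depreciation_rate, vintage_in_years) tuples.
    Returns the list of cleaned depreciation rates in the same order.
    """
    rates = [rate for rate, _ in records]
    deciles = quantiles(rates, n=10)  # nine cut points
    p10, p90 = deciles[0], deciles[-1]

    def is_flagged(rate, vintage):
        return (
            rate < 0                           # negative records
            or (rate > p90 and vintage == 1)   # top 10 percent, one-year vintage
            or (rate < p10 and vintage >= 3)   # bottom 10 percent, vintage >= 3
        )

    # Median over untagged records; falls back to all records if none remain.
    untagged = [r for r, v in records if not is_flagged(r, v)]
    item_median = median(untagged) if untagged else median(rates)
    return [item_median if is_flagged(r, v) else r for r, v in records]
```

The intuition is that a very high depreciation rate is implausible for a one-year-old asset, and a very low one is implausible for an asset that is three or more years old.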
All medians are estimated at the EA level if a minimum of 3 observations are available excluding previously
tagged records. If the minimum number of observations is not met, medians are estimated at the strata level
before proceeding to the survey level. Table A.8 presents median expenditure and median
depreciation rates for each durable item.
Table A.8: Median consumption and depreciation rate of durable assets.
Item Median consumption (current US$/week) Median depreciation rate
Air conditioner 0.002 0.264
Bed with mattress 0.092 0.229
Car 0.013 0.159
Cell phone 0.085 0.245
Chair 0.015 0.242
Clock 0.007 0.267
Coffee table (for sitting room) 0.002 0.209
Computer equipment & accessories 0.010 0.182
Cupboard, drawers, bureau 0.019 0.213
Desk 0.001 0.349
Electric stove or hot plate 0.000 0.204
Fan 0.006 0.188
Gas stove 0.005 0.159
Generator 0.017 0.333
Iron 0.007 0.229
Kerosene/paraffin stove 0.000 0.248
Kitchen furniture 0.006 0.296
Lantern (paraffin) 0.000 0.092
Lorry 0.001 0.209
Mattress without bed 0.041 0.267
Mini‐bus 0.002 0.248
Mortar/pestle 0.005 0.244
Motorcycle/scooter 0.004 0.229
Photo camera 0.000 0.005
Radio (‘wireless’) 0.005 0.276
Refrigerator 0.007 0.210
Satellite dish 0.004 0.213
Sewing machine 0.001 0.229
Small solar light 0.002 0.195
Solar panel 0.002 0.188
Stove for charcoal 0.002 0.296
Table 0.014 0.229
Tape or CD/DVD player; HiFi 0.001 0.337
Television 0.056 0.201
Upholstered chair, sofa set 0.021 0.267
VCR 0.000 0.161
Washing machine 0.013 0.210
Table A.9: Overview of spatial variables used in poverty imputation.
Variable | Source | Description
Distance to bare areas | WorldPop Global covariate dataset (see footnote 38). The ESA‐CCI 300m annual global landcover dataset was used to produce this layer. | Distance to borders of areas whose land cover was classified as bare. The distance is positive outside of the areas and negative inside.
Distance to cultivated areas | WorldPop Global covariate dataset (see footnote 38). The ESA‐CCI 300m annual global landcover dataset was used to produce this layer. | Distance to borders of areas whose land cover was classified as cultivated. The distance is positive outside of the areas and negative inside.
Temperature | WorldClim v2 | Average annual temperature. Original layer from WorldClim.
Precipitation | WorldClim v2 | Average annual precipitation. Original layer from WorldClim.
Distance to major roads | WFP | Distance to primary and secondary roads in km. FM/WP rasterized the original shapefile to 100m and computed the distance transform.
Distance to drought areas | FAO SWALIM | Distance in km to borders of areas labelled as ‘moderate drought’ and ‘severe drought’ (computed by FM/WP). The distance is positive outside of the areas and negative inside.
Distance to medical sites | UNICEF, FAO SWALIM, 2005 | Distance to medical sites. FM/WP computed the distance to the points given in the source dataset.
Distance to schools | UNICEF, FAO SWALIM, 2004 | Distance to schools. FM/WP computed the distance to the points given in the source dataset.
Distance to water sources | FAO SWALIM, 2008 | Distance to strategic water points or sources. FM/WP computed the distance to the points given in the source dataset.
Distance to waterways | OSM extract | Volunteer‐reported vector data of waterway locations. FM/WP rasterized the vectors at 100m and computed the distance transform.
Conflict density | ACLED | Reports on violent events (e.g. battles, riots) from news outlets. FM/WP computed the spatial average of the number of fatalities from January 2014 to May 2018, within a 25 km radius.
Distance to food insecure areas | FEWS NET | Food security outcomes for October 2017. FM/WP computed the distance to borders of areas with an IPC phase of 3 or more. The distance is positive outside of the areas and negative inside.
Distance to urban areas | UNFPA / PESS urban EAs | Distance to borders of urban areas. FM/WP used the UNFPA urban EAs and filled in gaps within urban areas. Then we computed the distance to the urban borders. The distance is positive outside of the areas and negative inside.
Distance to unsafe areas | World Bank | Distance to areas labelled as unsafe by the World Bank. FM/WP rasterized the shapefile provided at 100m then computed the distance to the unsafe areas border. The distance is positive outside of the areas and negative inside.
Population density | World Bank / Flowminder | Population density inferred at 100m as part of the work on: Defining a new Somali national sampling frame.
Source: Flowminder / WorldPop.
38 The WorldPop “dist‐to” data sets have been produced by D. Kerr, H. Chamberlain and M. Bondarenko in the framework of
the WorldPop “Global high resolution population denominators” project, funded by the Bill & Melinda Gates Foundation
(OPP1134076).
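Several of the layers in Table A.9 use a signed distance: positive outside an area and negative inside it. The following Python sketch illustrates that convention on a small rasterized mask using a brute-force search. The function name, the default cell size of 0.1 km (a 100m grid) and the measurement between cell centers are assumptions for illustration; this is not the FM/WP processing chain, and on full-size rasters a dedicated distance transform routine would normally be used instead.

```python
from math import hypot


def signed_distance(mask, cell_size=0.1):
    """Signed distance per cell: positive outside the area, negative inside.

    mask      -- list of rows of booleans, True where the cell lies inside
    cell_size -- cell edge length in km (0.1 km corresponds to a 100m grid)
    Distances are measured between cell centers, which only approximates
    the distance to the true border.
    """
    cells = [(r, c) for r, row in enumerate(mask) for c in range(len(row))]
    inside = [(r, c) for r, c in cells if mask[r][c]]
    outside = [(r, c) for r, c in cells if not mask[r][c]]

    result = []
    for r, row in enumerate(mask):
        out_row = []
        for c, is_inside in enumerate(row):
            # Inside cells look for the nearest outside cell and vice versa.
            targets = outside if is_inside else inside
            if not targets:
                out_row.append(None)  # no border exists on this grid
                continue
            d = min(hypot(r - tr, c - tc) for tr, tc in targets) * cell_size
            out_row.append(-d if is_inside else d)
        result.append(out_row)
    return result
```

On a single row of four cells where the first two lie inside the area, and with a cell size of 1, the values run from negative to positive as one moves from the interior outwards.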
Table A.10: Summary statistics of collected spatial variables.
Urban mean | Urban median | Urban min | Urban max | Rural mean | Rural median | Rural min | Rural max
