How-To: Econometric Analysis Framework for Global Trade Liberalization (2017–2025)
This framework outlines a comprehensive approach to analyzing global trade liberalization from 2017 to the present. It covers the key datasets needed (with sources), suitable econometric models, data preparation steps, and solutions for missing data and comparability issues. The goal is to enable robust analysis of how trade policies (tariff changes, trade agreements, etc.) have affected trade flows and economic outcomes across all countries in recent years.
Key Datasets and Indicators (2017–2025)
Trade Flows (Bilateral & Total): Data on merchandise trade between countries is essential. The IMF’s Direction of Trade Statistics (DOTS) provides bilateral export and import values for virtually all country pairs at monthly, quarterly, and annual frequencies. DOTS combines reported trade data with mirror (partner-reported) data to achieve near-universal coverage, making it a comprehensive source for global trade flows. (Notably, DOTS incorporates UN Comtrade updates automatically to fill gaps.) For detailed product-level trade, the UN Comtrade database (accessible via the World Bank’s WITS) can be used to construct aggregates. Total national trade (exports, imports) is also available from sources like World Bank World Development Indicators (WDI) and UNCTADstat for cross-checking totals.
Applied Tariffs (Average MFN, Preferential, Effective): Tariff data are critical for measuring liberalization. The WTO’s data portals and World Tariff Profiles reports provide country-level tariff indicators, including Most Favored Nation (MFN) applied rates and preferential rates under trade agreements. These include simple averages and trade-weighted average tariffs (e.g. average MFN applied rate weighted by import volumes). The World Bank’s WITS (World Integrated Trade Solution) interface gives access to UNCTAD’s TRAINS database, which contains detailed tariff schedules by country and year (including effectively applied rates, taking into account preferences) since the late 1980s. Using WITS, one can obtain average tariffs by year (both MFN and effective rates) for all countries, as well as sectoral tariffs. Together, these sources cover the needed measures of tariff protection (e.g. rises in tariffs during trade wars or cuts due to free trade agreements).
Regional Trade Agreement (RTA) Participation: To account for trade liberalization via free trade agreements, data on RTA membership is required. The WTO maintains an RTA Database documenting all reciprocal trade agreements notified to the WTO, along with their member countries and dates of entry into force. From this, one can derive indicator variables for whether a country pair is in a free trade agreement in a given year. Academic datasets are also available, for example CEPII’s gravity dataset and the Egger & Larch database, which provide binary indicators (1/0) for the existence of an RTA between each pair of countries by year. All WTO member countries now participate in at least one RTA, so this variable is crucial for capturing preferential liberalization. RTA data allow construction of variables such as “Number of RTAs a country is in” or a dummy for major agreements (e.g. CPTPP, RCEP) during the 2017–2025 period.
GDP and Income (GDP Growth, GDP per Capita): Macroeconomic controls like economic size and growth are obtained from World Bank WDI and IMF World Economic Outlook (WEO) databases. WDI is a comprehensive source with over 1,200 indicators for over 200 economies (with data from 1960 onward), including GDP (in current and constant terms), GDP growth rates, and GDP per capita (in USD or PPP). The IMF’s WEO database (updated biannually) similarly provides annual GDP growth, output levels, and forecasts for nearly all countries. These datasets ensure we can control for overall economic growth trends when analyzing trade outcomes. For example, GDP (size) is used in gravity models to represent economic mass, and GDP per capita or growth can be used to control for developmental differences in panel regressions.
Employment and Labor Market Indicators: Labor market data helps assess the broader effects of trade on economies. The ILO’s ILOSTAT database provides standardized labor indicators for most countries, and the WDI includes series such as unemployment rate and labor force participation (often based on ILO modeled estimates). For instance, Unemployment, total (% of labor force) is available as an ILO modeled estimate for each country. These indicators (e.g. unemployment rates, employment-to-population ratio) allow us to control for domestic economic conditions and labor market health when studying trade liberalization impacts (since trade shocks can affect employment). We should compile annual unemployment rates, manufacturing employment shares, or similar metrics for 2017–2025 from ILOSTAT/WDI to include in the analysis.
Foreign Direct Investment (FDI) Inflows/Outflows: FDI data captures another dimension of globalization relevant to trade liberalization. Key sources are UNCTAD (which publishes FDI statistics in its World Investment Report and UNCTADstat database) and World Bank WDI. WDI provides FDI net inflows and outflows (in USD and as % of GDP), drawing on IMF Balance of Payments data and other sources. For example, “FDI net inflows (% of GDP)” for each country-year (2017–2025) can be taken from WDI, which sources it from IMF’s International Financial Statistics/BOP data. These data let us control for investment trends or test if trade liberalization coincides with shifts in foreign investment. We might also use bilateral FDI data (from OECD or IMF CDIS) if analyzing specific country pairs, but for a global analysis, aggregate FDI by country-year should suffice.
Inflation and Consumer Prices: Inflation rates (e.g. annual CPI percentage change) are important for deflating nominal values and controlling macro stability. The IMF’s International Financial Statistics (IFS) and WDI provide consumer price index (CPI) data for most countries. In fact, WDI’s inflation series is sourced from IMF’s IFS database. We will gather annual inflation (%) or CPI index for each country for 2017–2025 to use as a control. Stable inflation is often an indicator of macroeconomic health; including it can help isolate the effect of trade policy from domestic demand swings due to price instability. For example, if a country experienced a recession or high inflation that affected trade, controlling for CPI or GDP growth will account for that.
Exchange Rates: Exchange rate fluctuations can influence trade competitiveness, so they are a relevant control variable. We can use official exchange rate data (annual average local currency per USD) from the IMF (IFS) or WDI. The WDI series “Official exchange rate (LCU per US$, period average)” is available for most countries and is typically sourced from IMF data. Additionally, effective exchange rate indices (trade-weighted exchange rates) from the Bank for International Settlements (BIS) or IMF can be included to capture changes in relative currency strength. For example, if analyzing the trade war, we might control for the Chinese yuan and US dollar’s movement. In practice, one could include the percentage change in exchange rate or an index (2010=100, etc.) for each country-year. These data ensure that any trade flow changes due to currency swings are not misattributed to policy changes.
Industrial Production Indices: As a high-frequency real sector indicator, industrial production or manufacturing output indices track economic activity and can serve as controls or outcome variables. Data for industrial production is available from sources like the OECD Main Economic Indicators (for OECD and key partner countries) and IMF IFS (many countries report a monthly or quarterly industrial production index to the IMF). We should compile an annual index or growth rate of industrial production for each major economy if possible. For emerging economies not in OECD, national statistical offices or the IFS may have data. Another source is the CPB World Trade Monitor, which publishes a global industrial production index (and regional breakdowns) – useful for understanding worldwide trends (e.g. the 2020 COVID-19 shock to production). Including an industrial output control helps distinguish trade changes due to demand/supply shocks from those due to trade policy. For example, a drop in exports in 2020 might be explained by a fall in production rather than a tariff change, which an industrial production index would capture.
Policy Uncertainty Measures: Uncertainty in economic policy can affect trade and investment decisions. A notable dataset here is the World Uncertainty Index (WUI) developed by IMF researchers. The WUI is a quarterly index available for 143 countries (those with >2 million population), stretching back decades. It is constructed by text-mining country reports for the frequency of words like “uncertainty”. For our period, we can use the WUI averaged annually as a control for global or country-specific uncertainty (which spiked during events like the 2018 trade tensions). Additionally, the Economic Policy Uncertainty (EPU) index by Baker, Bloom, and Davis is available for a subset of countries (notably the US, China, and other major economies) using news-based measures. Such indices can be included to control for the impact of uncertainty (for instance, the Trade Policy Uncertainty sub-index for the trade war period). Using WUI or EPU will help account for confidence effects, ensuring that our model separates the effect of actual policy changes (tariffs/RTAs) from the general caution businesses might have amid an uncertain environment.
Other Control Variables (Geography, Institutions, Distance, Language): To properly specify models (especially gravity models), we include various time-invariant controls:
Geographic factors: bilateral distance, whether countries share a border, whether they have a common official or spoken language, colonial historical ties, etc. These are standard gravity controls available from the CEPII institute’s datasets. CEPII provides an integrated GeoDist database of such variables for all country pairs. For example, we can import variables like distw (population-weighted distance between the two countries’ main cities), contig (border adjacency dummy), comlang_off (common official language dummy), and colony (past colonial relationship dummy) from CEPII’s data. These do not change over time but are crucial for explaining baseline trade costs. Including them helps isolate the impact of policy liberalization from natural advantages or historical links that also facilitate trade.
Institutional quality: variables reflecting governance, ease of doing business, or political stability can be added as country-level controls. Sources include the World Bank’s Worldwide Governance Indicators (e.g. rule of law, regulatory quality, corruption index) and other indices like the Economic Freedom Index or Polity V scores. For a comprehensive panel, the WGI (available annually) can serve to control for institutional differences that might affect trade and investment. For instance, a country with improving governance might see rising trade irrespective of tariffs – controlling for this ensures the trade liberalization effect is isolated.
Geographic and demographic controls: one might also control for whether a country is landlocked, its population (from WDI), or region dummies (Africa, Asia, etc.) to capture regional shocks or geography. Many of these (landlocked status, land area, etc.) are also in the CEPII dataset or easily obtained from WDI.
By assembling this full range of datasets – trade flows, tariffs, RTA participation, macroeconomic indicators, and control variables – we can create a panel dataset of all countries (or country-pairs) from 2017 to 2025. Each variable should be carefully merged (e.g., using country ISO codes) to ensure consistency.
Suggested Econometric Models and Identification Strategies
With the data in hand, several econometric approaches can be used to assess the impact of trade liberalization:
Gravity Model of International Trade: The gravity model is the workhorse for analyzing bilateral trade. It posits that trade volume between two countries is proportional to their economic sizes (GDPs) and inversely related to trade costs (distance, tariffs, etc.). For our framework, a gravity model would include exporter and importer GDP, distance, common language, etc., and crucially tariffs or RTAs as variables of interest. Gravity equations can be estimated in panel form with country-pair fixed effects, which absorb time-invariant bilateral trade costs, and with exporter-year and importer-year fixed effects, which control for multilateral resistance (Anderson & van Wincoop style). Because exporter-year and importer-year effects absorb country-level variables such as GDP, the GDP terms are identified only in specifications with coarser fixed effects (for example, pair and year effects). The gravity model has strong empirical validation (trade flows consistently decline with distance and increase with economic mass), and it is widely used to quantify effects of trade agreements. For example, we could estimate:
ln(Trade_{ij,t}) = β_1 ln(GDP_{i,t}) + β_2 ln(GDP_{j,t}) + β_3 Tariff_{ij,t} + β_4 RTA_{ij,t} + … + FE + ε_{ij,t}.
This would tell us how a tariff increase or RTA membership affects bilateral trade, controlling for size and other factors. Gravity models can be estimated via OLS in log form or using PPML (Poisson Pseudo-Maximum Likelihood) to handle zeros and heteroscedasticity. In our period (2017–2025), gravity models are ideal for assessing the trade war: e.g., include an indicator for US-China pair in 2018–2019 with higher tariffs to see the deviation from expected trade.
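As a rough illustration of the PPML option, the sketch below fits a Poisson pseudo-maximum-likelihood gravity equation with plain NumPy Newton steps. The data are synthetic, the coefficient values in true_beta are assumptions chosen for the demo, and the helper name ppml is hypothetical. A real application would add the fixed effects discussed above and use robust standard errors, both of which are omitted here.

```python
import numpy as np

def ppml(X, y, n_iter=100, tol=1e-10):
    """Poisson pseudo-maximum likelihood via Newton steps.

    X must hold a constant in its first column; y may contain zeros.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())            # start from the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)             # fitted trade flows
        score = X.T @ (y - mu)            # gradient of the Poisson log-likelihood
        info = X.T @ (X * mu[:, None])    # negative Hessian
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Synthetic country-pair data (all values are made up for illustration).
rng = np.random.default_rng(0)
n = 2000
ln_gdp_i = rng.normal(size=n)
ln_gdp_j = rng.normal(size=n)
ln_dist = rng.normal(size=n)
tariff = rng.uniform(0.0, 0.2, size=n)
rta = rng.binomial(1, 0.3, size=n).astype(float)

X = np.column_stack([np.ones(n), ln_gdp_i, ln_gdp_j, ln_dist, tariff, rta])
true_beta = np.array([1.0, 0.8, 0.7, -0.9, -2.0, 0.5])     # assumed values
trade = rng.poisson(np.exp(X @ true_beta)).astype(float)   # levels, zeros allowed

beta_hat = ppml(X, trade)
print(dict(zip(["const", "ln_gdp_i", "ln_gdp_j", "ln_dist", "tariff", "rta"],
               beta_hat.round(3))))
```

Because the dependent variable enters in levels, zero flows stay in the sample without any log(0) workaround.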
Panel Fixed Effects / Random Effects Models: When analyzing country-level outcomes (rather than bilateral flows), panel data models are useful. A fixed effects model can control for unobserved heterogeneity across countries or pairs. For instance, one might examine the effect of average tariff rates on GDP growth across countries. A fixed-effects specification would include country fixed effects (to absorb time-invariant differences like geography or culture) and year fixed effects (to absorb global shocks each year). This helps isolate the impact of within-country changes in tariffs or trade openness on outcomes. For example, we could model GDP growth or employment as a function of a country’s tariff rate and RTA membership, with country fixed effects. Fixed effects ensure that only the changes in tariffs (liberalization or increases) correlate with changes in the outcome, strengthening causal interpretation. Random effects models are an alternative if the unit-specific effects are uncorrelated with regressors, but in trade policy contexts this is a strong assumption (e.g. protection levels likely correlate with country traits). Thus, fixed effects (or within estimators) are generally preferred for robustness. We can also use pair fixed effects in gravity (each country-pair as its own fixed effect) to control for any bilateral affinity for trade, focusing on policy changes over time. Overall, panel FE models will be a core approach (e.g., differencing out each country’s average trade level to see the effect of policy changes).
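A minimal sketch of the within (two-way fixed effects) estimator on a synthetic balanced panel follows. The effect size true_beta and the data-generating process are assumptions for illustration only. Tariffs are built to correlate with the country effect, which is the case where pooled OLS misleads and fixed effects help.

```python
import numpy as np

rng = np.random.default_rng(1)
n_countries, n_years = 150, 9             # 2017-2025, balanced panel
true_beta = -0.3                          # assumed effect of tariffs on the outcome

country_fe = rng.normal(size=(n_countries, 1))
year_fe = rng.normal(size=(1, n_years))
# Tariffs correlate with the country effect, so pooled OLS is biased.
tariff = 0.8 * country_fe + rng.normal(size=(n_countries, n_years))
outcome = (country_fe + year_fe + true_beta * tariff
           + 0.5 * rng.normal(size=(n_countries, n_years)))

def two_way_demean(a):
    """Remove country means and year means (exact for a balanced panel)."""
    return (a - a.mean(axis=1, keepdims=True)
              - a.mean(axis=0, keepdims=True) + a.mean())

x_w = two_way_demean(tariff)
y_w = two_way_demean(outcome)
beta_fe = (x_w * y_w).sum() / (x_w ** 2).sum()

# Pooled OLS slope for comparison (ignores the fixed effects).
x_c = tariff - tariff.mean()
beta_pooled = (x_c * (outcome - outcome.mean())).sum() / (x_c ** 2).sum()

print(f"FE estimate: {beta_fe:.3f}, pooled OLS: {beta_pooled:.3f}")
```

With an unbalanced panel the simple double-demeaning is no longer exact, so a dedicated panel estimator would be the safer choice.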
Difference-in-Differences (DiD) Models: For discrete policy shocks (like the US-China trade war or Brexit), a DiD strategy can provide a causal estimate of impact. In a DiD, we compare the change in outcomes for an “affected group” (treatment) before vs. after the policy, relative to the change for an unaffected (control) group over the same period. For example, to quantify the trade war effect, one could define the treatment group as country pairs involving the US or China (which experienced tariff hikes in 2018–2019) and the control group as other country pairs with no such tariff changes. By comparing the growth of trade flows in these groups pre- and post-2018, we isolate the tariff impact under the parallel trends assumption. Similarly, one could examine the effect of joining a major trade agreement (e.g. CPTPP in 2018) by comparing a country’s trade or GDP before vs. after joining, against a control group of countries that did not join. The DiD model would include interaction terms like Post_t × Treatment_group to capture the policy’s effect. With panel data from 2017 to 2025, we have multiple periods, so we can implement two-way fixed effects DiD models, controlling for country and year fixed effects. This approach is valuable for evaluating trade liberalization shocks such as the imposition of new tariffs (negative liberalization) or the entry into force of an RTA (positive liberalization) by leveraging cross-country variation in exposure.
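The 2x2 comparison behind the DiD term can be sketched with pandas as below. The country pairs and log trade values are hypothetical, and 2016 is included only to give the toy example two pre-treatment years. In practice the did column would enter a two-way fixed effects regression rather than a comparison of group means.

```python
import pandas as pd

# Hypothetical log trade flows for four country pairs.
df = pd.DataFrame({
    "pair": ["USA-CHN"] * 4 + ["CHN-USA"] * 4 + ["DEU-FRA"] * 4 + ["JPN-KOR"] * 4,
    "year": [2016, 2017, 2018, 2019] * 4,
    "ln_trade": [6.0, 6.1, 5.9, 5.8,   5.0, 5.1, 4.9, 4.8,
                 5.5, 5.6, 5.7, 5.8,   4.0, 4.1, 4.2, 4.3],
})
df["treated"] = df["pair"].isin(["USA-CHN", "CHN-USA"]).astype(int)
df["post"] = (df["year"] >= 2018).astype(int)
df["did"] = df["treated"] * df["post"]    # interaction term for a regression

# Group means by (treated, post), then the difference of differences.
means = df.groupby(["treated", "post"])["ln_trade"].mean()
did_estimate = ((means.loc[(1, 1)] - means.loc[(1, 0)])
                - (means.loc[(0, 1)] - means.loc[(0, 0)]))
print(round(did_estimate, 3))
```

The estimate is in log points, so it reads approximately as a proportional change in trade for the treated pairs relative to the controls.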
Event Study Models: An event study is related to DiD but examines the dynamic impact around the time of a policy change. It would involve plotting or estimating the effects in each period before and after a specific event (e.g., the year a tariff war started or an agreement went into effect). For instance, we can set the event as “Tariff increase in year t0” and then include leads and lags of this event indicator in a regression. This will show if trade flows started decreasing before the tariffs (which might indicate anticipation or other confounders) and how they evolved after (e.g., an immediate drop in year t0 and partial recovery later). Event studies are useful for visualizing the timeline of impact and checking the parallel trend assumption (pre-trend) in DiD. In our timeframe, one could do an event study of the US steel and aluminum tariffs in 2018 or the US-China tariff rounds, or conversely an event study of RCEP’s entry into force in 2022. The model would include year dummies relative to the event year for affected units. This helps capture whether trade liberalization had gradual effects or immediate jumps.
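One way to build the lead and lag indicators is sketched below. The units, the event year, and the column names (ev_m2, ev_p0, and so on) are hypothetical. The period just before the event (relative year -1) is omitted as the reference category, a common but not mandatory convention.

```python
import pandas as pd

panel = pd.DataFrame({
    "unit": ["A"] * 5 + ["B"] * 5,
    "year": list(range(2017, 2022)) * 2,
    # Unit A is treated in 2019; unit B is never treated (NaN event year).
    "event_year": [2019.0] * 5 + [float("nan")] * 5,
})
panel["rel_year"] = panel["year"] - panel["event_year"]   # NaN if never treated

# Leads and lags from -2 to +2, leaving out -1 as the baseline period.
for k in range(-2, 3):
    if k == -1:
        continue
    name = f"ev_m{-k}" if k < 0 else f"ev_p{k}"
    panel[name] = (panel["rel_year"] == k).astype(int)

print(panel[panel["unit"] == "A"])
```

Coefficients on the lead dummies (here ev_m2) then serve as the pre-trend check, and those on ev_p0 onward trace the post-event path.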
Instrumental Variable (IV) Strategies: Trade policy is not randomly assigned – countries that liberalize might do so in response to other conditions, raising endogeneity concerns. IV methods can address this by finding external sources of variation in trade policy. One classic approach in trade economics is to use exogenous geography-driven instruments: for example, Frankel and Romer (1999) used the portion of trade predicted by distance and geography as an instrument to study trade’s effect on income. In our context of tariffs, possible instruments might include: (a) previous negotiation commitments – e.g., tariff cuts that were agreed in earlier WTO rounds but implemented during 2017–2025, providing predetermined variation; (b) political variables – e.g., changes in government leadership or voting patterns that are unrelated to current economic conditions but lead to tariff changes; or (c) partner shocks – using another country’s tariff changes as an instrument for home tariffs (if there is some contagion or reciprocal arrangement). Another strategy seen in literature is using lagged MFN rates or bound tariff rates as instruments for applied tariffs, on the argument that bound rates (commitments) constrain applied rates and are set by past negotiations, not current economics. If a valid instrument is found (satisfying relevance and exclusion), we can use two-stage least squares (2SLS) to estimate the causal impact of tariffs or trade openness on outcomes (GDP, etc.). For example, to isolate the effect of trade openness on GDP growth, one might instrument a country’s trade/GDP ratio with an index of geographic factors (as in Frankel-Romer) or with global trade shocks weighted by country-specific exposure. IV will strengthen causal claims, albeit finding a good instrument in this setting is challenging. It’s an optional but powerful component of the framework if a convincing instrument can be identified.
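The mechanics of 2SLS can be illustrated on synthetic data where the true effect is known. Everything in this sketch is an assumption for demonstration: z stands in for a valid instrument, u for an unobserved confounder, and 0.5 for the true coefficient. It does not show that any of the instruments listed above is valid in practice.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
z = rng.normal(size=n)                   # instrument (assumed valid)
u = rng.normal(size=n)                   # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)     # endogenous regressor, e.g. openness
y = 1.0 + 0.5 * x + 2.0 * u + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones(n)
beta_ols = ols(np.column_stack([const, x]), y)

# First stage: project x on the instrument. Second stage: regress y on the fit.
Z = np.column_stack([const, z])
x_hat = Z @ ols(Z, x)
beta_2sls = ols(np.column_stack([const, x_hat]), y)

print(f"OLS slope: {beta_ols[1]:.3f}, 2SLS slope: {beta_2sls[1]:.3f}")
```

Running the second stage by hand gives the right point estimate but not the right standard errors, so inference should come from a proper IV routine.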
Dynamic Panel Models (GMM): Given the relatively short time span (2017–2025 yearly), some outcomes may exhibit persistence (e.g., GDP growth or export levels). Dynamic panel estimators like the Arellano-Bond Generalized Method of Moments (GMM) can be applied if we include lagged dependent variables. For instance, if we model export growth as a function of past export growth and current tariff levels, the lagged export term is endogenous. The difference GMM or system GMM approach uses internal instruments (lagged values of the variables) to address this. This is useful for capturing inertia in trade or GDP while still consistently estimating the impact of policy changes. Additionally, GMM can help tackle any remaining endogeneity of policy variables by using their lagged values (or differences) as instruments. In practice, one could set up a dynamic panel where, say, Trade_{ij,t} depends on Trade_{ij,t-1}, tariffs, RTAs, and other controls, and use GMM to estimate it – ensuring, for example, that the shock of a tariff hike can be distinguished from the trend. The GMM approach is recommended if we suspect feedback effects (e.g., past trade volumes influence current policy) or want to control for habit persistence in the data. Care must be taken to have enough time periods (the period 2017–2025 is not very long, but using quarterly data for some variables could increase observations) and to test for instrument validity (Hansen test, etc.).
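A full Arellano-Bond implementation is beyond a short example, so the sketch below uses the simpler Anderson-Hsiao estimator instead. It first-differences the model to remove the unit effect and instruments the lagged difference with the level two periods back, which is one of the moment conditions that difference GMM builds on. The panel is synthetic and the persistence parameter rho is an assumed value.

```python
import numpy as np

rng = np.random.default_rng(3)
n_units, n_periods, burn_in = 2000, 9, 20
rho = 0.5                                  # assumed persistence

alpha = 0.5 * rng.normal(size=n_units)     # unit fixed effects
y = np.zeros((n_units, burn_in + n_periods))
for t in range(1, burn_in + n_periods):
    y[:, t] = alpha + rho * y[:, t - 1] + rng.normal(size=n_units)
y = y[:, burn_in:]                         # keep nine periods, as in 2017-2025

dy = np.diff(y, axis=1)                    # first differences remove alpha
dep = dy[:, 1:]                            # delta y_t
lag = dy[:, :-1]                           # delta y_{t-1}, endogenous
inst = y[:, :-2]                           # y_{t-2}, used as instrument

rho_ah = (inst * dep).sum() / (inst * lag).sum()    # Anderson-Hsiao IV
rho_naive = (lag * dep).sum() / (lag ** 2).sum()    # OLS on differences, biased

print(f"Anderson-Hsiao: {rho_ah:.3f}, naive OLS on differences: {rho_naive:.3f}")
```

Difference and system GMM add further lags as instruments for efficiency, which is also why instrument-count and Hansen tests matter with only nine periods.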
In summary, the analysis might start with a baseline gravity model (perhaps estimated with PPML for robustness), then move to panel FE models for aggregate impacts, use DiD for specific policy events, and consider IV/GMM for strengthening causality. Each method has pros and cons, and using multiple approaches can provide a fuller picture.
Data Preparation and Variable Construction
A careful data preparation process is crucial to ensure the analysis is accurate and credible. Below are key steps and recommendations:
Merging and Consistency: All the datasets (from WTO, IMF, World Bank, etc.) should be merged using consistent country identifiers. Preferably use ISO 3166-1 alpha-3 (three-letter) country codes, or country names that are standardized. Many sources provide these codes (e.g., “USA” for United States, “CHN” for China). Mapping tables may be needed for entities that some datasets code differently (e.g., Hong Kong) or that have no official ISO code (e.g., Kosovo). Ensure that years are aligned and, if using higher-frequency data (quarterly WUI, monthly trade from DOTS or IFS) alongside annual data, decide whether to aggregate to annual or keep the higher frequency for certain analyses.
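A minimal merge sketch with pandas follows. The two small tables, their values, and the name_to_iso3 mapping are hypothetical stand-ins for real source extracts. The validate and indicator options of DataFrame.merge are used to catch duplicate keys and unmatched rows early.

```python
import pandas as pd

# Hypothetical extracts from two sources with different country identifiers.
gdp = pd.DataFrame({"iso3": ["USA", "CHN", "HKG"], "year": [2019] * 3,
                    "gdp": [210.0, 140.0, 4.0]})            # made-up values
tariffs = pd.DataFrame({"country": ["United States", "China", "Hong Kong, China"],
                        "year": [2019] * 3,
                        "tariff": [3.0, 7.0, 0.0]})         # made-up values

# Hand-maintained mapping table from source names to ISO alpha-3 codes.
name_to_iso3 = {"United States": "USA", "China": "CHN", "Hong Kong, China": "HKG"}
tariffs["iso3"] = tariffs["country"].map(name_to_iso3)
unmapped = tariffs.loc[tariffs["iso3"].isna(), "country"].tolist()

panel = gdp.merge(tariffs[["iso3", "year", "tariff"]], on=["iso3", "year"],
                  how="outer", validate="one_to_one", indicator=True)
unmatched = panel[panel["_merge"] != "both"]   # rows present in only one source

print(panel)
print("unmapped names:", unmapped, "| unmatched rows:", len(unmatched))
```

Inspecting unmapped and unmatched before dropping the _merge column makes silent country losses much less likely.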
Constructing Trade Variables: From bilateral trade data, we can construct total trade per country and trade ratios. For example, compute total exports and imports for each country-year by summing bilateral flows. Also consider the trade openness metric (trade as % of GDP) as an outcome or control. If using bilateral data in gravity, create a dummy for whether two countries trade (some pairs may have zero trade, which could be included in PPML estimation). For handling zeros in log-linear models, either add a small constant or use PPML which can handle zero trade values naturally.
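Aggregating bilateral flows into country totals and an openness ratio might look like the sketch below. The flows and GDP figures are invented and the column names are assumptions, not the layout of any particular download.

```python
import pandas as pd

flows = pd.DataFrame({
    "exporter": ["USA", "USA", "CHN", "DEU"],
    "importer": ["CHN", "DEU", "USA", "USA"],
    "year": [2019] * 4,
    "value": [100.0, 60.0, 450.0, 130.0],      # made-up values
})
gdp = pd.DataFrame({"iso3": ["USA", "CHN", "DEU"], "year": [2019] * 3,
                    "gdp": [2000.0, 1500.0, 400.0]})

# Sum bilateral flows by exporter and by importer to get country totals.
exports = (flows.groupby(["exporter", "year"])["value"].sum()
           .rename("exports").rename_axis(["iso3", "year"]).reset_index())
imports = (flows.groupby(["importer", "year"])["value"].sum()
           .rename("imports").rename_axis(["iso3", "year"]).reset_index())

totals = (gdp.merge(exports, on=["iso3", "year"], how="left")
             .merge(imports, on=["iso3", "year"], how="left")
             .fillna({"exports": 0.0, "imports": 0.0}))
totals["openness_pct"] = 100 * (totals["exports"] + totals["imports"]) / totals["gdp"]

print(totals)
```

Trade and GDP must be in the same units and price basis before the ratio is meaningful.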
Tariff Measure Construction: We often need a single summary tariff variable per country-year or country-pair-year. One recommendation is to use trade-weighted average tariffs. This means weighting each tariff by the import value of that product or partner, to reflect the actual tariff burden on trade. For instance, if analyzing country-level protection, instead of a simple average of all tariff lines, calculate the import-weighted average tariff (effective tariff rate) so that tariffs on heavily traded goods count more. This can be done using detailed data: multiply each product’s tariff rate by its share in total imports, then sum up. Similarly, for bilateral tariffs (in an RTA context), one could compute the average tariff that country A imposes on imports from country B. If detailed data is unavailable, an alternative is to use the effectively applied tariff rate from WDI (which is already often trade-weighted). Additionally, if analyzing the impact of tariffs on exports, consider the importing country’s tariff as the relevant variable for the exporting country’s trade flow. In gravity, one might include tariffs of both exporter and importer (or tariff preferences within RTAs). Construction of an RTA dummy for each country-pair (1 if an FTA in force, 0 otherwise) will be needed; this can be done from the WTO RTA list by year.
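The weighting step can be sketched as follows. The importer code, products, tariff rates, and import values are all hypothetical. The example also shows how far a simple average can sit from the import-weighted one when a high tariff applies to a thinly traded product.

```python
import pandas as pd

lines = pd.DataFrame({
    "importer": ["AAA"] * 3,                   # hypothetical country code
    "product": ["steel", "autos", "tobacco"],
    "tariff_pct": [25.0, 2.5, 350.0],
    "imports": [200.0, 790.0, 10.0],           # import value, made-up units
})

# Tariff rate times import value, summed and divided by total imports.
lines["duty"] = lines["tariff_pct"] * lines["imports"]
by_importer = lines.groupby("importer").agg(
    duty=("duty", "sum"),
    imports=("imports", "sum"),
    simple_avg=("tariff_pct", "mean"),
)
by_importer["weighted_avg"] = by_importer["duty"] / by_importer["imports"]

print(by_importer[["simple_avg", "weighted_avg"]])
```

One known caveat is that import weights shrink as tariffs rise, so a prohibitive tariff can receive almost no weight; reporting both averages is a cheap safeguard.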
Effective Rate of Protection: If the analysis calls for it, the effective rate of protection (ERP) can be computed. ERP accounts for tariffs on inputs as well as outputs. It requires input-output data by sector. While detailed ERP may be beyond scope, note that if needed, one could use input-output tables to adjust tariffs (this is more specialized: for example, if a country cuts tariffs on final goods but not on inputs, the ERP for domestic producers might decrease). This is likely not required for a broad analysis, but it’s a consideration if focusing on industry-level impacts.
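For reference, the standard (Corden-style) ERP formula divides the output tariff net of the cost-share-weighted input tariffs by the value-added share. The sketch below applies it to the case described above, using an assumed single input with a 50% cost share; the function name and numbers are illustrative only.

```python
def effective_rate_of_protection(output_tariff, input_tariffs, input_shares):
    """ERP = (t_out - sum(a_i * t_i)) / (1 - sum(a_i)).

    input_shares are input cost shares in the free-trade output price.
    """
    input_cost = sum(a * t for a, t in zip(input_shares, input_tariffs))
    value_added_share = 1.0 - sum(input_shares)
    if value_added_share <= 0:
        raise ValueError("input shares must sum to less than 1")
    return (output_tariff - input_cost) / value_added_share

# Before: 10% tariff on the final good and on its single input.
erp_before = effective_rate_of_protection(0.10, [0.10], [0.5])
# After: the final-good tariff is cut to 5%, the input tariff is unchanged.
erp_after = effective_rate_of_protection(0.05, [0.10], [0.5])

print(erp_before, erp_after)
```

In this toy case a five-point cut in the nominal tariff removes all effective protection, which is why nominal and effective rates can tell different stories at the industry level.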
Deflating and Indexing: Ensure that all monetary values are in real terms to account for inflation. Trade values (often in current USD) can be deflated by a price index – either use each country’s GDP deflator or CPI to deflate its trade, or use world export price indices. Alternatively, use growth rates or ratios (trade/GDP) which inherently adjust for size changes. If we use GDP in constant USD from WDI, that is already real GDP (adjusted for inflation) and is comparable over time; similarly, use constant price exports/imports if available. When combining data from multiple sources, check units (e.g., some FDI data might be in millions of USD, others in billions) and convert accordingly.
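A short sketch of deflating a nominal series and converting units is given below. The export values and the price index are invented, and the column names are assumptions.

```python
import pandas as pd

trade = pd.DataFrame({
    "year": [2017, 2018, 2019],
    "exports_nominal_musd": [1000.0, 1100.0, 1155.0],  # millions of current USD
    "price_index": [100.0, 104.0, 105.0],              # hypothetical, 2017 = 100
})

# Real value = nominal value divided by the price index (scaled to 1.0).
trade["exports_real_musd"] = (trade["exports_nominal_musd"]
                              / (trade["price_index"] / 100.0))
trade["exports_real_busd"] = trade["exports_real_musd"] / 1000.0  # to billions

print(trade)
```

Here the 15.5% nominal rise between 2017 and 2019 corresponds to a 10% rise in constant 2017 prices.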
Handling Frequency Differences: Some data are annual (GDP, tariffs typically), while others might be quarterly or monthly (trade flows, industrial production, WUI). Decide on using annual averages or end-of-year values for higher-frequency data to match the yearly panel. For example, average the WUI index quarterly values to get an annual uncertainty index per country. For industrial production, one could use the year-over-year % change in industrial output (which is annualized) to have an annual growth rate. Consistently use the same period coverage for all variables (2017–2025 in this case) and consider aligning to calendar year.
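Collapsing a quarterly series to annual averages can be sketched like this. The index values are made up and the column names are assumptions rather than the layout of the actual WUI file.

```python
import pandas as pd

wui = pd.DataFrame({
    "iso3": ["USA"] * 8,
    "year": [2018] * 4 + [2019] * 4,
    "quarter": [1, 2, 3, 4] * 2,
    "wui": [0.10, 0.20, 0.30, 0.40, 0.50, 0.50, 0.30, 0.30],  # made-up values
})

# Annual average of the quarterly index, one row per country-year.
annual = wui.groupby(["iso3", "year"], as_index=False)["wui"].mean()
# Year-over-year percentage change within each country.
annual["wui_growth_pct"] = annual.groupby("iso3")["wui"].pct_change() * 100

print(annual)
```

Counting the quarters available in each year before averaging helps avoid treating a partial year as a full one.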
Quality Control: Perform checks such as: do total exports equal total imports globally (they should, up to reporting discrepancies)? Are there obvious outliers or data entry errors (e.g., a tariff rate of 500% which might be an error or a specific case like tobacco – consider winsorizing or treating outliers carefully)? Also ensure RTA indicators line up with dates (e.g., if a country joins an RTA mid-2019, perhaps mark it as 2019=1, or use 2020 as full-year effect). Document any adjustments made (for instance, if you fill a missing GDP for Syria 2020 by extrapolation due to civil war data gaps, note that).
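The mid-year dating choice for RTA indicators can be made explicit in code. The helper below is a hypothetical sketch, and the entry-into-force date is an invented example; the full_year_only switch captures the two conventions mentioned above.

```python
from datetime import date

def rta_dummy(entry_into_force, year, full_year_only=False):
    """Return 1 if the agreement counts as in force in `year`, else 0.

    With full_year_only=True, an agreement that starts after 1 January
    is coded as in force only from the following calendar year.
    """
    if entry_into_force is None:
        return 0
    first_year = entry_into_force.year
    starts_jan_1 = (entry_into_force.month, entry_into_force.day) == (1, 1)
    if full_year_only and not starts_jan_1:
        first_year += 1
    return int(year >= first_year)

eif = date(2019, 7, 1)   # hypothetical mid-2019 entry into force
print([rta_dummy(eif, y) for y in (2018, 2019, 2020)])
print([rta_dummy(eif, y, full_year_only=True) for y in (2018, 2019, 2020)])
```

Whichever convention is chosen, applying it uniformly to all agreements and documenting it keeps the RTA variable comparable across pairs.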
Creating Derived Variables: Depending on model needs, create interaction terms or dummies: for DiD, create a “post-treatment” dummy (e.g., post-2018 for trade war) and a “treated” group dummy (e.g., 1 for US-China trade flows) then an interaction of the two which is the DiD term. For event studies, create dummies for each lead/lag year relative to event. If doing gravity, compute logarithms of continuous variables like GDP, distance, etc., since the model is multiplicative in theory. Also consider taking logs of trade flows (with +1 for zeros or use PPML approach). For panel regression on country outcomes, consider lagging some independent variables if appropriate (to avoid simultaneity, e.g., use lagged tariffs to explain current GDP growth, assuming policy changes take time to effect).
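Logs, lags, and growth rates should be computed within each country so that values never leak across units. A minimal pandas sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

panel = pd.DataFrame({
    "iso3": ["AAA"] * 3 + ["BBB"] * 3,          # hypothetical country codes
    "year": [2017, 2018, 2019] * 2,
    "gdp": [100.0, 110.0, 121.0, 50.0, 50.0, 55.0],
    "tariff": [5.0, 4.0, 3.0, 10.0, 10.0, 8.0],
}).sort_values(["iso3", "year"]).reset_index(drop=True)

panel["ln_gdp"] = np.log(panel["gdp"])
# Shift and difference within country; the first year of each country is NaN.
panel["tariff_lag1"] = panel.groupby("iso3")["tariff"].shift(1)
panel["gdp_growth"] = panel.groupby("iso3")["ln_gdp"].diff()   # log growth

print(panel)
```

Sorting by country and year before shifting matters, because shift works on row order rather than on the year values.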
By following these steps, we ensure that our dataset is analysis-ready: all variables are in the correct format, measured on consistent scales, and representative of the concepts we intend (e.g., “tariff liberalization” is captured by a meaningful decline in a tariff index). A well-constructed dataset reduces the risk of biased or spurious results.
Handling Missing Data and Cross-Country Comparability
Global data invariably have gaps and differences in definitions. Here’s how to address them:
Dealing with Missing Data: Identify where data is missing for certain countries or years. For critical series like GDP or trade, a few small countries with missing data can often be dropped (since the analysis is “global”, losing a very small economy might not bias results much). However, for completeness, consider filling gaps using alternative sources: e.g., if WDI lacks 2021 GDP for a country, check IMF WEO or that country’s statistical agency. In some cases, interpolation can be used for a short gap (like if you have 2019 and 2021, you might approximate 2020). Flag any interpolated or estimated values. For variables like tariffs and RTAs, if data isn’t reported in a year, it might be safe to assume the last known tariff or agreement status continued (since tariff schedules and agreements don’t usually change every year). The IMF DOTS data we use for trade already estimates missing trade flows using partner data, which greatly reduces missingness in trade and is a big advantage of DOTS. For other series, the unbalanced panel nature is acceptable: econometric methods (fixed effects, etc.) can handle some missing values as long as they are not systematic. Just be mindful if entire regions or certain types of countries have missing data, as that could bias the sample.
Comparability Across Countries: Because data come from different national sources originally, definitions can vary. We rely on international databases (WDI, ILOSTAT, etc.) because they already attempt to harmonize definitions (e.g., the unemployment rate is an ILO modeled estimate for consistency). Still, check metadata: for example, some countries report GDP on a fiscal-year rather than calendar-year basis; WDI usually adjusts for that, but one should be aware of it. Also, price indices’ base years might differ; if we use inflation rates (percent change) that’s fine, but if we use index levels, we should rebase them to a common base year. Exchange rate regimes differ (fixed vs floating), so when including the exchange rate, consider using the % change or deviation from trend rather than the absolute level to make it comparable (a fixed peg will show no movement, which itself has an economic interpretation). For industrial production, countries might include different sectors; using growth rates normalizes a lot of that. In summary, stick to relative measures (growth rates, percentages) or indices that are comparable, and use dummies or fixed effects to absorb country-specific measurement differences.
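Gap filling with explicit flags might be sketched as follows. The data are hypothetical. The rules shown (linear interpolation for a single-year interior GDP gap, carrying the last known tariff forward) are the simple ones described above, not the only defensible choices.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "iso3": ["AAA"] * 4,                        # hypothetical country
    "year": [2018, 2019, 2020, 2021],
    "gdp": [100.0, 104.0, np.nan, 110.0],
    "tariff": [6.0, np.nan, np.nan, 5.0],
}).sort_values(["iso3", "year"]).reset_index(drop=True)

# Flag gaps before filling so imputed values stay identifiable.
df["gdp_imputed"] = df["gdp"].isna()
df["tariff_imputed"] = df["tariff"].isna()

# GDP: interpolate single-year gaps that lie between two observed values.
df["gdp"] = df.groupby("iso3")["gdp"].transform(
    lambda s: s.interpolate(limit=1, limit_area="inside"))
# Tariffs: assume the last reported schedule stayed in force.
df["tariff"] = df.groupby("iso3")["tariff"].ffill()

print(df)
```

Keeping the flag columns makes it straightforward to rerun the models on observed values only as a robustness check.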
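Rebasing index levels to a common year is a one-line rescaling. The two CPI series below are hypothetical, with different original base years, and the function name is an assumption.

```python
def rebase(index_by_year, base_year):
    """Rescale an index so that its value in base_year equals 100."""
    base = index_by_year[base_year]
    return {year: 100.0 * value / base for year, value in index_by_year.items()}

cpi_a = {2015: 100.0, 2017: 104.0, 2018: 106.08}   # original base 2015
cpi_b = {2010: 100.0, 2017: 150.0, 2018: 156.0}    # original base 2010

cpi_a_2017 = rebase(cpi_a, 2017)
cpi_b_2017 = rebase(cpi_b, 2017)
print(round(cpi_a_2017[2018], 2), round(cpi_b_2017[2018], 2))
```

After rebasing, the 2018 levels read directly as cumulative price change since 2017, which makes the two countries comparable; growth rates are unaffected by the choice of base year.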
Country Groupings and Weights: In a global analysis, check whether results are driven by a particular group of countries. Unweighted country-level regressions give every country equal influence regardless of size, while weighting by GDP or population makes estimates representative of world activity at the cost of letting large economies dominate; where appropriate, report both. Alternatively, run analyses by subgroup (developed vs developing) to check robustness. The dataset includes all countries, but interpretation might differ: an RTA effect might be stronger among developing countries than developed ones, for instance. Comparability checks could involve interacting policy effects with region or income-group dummies.
Temporal Consistency: The period 2017–2025 saw unusual events (trade wars, a pandemic). Check that those events are accounted for (e.g., the year dummies or common trends will capture the 2020 pandemic shock which affected all countries’ trade). If analyzing something like the effect of tariffs, be careful that 2020 isn’t falsely interpreted – include a COVID-19 dummy or allow for differential effects in 2020 to avoid attributing pandemic-related trade collapse to tariff policy. This is part of comparability in the time dimension – ensuring the model recognizes global shocks versus policy changes.
Validation with External Benchmarks: For key variables like tariffs and trade, compare the levels or changes with known facts to ensure the data assembly is correct. For example, we know from PIIE reports that US average tariff on Chinese goods rose from ~3% in 2017 to ~19% in 2019. Does our constructed tariff variable for US-China reflect a similar jump? Validating such specifics gives confidence that the data reflects real-world policy changes. If we notice discrepancies, it may indicate missing preferential tariffs or mis-coding of an RTA (like missing that the USMCA replaced NAFTA in 2020, though tariffs remained zero so it’s subtle in data).
Missing Variables for Some Countries: Some smaller countries might not have certain indicators (e.g., no policy uncertainty index if they’re not in WUI due to population < 2 million, or no industrial production index if not reported). In such cases, we have a few options: (1) substitute a similar measure (maybe use regional uncertainty average for a missing country, or omit that variable for those countries), (2) drop those countries from analyses needing that variable, or (3) use multiple imputation techniques to predict the missing values. Given our focus is global, excluding a few microstates or sparsely populated nations (which often are the ones missing data) is usually acceptable. The trade-off is between completeness and introducing noise via imputation.
Ensure Stationarity/Trends: When comparing across countries, especially in panel regressions, detrending might be necessary if variables have strong trends. For example, many developing countries were lowering tariffs gradually in the 2010s; those also had growth improvements – to avoid spurious correlation, including year fixed effects or analyzing deviations from global mean can help. Similarly, if trade/GDP rises everywhere due to recovery from 2016 slowdown through 2017, we wouldn’t want to attribute that to policy – year effects handle common trends.
Finally, document any data limitations: e.g., “We lack data on services trade liberalization, so our analysis is confined to goods trade.” Acknowledge if certain measures (like NTMs – non-tariff measures) are not included due to data unavailability, which could be an area for further research. By handling missing data carefully and accounting for differences across countries, the resulting analysis will be robust and credible, truly reflecting the impact of trade liberalization in the 2017–2025 period rather than artifacts of the data.
Data Sources
IMF Direction of Trade Statistics (for bilateral trade flows)
UN Comtrade via World Bank WITS (detailed trade data)
WTO Tariff Profiles & Data Portal (tariff rates for 170+ countries)
World Bank WITS/UNCTAD TRAINS (historical tariff database since 1989)
WTO RTA Database (comprehensive list of trade agreements)
World Bank World Development Indicators (GDP, macro indicators – 1200+ series)
ILOSTAT and WDI (unemployment and labor metrics, ILO modeled estimates)
World Bank WDI (FDI inflows/outflows, sourced from IMF and others)
IMF International Financial Statistics via WDI (inflation/CPI and exchange rates)
IMF/World Uncertainty Index (global uncertainty measure, 143 countries)
CEPII Gravity Dataset (distance, common language, colonial ties, etc.)