Size Distribution of Firms

The size distribution of firms in the French traders dataset is illustrated in the figure below, with exporters on the left and importers on the right. The distributions for multiple years are superimposed: blue dots show the number of agents by size bin, and yellow dots show the traded value by size bin. The plots use a log-log scale, on which a log-normal size distribution appears as a parabola.

Distributions of agent sizes (blue) and value (yellow). Exports (left) and imports (right).

The size distribution of French traders is compatible with a log-normal distribution truncated at the minimum value $x = 3$. This aligns with previous evidence regarding populations of economic agents. However, the fitted log-normal parameters drift and widen somewhat over the years, suggesting that the size distribution may not be stationary. The widening would be more pronounced if firms experienced independent random log jumps akin to Brownian motion, which indicates that negative autocorrelation of shocks may stabilize agents' time series.
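The stabilizing role of negative autocorrelation can be illustrated with a small simulation. This is only a sketch under assumed parameters, not the model fitted to the French data: the function `simulate`, the shock scale, and the particular choice of negatively correlated shocks (differences of independent draws) are all hypothetical. With independent shocks, the variance of cumulative log growth grows with the number of steps. With shocks of the form `eta[k] - eta[k-1]`, the sum telescopes and the variance stays bounded.

```python
import random
from statistics import pvariance

def simulate(n_agents, n_steps, scale, negatively_correlated, seed=0):
    """Variance across agents of cumulative log growth after n_steps.

    All parameters are illustrative assumptions, not estimates from the data.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_agents):
        eta = [rng.gauss(0.0, scale) for _ in range(n_steps + 1)]
        if negatively_correlated:
            # shock_k = eta_k - eta_{k-1}: consecutive shocks are negatively
            # correlated, and their sum telescopes to eta_n - eta_0.
            shocks = [eta[k] - eta[k - 1] for k in range(1, n_steps + 1)]
        else:
            # Independent shocks: a discrete analogue of Brownian motion in logs.
            shocks = eta[1:]
        totals.append(sum(shocks))
    return pvariance(totals)

var_iid = simulate(2000, 50, 0.1, negatively_correlated=False)
var_neg = simulate(2000, 50, 0.1, negatively_correlated=True)
print(f"independent shocks:           variance = {var_iid:.3f}")
print(f"negatively correlated shocks: variance = {var_neg:.3f}")
```

In this toy setting the first variance scales like `n_steps * scale**2`, while the second stays near `2 * scale**2` however long the simulation runs.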

A log-normal size distribution implies a concentration of significant value among the largest agents. For French traders, 90% of exports are concentrated among firms exporting more than $\log(s) = x$, and 90% of imports are concentrated among firms importing more than $\log(s) = x$.

If we focus on these largest agents, a Pareto (power law) rule is also partly compatible with the observed size distribution. Thus, a log-normal and a Pareto rule are not contradictory descriptions of the size distribution of firms. The Pareto model provides a linear approximation to the upper tail of the distribution, making it a useful model for studying the consequences of size distributions, even if not completely accurate empirically.
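To see why a Pareto rule can approximate the upper tail, note that on a log scale the log-density of a log-normal is a parabola, while the log-density of a Pareto is a straight line. The sketch below compares the parabola with its tangent at a chosen tail point `t0`. The function names and the values of `mu`, `sigma`, and `t0` are hypothetical, chosen only for illustration.

```python
def log_density(t, mu, sigma):
    """Log of the unnormalised normal density in t: a downward parabola."""
    return -((t - mu) ** 2) / (2.0 * sigma ** 2)

def tangent_line(t, t0, mu, sigma):
    """Tangent to the parabola at t0: the local 'Pareto' straight line."""
    slope = -(t0 - mu) / sigma ** 2
    return log_density(t0, mu, sigma) + slope * (t - t0)

# Illustrative values only (not fitted to the French data).
mu, sigma, t0 = 4.0, 1.0, 6.0
for t in (5.5, 6.0, 6.5, 7.0):
    gap = tangent_line(t, t0, mu, sigma) - log_density(t, mu, sigma)
    print(f"t = {t:.1f}  parabola = {log_density(t, mu, sigma):7.3f}  "
          f"tangent = {tangent_line(t, t0, mu, sigma):7.3f}  gap = {gap:.3f}")
```

The gap between the tangent and the parabola is $(t - t_0)^2 / (2\sigma^2)$, so the linear description is accurate close to `t0` and degrades further away, which matches the "partly compatible" wording above.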

Both log-normal and Pareto size distributions imply concentration, leading to the familiar 80%-20% concentration rule. This feature highlights the high concentration of value among a small portion of agents, given the empirically observed distribution parameters.
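The concentration implied by each model can be computed in closed form. The sketch below uses standard results for an untruncated log-normal and for a Pareto distribution with tail index `alpha > 1`. It ignores the truncation seen in the data, and the function names and parameter values are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def lognormal_top_share(q, sigma_log10):
    """Share of total value held by the largest fraction q of agents,
    for an untruncated log-normal with standard deviation sigma_log10
    measured in log10 units."""
    std = NormalDist()
    sigma_ln = math.log(10) * sigma_log10   # convert to natural-log units
    z_q = std.inv_cdf(1.0 - q)              # size threshold, in standard units
    # The value distribution is the size distribution shifted by sigma_ln
    # standard units, hence the share above the threshold:
    return std.cdf(sigma_ln - z_q)

def pareto_top_share(q, alpha):
    """Share of total value held by the largest fraction q of agents,
    for a Pareto distribution with tail index alpha > 1."""
    if alpha <= 1.0:
        raise ValueError("the mean is infinite for alpha <= 1")
    return q ** (1.0 - 1.0 / alpha)

# Parameter values that reproduce an exact 80%-20% rule in each model.
alpha_8020 = math.log(5) / math.log(4)
sigma_8020 = 2.0 * NormalDist().inv_cdf(0.8) / math.log(10)
print(f"Pareto, alpha = {alpha_8020:.3f}: {pareto_top_share(0.2, alpha_8020):.3f}")
print(f"log-normal, sigma = {sigma_8020:.3f}: {lognormal_top_share(0.2, sigma_8020):.3f}")
```

In both models the top share rises as the distribution widens (larger `sigma_log10`, or `alpha` closer to 1), so whether the split is 80%-20% or more extreme depends on the empirically observed parameters.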

Formalizing Size Distributions

Consider a histogram that shows the number of agents with total sales in each interval $[\bar s_b, \bar s_b + ds)$ of a partition of the real numbers. The total number of firms is the sum of the populations of all bins, $N = \sum_{bins} n_b$. If $ds$ is small enough, we can approximate the size of every firm in bin $b$ by $\bar s_b$. The total value associated with firms in the bin is then $S_b = n_b \bar s_b$. The total value disaggregated by firm size is:

$$X = \sum_{j = 1}^{N} s_j = \sum_{bins} S_b \approx \sum_{bins} n_b \bar s_b$$
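The binning approximation can be checked numerically. The sketch below is a minimal illustration on synthetic data: the sample is drawn from an arbitrary log-normal, the helper names are hypothetical, and $\bar s_b$ is taken as the bin midpoint rather than the left edge, which makes no difference as $ds$ shrinks.

```python
import random
from collections import Counter

def bin_counts(sizes, ds):
    """Count agents n_b per bin [b*ds, (b+1)*ds); keys are integer bin indices."""
    return Counter(int(s // ds) for s in sizes)

def binned_total(counts, ds):
    """Approximate X as sum_b n_b * s_bar_b, with s_bar_b the bin midpoint."""
    return sum(n_b * (b + 0.5) * ds for b, n_b in counts.items())

# Synthetic firm sizes (illustrative parameters, not the French data).
rng = random.Random(0)
sizes = [rng.lognormvariate(3.0, 1.0) for _ in range(10_000)]

ds = 0.1
counts = bin_counts(sizes, ds)
N = sum(counts.values())            # N = sum over bins of n_b
X_exact = sum(sizes)                # X = sum over firms of s_j
X_binned = binned_total(counts, ds)
print(N, round(X_exact, 2), round(X_binned, 2))
```

Each firm lies within `ds / 2` of its bin midpoint, so the binned total differs from the exact total by at most `N * ds / 2`, and usually by much less because the errors partly cancel.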

Let us denote firm sizes in linear scale as $s$ and the normalized probability density function (PDF) of firm sizes as $p_{cnt}(s)$. Similarly, there is a PDF of value, $p_{val}(s)$. In the continuous limit, the sum over bins becomes:

$$X = \int X\, p_{val}(s)\, ds = \int N\, p_{cnt}(s)\, s\, ds$$

Here, value $X$ and population $N$ are linked: the first moment of the population distribution, scaled by $N$, equals the zeroth moment of the value distribution, scaled by $X$. With the average firm sales $\bar s \equiv \operatorname{E}[s] = \int s\,p_{cnt}(s)\,ds$, this reads $X = N \bar s$. The same relation holds for each size bin: $S_b \approx n_b \bar s_b$.

Equating the integrands of the two integrals gives:

$$X\, p_{val}(s) = N\, p_{cnt}(s)\, s$$

For a given size $s$, the amount $v_x = X\, p_{val}(s) = N\, p_{cnt}(s)\, s$, indexed by the log size $x = \log_{10}(s)$, is how much the holders of that sum hold altogether. For example, if $n_2 = 10^3$ people hold 100 dollars ($10^2$) each, this group holds $v_2 = 10^{3+2} = 10^5$ dollars.

From the relation $X\, p_{val}(s) = N\, p_{cnt}(s)\, s$, we see that the distribution of firm sizes $p_{cnt}(s)$ determines the distribution of value $p_{val}(s)$. Using the log variable $t$ in place of $s$, i.e., $s = 10^t = e^{\ln(10) t}$, a log-normal size distribution becomes $p_{cnt}(t) = \mathcal{N}(\mu, \sigma) \sim e^{-a(t - \mu)^2}$. The product of $p_{cnt}(t)$ and $10^t$ is, on a log scale, a shifted parabola, $\mathcal{N}(\mu + \ln(10)\sigma^2, \sigma)$, so the distribution of value is log-normal if the distribution of agent sizes is log-normal. The shift $\ln(10)\sigma^2$ is what concentrates value among large agents.
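The shift follows from completing the square in the exponent. The short derivation below is a sketch that takes $a = 1/(2\sigma^2)$, the usual normal-density convention, and drops the normalization constant of $p_{cnt}(t)$:

```latex
\begin{aligned}
10^{t}\, p_{cnt}(t)
  &\propto \exp\!\left( \ln(10)\, t - \frac{(t - \mu)^2}{2\sigma^2} \right) \\
  &= \exp\!\left( -\frac{\bigl(t - \mu - \ln(10)\,\sigma^2\bigr)^2}{2\sigma^2} \right)
     \exp\!\left( \ln(10)\,\mu + \frac{\ln(10)^2\,\sigma^2}{2} \right)
\end{aligned}
```

The first factor is a normal density in $t$ with the same width $\sigma$ and mean $\mu + \ln(10)\sigma^2$. The second factor does not depend on $t$. It equals the mean size $\bar s = \operatorname{E}[10^t]$, which is the constant needed to normalize $p_{val}(t) = 10^t\, p_{cnt}(t) / \bar s$.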

The most crucial characteristic of empirical distributions of firm-level sales is that they are defined on a log scale. They can be expressed as $10^{C(\cdot)}$, with $C(\cdot)$ a normal distribution or an exponentially decreasing distribution (Pareto rule). This has been consistently observed in various populations of economic agents and is key to our study. World trade, split by different categories, also shows log-normal distributions and fluctuations, suggesting broader applicability of these tools.