have more extraneous words. Therefore, we assume that

\mathrm{Pr}(\phi_0|\phi_1^l, \mathbf{e}) = \binom{\phi_1 + \cdots + \phi_l}{\phi_0} p_0^{\phi_1 + \cdots + \phi_l - \phi_0} p_1^{\phi_0} \quad (1.30)

for some pair of auxiliary parameters p_0 and p_1. The expression on the left-hand side of this equation depends on \phi_1^l only through the sum \phi_1 + \cdots + \phi_l and defines a probability distribution over \phi_0 whenever p_0 and p_1 are nonnegative and sum to 1. We can interpret \mathrm{Pr}(\phi_0|\phi_1^l, \mathbf{e}) as follows. We imagine that each of the words from \tau_1^l requires an extraneous word with probability p_1 and that this extraneous word must be connected to the empty cept. The probability that exactly \phi_0 of the words from \tau_1^l will require an extraneous word is just the expression given in Equation (1.30).
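As a quick numerical check of Equation (1.30), the following minimal sketch evaluates the binomial expression for every possible value of \phi_0 and confirms that the values sum to 1. The function name `prob_phi0` and the values of `total_fertility` and `p1` are illustrative assumptions, not part of the model description.

```python
from math import comb


def prob_phi0(phi0, total_fertility, p1):
    """Equation (1.30): probability that exactly phi0 of the
    total_fertility = phi_1 + ... + phi_l words each require an
    extraneous word, where each does so with probability p1."""
    p0 = 1.0 - p1  # p0 and p1 are nonnegative and sum to 1
    if phi0 < 0 or phi0 > total_fertility:
        return 0.0
    return comb(total_fertility, phi0) * p0 ** (total_fertility - phi0) * p1 ** phi0


# Illustrative values only (assumed, not taken from the text).
total_fertility = 5
p1 = 0.2
distribution = [prob_phi0(k, total_fertility, p1) for k in range(total_fertility + 1)]
total_mass = sum(distribution)
```

Here `total_mass` equals 1 up to floating-point rounding, which is the sense in which the expression defines a probability distribution over \phi_0.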

As with Models 1 and 2, an alignment of (\mathbf{f}|\mathbf{e}) is determined by specifying a_j for each position in the French string. The fertilities, \phi_0 through \phi_l, are functions of the a_j's: \phi_i is equal to the number of j's for which a_j equals i. Therefore,
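The dependence of the fertilities on the alignment can be sketched in a few lines. The helper name `fertilities` and the example alignment are assumptions made for illustration; position 0 stands for the empty cept.

```python
def fertilities(alignment, l):
    """Return [phi_0, ..., phi_l], where phi_i is the number of French
    positions j whose alignment value a_j equals i."""
    phi = [0] * (l + 1)
    for a_j in alignment:
        phi[a_j] += 1
    return phi


# Assumed example: m = 4 French words, l = 3 English words.
# a_1 = 1, a_2 = 0 (empty cept), a_3 = 2, a_4 = 2.
a = [1, 0, 2, 2]
phi = fertilities(a, 3)
```

Because every French position is connected to exactly one value of i, the fertilities always sum to m.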

\begin{aligned} \mathrm{Pr}(\mathbf{f}|\mathbf{e}) &= \sum_{a_1=0}^l \cdots \sum_{a_m=0}^l \mathrm{Pr}(\mathbf{f}, \mathbf{a}|\mathbf{e}) \\ &= \sum_{a_1=0}^l \cdots \sum_{a_m=0}^l \binom{m - \phi_0}{\phi_0} p_0^{m - 2\phi_0} p_1^{\phi_0} \prod_{i=1}^l \phi_i! n(\phi_i | e_i) \times \\ & \quad \prod_{j=1}^m t(f_j | e_{a_j}) d(j | a_j, m, l) \end{aligned} \quad (1.31)

with \sum_f t(f|e) = 1, \sum_j d(j|i, m, l) = 1, \sum_\phi n(\phi|e) = 1, and p_0 + p_1 = 1. The assumptions that we make for Model 3 are such that each of the pairs (\tau, \pi) in \langle \mathbf{f}, \mathbf{a} \rangle makes an identical contribution to the sum in Equation (1.29). The factorials in Equation (1.31) come from carrying out this sum explicitly. There is no factorial for the empty cept because it is exactly cancelled by the contribution from the distortion probabilities.
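For very short strings, Equation (1.31) can be evaluated by brute force, which may help in checking an implementation. The sketch below computes the summand for one alignment and then sums over all (l + 1)^m alignments. All parameter tables and the sentence pair are invented toy values. Treating d as 1 for words connected to the empty cept is an assumption of this sketch, chosen to match the remark that the empty cept's factorial is cancelled by the distortion probabilities.

```python
from itertools import product
from math import comb, factorial


def model3_joint(f, e, a, t, d, n, p1):
    """Summand of Equation (1.31) for a single alignment a = (a_1, ..., a_m)."""
    m, l = len(f), len(e)
    p0 = 1.0 - p1
    phi = [0] * (l + 1)
    for a_j in a:
        phi[a_j] += 1
    phi0 = phi[0]
    if phi0 > m - phi0:  # the binomial coefficient vanishes
        return 0.0
    prob = comb(m - phi0, phi0) * p0 ** (m - 2 * phi0) * p1 ** phi0
    for i in range(1, l + 1):
        prob *= factorial(phi[i]) * n(phi[i], e[i - 1])
    for j in range(1, m + 1):
        i = a[j - 1]
        e_word = e[i - 1] if i > 0 else None  # None marks the empty cept
        prob *= t(f[j - 1], e_word) * d(j, i, m, l)
    return prob


# Toy parameters (hypothetical values, for illustration only).
T = {("la", "the"): 0.9, ("maison", "the"): 0.1,
     ("la", "house"): 0.2, ("maison", "house"): 0.8,
     ("la", None): 0.5, ("maison", None): 0.5}
N = {0: 0.2, 1: 0.7, 2: 0.1}  # same fertility table for every English word


def t(f_word, e_word):
    return T.get((f_word, e_word), 0.0)


def n(fertility, e_word):
    return N.get(fertility, 0.0)


def d(j, i, m, l):
    # Assumption: uniform distortion for real cepts, 1 for the empty cept.
    return 1.0 if i == 0 else 1.0 / m


e = ["the", "house"]
f = ["la", "maison"]
p1 = 0.1

alignments = list(product(range(len(e) + 1), repeat=len(f)))
likelihood = sum(model3_joint(f, e, a, t, d, n, p1) for a in alignments)
best = max(alignments, key=lambda a: model3_joint(f, e, a, t, d, n, p1))
```

The enumeration is exponential in m, so this is only a checking device for tiny examples, not a practical way to compute Pr(f|e).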

By now, the reader will be able to provide his own auxiliary function for seeking a constrained maximum of the likelihood of a translation with Model 3, but for completeness and to establish notation, we write

\begin{aligned} h(t, d, n, p, \lambda, \mu, \nu, \xi) &= \mathrm{Pr}(\mathbf{f}|\mathbf{e}) - \sum_e \lambda_e \left(\sum_f t(f|e) - 1\right) - \sum_i \mu_{iml} \left(\sum_j d(j|i, m, l) - 1\right) \\ & \quad - \sum_e \nu_e \left(\sum_\phi n(\phi|e) - 1\right) - \xi (p_0 + p_1 - 1). \end{aligned} \quad (1.32)

Following the trail blazed with Models 1 and 2, we define the counts

c(f|e; \mathbf{f}, \mathbf{e}) = \sum_{\mathbf{a}} \mathrm{Pr}(\mathbf{a}|\mathbf{e}, \mathbf{f}) \sum_{j=1}^m \delta(f, f_j) \delta(e, e_{a_j}), \quad (1.33)

c(j|i, m, l; \mathbf{f}, \mathbf{e}) = \sum_{\mathbf{a}} \mathrm{Pr}(\mathbf{a}|\mathbf{e}, \mathbf{f}) \delta(i, a_j), \quad (1.34)

c(\phi|e; \mathbf{f}, \mathbf{e}) = \sum_{\mathbf{a}} \mathrm{Pr}(\mathbf{a}|\mathbf{e}, \mathbf{f}) \sum_{i=1}^l \delta(\phi, \phi_i) \delta(e, e_i), \quad (1.35)

c(0; \mathbf{f}, \mathbf{e}) = \sum_{\mathbf{a}} \mathrm{Pr}(\mathbf{a}|\mathbf{e}, \mathbf{f}) (m - 2\phi_0)