[1] E. Myers, ``An O(ND) Difference Algorithm and Its Variations,'' Algorithmica 1, 2 (1986), 251-266. http://www.cs.arizona.edu/people/gene/PAPERS/diff.ps
Example 2. Printing the Edit Script
The code below computes and prints the edit script of two 8-bit encoded character strings a and b. Note that off and len for DIFF_INSERT operations reference sequence b, whereas matches and deletes reference sequence a.
    int d, sn, i;
    struct varray *ses = varray_new(sizeof(struct diff_edit), NULL);

    d = diff(a, 0, n, b, 0, m, NULL, NULL, NULL, 0, ses, &sn, NULL);

    for (i = 0; i < sn; i++) {
        struct diff_edit *e = varray_get(ses, i);

        switch (e->op) {
            case DIFF_MATCH:
                printf("MAT: ");
                fwrite(a + e->off, 1, e->len, stdout);
                break;
            case DIFF_DELETE:
                printf("DEL: ");
                fwrite(a + e->off, 1, e->len, stdout);
                break;
            case DIFF_INSERT:
                printf("INS: ");
                fwrite(b + e->off, 1, e->len, stdout);
                break;
        }
        printf("\n");
    }
    varray_del(ses);
Diff definitions
Synopsis
    #include <mba/diff.h>

    typedef const void *(*idx_fn)(const void *s, int idx, void *context);
    typedef int (*cmp_fn)(const void *e1, const void *e2, void *context);

    typedef enum {
        DIFF_MATCH = 1,
        DIFF_DELETE,
        DIFF_INSERT
    } diff_op;

    struct diff_edit {
        short op;  /* DIFF_MATCH, DIFF_DELETE or DIFF_INSERT */
        int off;   /* off into a if MATCH or DELETE, b if INSERT */
        int len;
    };
Description
Each element in the ses varray is a struct diff_edit structure and represents an individual match, delete, or insert operation in the edit script. The op member is DIFF_MATCH, DIFF_DELETE or DIFF_INSERT. The off and len members indicate the offset and length of the subsequence that matches or should be deleted from sequence a or inserted from sequence b.
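To make the roles of op, off, and len concrete, the following is a minimal sketch that replays an edit script to rebuild sequence b from sequence a. It assumes 8-bit character sequences and takes the script as a plain array rather than a varray. The helper name apply_edits is hypothetical and not part of libmba, and the type definitions are copied locally from the synopsis only so that the sketch compiles on its own.

```c
#include <stddef.h>
#include <string.h>

/* Local copy of the definitions from the synopsis so this sketch builds
 * without libmba; real code would include <mba/diff.h> instead. */
typedef enum { DIFF_MATCH = 1, DIFF_DELETE, DIFF_INSERT } diff_op;

struct diff_edit {
    short op;
    int off;
    int len;
};

/* Hypothetical helper (not part of libmba): replay the edit script to
 * rebuild b in out. Returns the number of bytes written, or -1 if the
 * script is malformed or out is too small. */
int
apply_edits(const char *a, const char *b,
            const struct diff_edit *edits, int sn,
            char *out, int outsize)
{
    int i, pos = 0;

    for (i = 0; i < sn; i++) {
        const struct diff_edit *e = &edits[i];
        const char *src;

        switch (e->op) {
            case DIFF_MATCH:            /* off indexes sequence a */
                src = a + e->off;
                break;
            case DIFF_INSERT:           /* off indexes sequence b */
                src = b + e->off;
                break;
            case DIFF_DELETE:           /* deleted from a: emit nothing */
                continue;
            default:
                return -1;
        }
        if (e->len < 0 || e->len > outsize - pos)
            return -1;
        memcpy(out + pos, src, (size_t)e->len);
        pos += e->len;
    }
    return pos;
}
```

With a script obtained from diff, each element would be fetched with varray_get as in Example 2 and handled the same way.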
The diff function
Synopsis
    #include <mba/diff.h>

    int diff(const void *a, int aoff, int n,
             const void *b, int boff, int m,
             idx_fn idx, cmp_fn cmp, void *context, int dmax,
             struct varray *ses, int *sn,
             struct varray *buf);
Description
If the ses parameter is not NULL, it must be a varray(3m) with a membsize of sizeof(struct diff_edit). Each struct diff_edit element in the varray(3m), starting from index 0, will be populated with the op, off, and len that together constitute the edit script. The number of struct diff_edit elements in the edit script is written to the integer pointed to by the sn parameter. If the ses or sn parameter is NULL, the edit script will not be collected.
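As an illustration of walking the sn collected elements, the sketch below totals the lengths of the match, delete, and insert operations in a script. The names tally_edits and struct edit_totals are hypothetical and not part of libmba, and the script is passed as a plain array so that the sketch stands alone. For a shortest edit script in the sense of Myers [1], the deleted and inserted totals together give the edit distance; that this sum always equals the value returned by diff is an assumption here, not something this section states.

```c
/* Local copy of the definitions from the synopsis so this sketch builds
 * without libmba; real code would include <mba/diff.h> instead. */
typedef enum { DIFF_MATCH = 1, DIFF_DELETE, DIFF_INSERT } diff_op;

struct diff_edit {
    short op;
    int off;
    int len;
};

/* Hypothetical summary type (not part of libmba). */
struct edit_totals {
    int matched;    /* elements common to a and b */
    int deleted;    /* elements of a missing from b */
    int inserted;   /* elements of b missing from a */
};

/* Hypothetical helper: sum the len members of the first sn edits by op. */
struct edit_totals
tally_edits(const struct diff_edit *edits, int sn)
{
    struct edit_totals t = { 0, 0, 0 };
    int i;

    for (i = 0; i < sn; i++) {
        switch (edits[i].op) {
            case DIFF_MATCH:
                t.matched += edits[i].len;
                break;
            case DIFF_DELETE:
                t.deleted += edits[i].len;
                break;
            case DIFF_INSERT:
                t.inserted += edits[i].len;
                break;
        }
    }
    return t;
}
```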
If the dmax parameter is not 0, the calculation will stop as soon as it is determined that the edit distance of the two sequences equals or exceeds the specified value. A value of 0 indicates that there is no limit.
If the buf parameter is not NULL, it must be a varray(3m) with a membsize of sizeof(int) and will be used as temporary storage for the dynamic programming tables. If buf is NULL, storage will be temporarily allocated and freed with malloc(3) and free(3).
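The idx and cmp parameters in the synopsis are not described in the text above, so the following is only a sketch of what a callback pair for sequences of int might look like. The typedefs are copied from the synopsis. The names int_idx and int_cmp are hypothetical. The semantics are assumptions as well: idx is taken to return a pointer to the element at the given index, and cmp is taken to return 0 when two elements are equal.

```c
#include <stddef.h>

/* Typedefs copied from the synopsis so this sketch builds without libmba. */
typedef const void *(*idx_fn)(const void *s, int idx, void *context);
typedef int (*cmp_fn)(const void *e1, const void *e2, void *context);

/* Hypothetical idx_fn: return a pointer to element idx of an int array.
 * ASSUMPTION: diff expects a pointer to the element, not a copy. */
const void *
int_idx(const void *s, int idx, void *context)
{
    (void)context;                      /* unused in this sketch */
    return (const int *)s + idx;
}

/* Hypothetical cmp_fn: compare two int elements.
 * ASSUMPTION: a return value of 0 means "equal", nonzero "different". */
int
int_cmp(const void *e1, const void *e2, void *context)
{
    (void)context;                      /* unused in this sketch */
    return *(const int *)e1 != *(const int *)e2;
}
```

Under these assumptions, the pair would be passed in place of the two NULL callback arguments used in Example 2, with n and m counting int elements rather than bytes.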