Spell-check OleDb Databases

October 14 2009

Introduction

This article describes the implementation of C1DBSpellChecker, a utility that performs spell-checking on OleDb databases including SqlServer and Access.

After reading this article, you will be able to specify a connection string, and then select a table and one or more string fields on the table to spell-check. Typing errors will then be displayed in a grid where they can be corrected. When all corrections have been made, the changes will be saved back to the database.

C1DBSpellChecker

Background

Spell-checking is an important tool used by most people when creating documents of all kinds, from formal reports to simple e-mail messages. Many popular applications provide built-in spell-checking that makes this task easy and almost automatic.

Unfortunately, not all applications have built-in spell-checking. Visual Studio is a good example. Developers use Microsoft Visual Studio to create applications, web pages, and documentation. Unless they have a package such as ComponentOne IntelliSpell, it is likely that spelling mistakes will creep into their work and eventually surface on a website or commercial application.

Even developers that use ComponentOne IntelliSpell often rely on content that is stored in databases and hasn't been properly spell-checked. For example, the popular AdventureWorks database contains typos such as:

alluminum, alumunim (Aluminum)
comparible (Comparable)
securly (Securely)
funtionality (functionality)
manuverability (maneuverability)
responsivness (responsiveness)
you'l find (you'll find)

Spelling errors like this aren't very professional on a catalog or website. Surprisingly, there are few or no tools for spell-checking databases. The C1DBSpellChecker application was designed to fill this need.

Application Structure

The C1DBSpellChecker application performs the following tasks:

Allow the user to select a connection string.
Get the schema for the selected connection (list of tables and fields).
Allow the user to pick a table and one or more string fields.
Load the data and keep only rows that contain spelling errors.
Show the spelling errors on a grid and allow the user to fix them.
Save the changes back to the database.

These tasks are described in the following sections.

Select a connection string

The first step required to spell-check a database is selecting the database to use. This is done by specifying a connection string. We use the ADODB and MSDASC libraries to accomplish this. These libraries are provided by Microsoft and can be freely distributed. They provide a user interface for creating and editing OleDb connection strings.

We decided to use the OleDb data provider because it provides great flexibility, allowing connections to Sql Server, Access, and many others.

The code used to get and edit the connection strings is simple (the version below omits error checking code for clarity; please refer to the source for a more complete version):

// prompt user for a connection string  
string PromptConnectionString(string connString)  
{  
  // create objects we'll need  
  var dlinks = new MSDASC.DataLinksClass();  
  var conn = new ADODB.ConnectionClass();  

  // show connection picker dialog  
  object obj = conn;  
  dlinks.hWnd = (int)Handle;  
  if (dlinks.PromptEdit(ref obj))  
  {  
    connString = conn.ConnectionString;  
  }  

  // done  
  return connString;  
}

Connection strings created by the user are added to a ComboBox and saved as part of the application settings, so they can be reused across sessions. The code that performs this task is listed below:

// form loaded: load recently used connection strings  
protected override void OnLoad(EventArgs e)  
{  
  var mru = Properties.Settings.Default.RecentConnections;  
  if (mru != null)  
  {  
    foreach (string connString in mru)  
    {  
      _cmbConnString.Items.Add(connString);  
    }  
  }  
  base.OnLoad(e);  
}  

// form closing: save recently used connection strings  
protected override void OnFormClosing(FormClosingEventArgs e)  
{  
  var mru = new System.Collections.Specialized.StringCollection();  
  foreach (string item in _cmbConnString.Items)  
  {  
    mru.Add(item);  
  }  
  Properties.Settings.Default.RecentConnections = mru;  
  Properties.Settings.Default.Save();  
  base.OnFormClosing(e);  
}

Because the connection strings are fairly long, the application uses the owner-draw feature of the ComboBox control to trim the connection strings when they are displayed in the drop down. This makes it a lot easier for users to find the connections they are looking for.

The owner-draw code for the ComboBox is as follows:

public Form1())  
{  
  InitializeComponent();  

  // make combo owner-drawn  
  var cmb = _cmbConnString.ComboBox;  
  cmb.DrawMode = DrawMode.OwnerDrawFixed;  
  cmb.DrawItem  = cmb_DrawItem;  
}  

// trim items in combo using ellipsis (they're very long)  
void cmb_DrawItem(object sender, DrawItemEventArgs e)  
{  
  var fmt = new StringFormat();  
  fmt.LineAlignment = StringAlignment.Center;  
  fmt.Trimming = StringTrimming.EllipsisPath;  

  var text = (string)_cmbConnString.Items[e.Index];  
  text = TrimConnectionString(text);  

  var brush = (e.State & DrawItemState.Selected) != 0  
    ? SystemBrushes.HighlightText  
    : SystemBrushes.WindowText;  

  e.DrawBackground();  
  e.Graphics.DrawString(text, _cmbConnString.Font, brush, e.Bounds, fmt);  
  e.DrawFocusRectangle();  
}  

// trim connection string for display  
string[] _keys = new string[] { "Provider", "Initial Catalog", "Data Source" };  
string TrimConnectionString(string text)  
{  
  var sb = new StringBuilder();  
  foreach (var item in text.Split(';'))  
  {  
    foreach (var key in _keys)  
    {  
      if (item.IndexOf(key, StringComparison.InvariantCultureIgnoreCase) > -1)  
      {  
        if (sb.Length > 0)  
        {  
          sb.Append("...");  
        }  
        sb.Append(item.Split('=')[1].Trim());  
      }  
    }  
  }  
  return sb.ToString();  
}

The code works by splitting the connection string into key/value pairs and then keeping only the parts that help identify the connection. Configuration options are removed for clarity. For example, here is a typical OleDb connection string in all its glory:

Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=AdventureWorks;Data Source=spock\sqlexpress

And here is the trimmed version:

SQLOLEDB.1...AdventureWorks...spock\sqlexpress

The trimmed version shows only the information that is relevant for selecting an entry from a long list.

Get the database schema

Once the user picks a connection string, the next step is to use the connection to obtain a database schema. The schema is a list of the tables, fields, and relations in the database. It describes the structure of the database.

We retrieve the database schema using an auxiliary class called OleSDbSchema. This class extends the system DataSet class with a ConnectionString property. Setting this property populates the OleSDbSchema with tables that have the same structure as the tables in the database (but no data).

The code below shows how the OleSDbSchema class obtains the database schema (this version is simplified for clarity; please refer to the source code for a complete version, including code that retrieves constraints, relations, and stored procedures):

// Gets or sets the connection string used to fills the schema.  
public string ConnectionString  
{  
  get { return _connString; }  
  set  
  {  
    if (value != _connString)  
{  
  _connString = value;  
  GetSchema();  
    }  
  }  
}  
void GetSchema()  
{  
  // initialize this DataSet  
  this.Reset();  

  // go get the schema  
  EnforceConstraints = false;  
  using (var conn = new OleDbConnection(connString))  
  {  
    conn.Open();  
GetTables(conn);  
conn.Close();  
  }  
}  
void GetTables(OleDbConnection conn)  
{  
  // add tables  
  var dt = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);  
  foreach (DataRow dr in dt.Rows)  
  {  
    // get type (table/view)  
    var type = (string)dr[TABLE_TYPE];  
if (type != TABLE && type != VIEW && type != LINK)  
    {  
  continue;  
    }  

    // create table  
    var name = (string)dr[TABLE_NAME];  
    var table = new DataTable(name);  
    table.ExtendedProperties[TABLE_TYPE] = type;  

    // save definition in extended properties  
    foreach (DataColumn col in dt.Columns)  
    {  
      table.ExtendedProperties[col.ColumnName] = dr[col];  
    }  

    // get table schema and add to collection  
    try  
    {  
      var select = GetSelectStatement(table);  
      var da = new OleDbDataAdapter(select, conn);  
      da.FillSchema(table, SchemaType.Mapped);  
      Tables.Add(table);  
    }  
    catch { }  
  }  
}

The code shows how tables and columns are created. Notice that the ExtendedProperties property is used to store additional information about each element. This allows us, for example, to distinguish between regular tables, views, and stored procedures since all these elements are represented by DataTable objects within the OleDbSchema.

The OleDbSchema class is described in detail in a separate article. We only use its basic features in this project.

Select table and fields to spell-check

Once the schema has been obtained, it is used to populate a TreeView control. The TreeView has nodes that represent the tables in the database, and each node has child nodes that represent the fields. Only string fields are included since they are the only ones eligible for spell-checking.

This is the code that populates the TreeView control:

// update table tree to reflect new connection string  
void UpdateTableTree()  
{  
  // initialize table tree  
  TreeNodeCollection nodes = _treeTables.Nodes;  
  nodes.Clear();  

  // populate using current schema  
  _treeTables.BeginUpdate();  
  foreach (DataTable dt in _schema.Tables)  
  {  
    if (_schema.GetTableType(dt) == TableType.Table)  
    {  
      // create new node, save table in tag property  
      var node = new TreeNode(dt.TableName);  
      node.Tag = dt;  

      // add string fields to node  
      foreach (DataColumn col in dt.Columns)  
      {  
        if (col.DataType == typeof(string))  
        {  
          var ndCol = node.Nodes.Add(col.ColumnName);  
          ndCol.Tag = col;  
        }  
      }  

      // add new node to the tree  
      if (node.Nodes.Count > 0)  
      {  
        nodes.Add(node);  
      }  
    }  
  }  

  // done  
  _treeTables.Sort();  
  _treeTables.EndUpdate();  
}

The TreeView has check boxes next to each table and field. The check boxes require a fair amount of code to work in an intuitive, automatic manner. The code must ensure that only one table is selected, and is must handle the check boxes next to tables and fields. Specifically, checking a table automatically checks all its fields and un-checks all other tables. Un-checking a table will un-check all its fields. And checking a field automatically checks the parent table and un-checks all other tables.

Here is the code that manages the check boxes:

// handle check boxes  
void \_treeTables\_AfterCheck(object sender, TreeViewEventArgs e)  
{  
  if (_updatingTree)  
  {  
    return;  
  }  

  // start updating...  
  _updatingTree = true;  

  // get node that was clicked  
  var n = e.Node;  

  // clicked on table node  
  if (n.Tag is DataTable)  
  {  
    // apply check state to all child nodes (fields)  
    SetCheck(n, n.Checked);  

    // if this table is checked, uncheck all other tables  
    if (n.Checked)  
    {  
      foreach (TreeNode c in n.TreeView.Nodes)  
      {  
        if (c != n)  
        {  
          SetCheck(c, false);  
        }  
      }  
    }  
  }  

  // clicked on column node  
  if (n.Tag is DataColumn)  
  {  
    // update parent node state  
    bool check = false;  
    foreach (TreeNode c in n.Parent.Nodes)  
    {  
      check |= c.Checked;  
    }  
    n.Parent.Checked = check;  

    // if this node is checked, uncheck all other tables  
    if (n.Checked)  
    {  
      foreach (TreeNode c in n.TreeView.Nodes)  
      {  
        if (c != n.Parent)  
        {  
          SetCheck(c, false);  
        }  
      }  
    }  
  }  

  // done updating...  
  _updatingTree = false;  
}  
void SetCheck(TreeNode n, bool check)  
{  
  n.Checked = check;  
  foreach (TreeNode c in n.Nodes)  
  {  
    c.Checked = check;  
  }  
}

Load and spell-check the selected data

Once the user has selected one or more fields to spell-check, we need to read the data, spell-check each field, and keep only the records that have spelling errors in them. We use an OleDbReader to read the records and a C1SpellChecker to spell-check each one. Records that contain spelling errors are stored in a DataTable to be edited by the user.

Here is the code that reads the data and performs the spell-checking:

// get data table with the rows that contain spelling errors  
DataTable GetSpellingErrors(OleDbDataAdapter da, List columns)  
{  
  // create table and adapter  
  var dt = new DataTable();  

  // get table schema  
  da.FillSchema(dt, SchemaType.Mapped);  

  // make columns not being spell-checked read-only  
  foreach (DataColumn col in dt.Columns)  
  {  
    if (!columns.Contains(col.ColumnName))  
    {  
      col.ReadOnly = true;  
    }  
  }  

  // read rows with DataReader  
  var cmd = da.SelectCommand;  
  cmd.Connection.Open();  
  using (var reader = cmd.ExecuteReader())  
  {  
    while (reader.Read())  
    {  
      // read a row  
      var dr = dt.NewRow();  
      foreach (DataColumn col in dt.Columns)  
      {  
        var index = col.Ordinal;  
        dr[index] = reader.GetValue(index);  
      }  

      // check for errors  
      bool hasErrors = false;  
      foreach (string col in columns)  
      {  
        var text = dr[col] as string;  
        if (!string.IsNullOrEmpty(text))  
        {  
          var errors = _spell.CheckText(text);  
          if (errors.Count > 0)  
          {  
            hasErrors = true;  
          }  
        }  
      }  

      // keep rows that have errors  
      if (hasErrors)  
      {  
        dt.Rows.Add(dr);  
      }  
    }  
  }  

  // table has no changes  
  dt.AcceptChanges();  

  // done  
  return dt;  
}

A simpler but less efficient alternative would be to read all the data into the DataTable first, then spell-check it and remove the rows that don't contain any spelling mistakes. The version listed above is better because rows without errors are discarded immediately and never added to the DataTable.

Show errors and allow editing

Now that we have a DataTable with all the errors, the next step is to show the table to the user so he can review and fix them. We use a C1FlexGrid control to do that. We use the grid's owner-draw feature to mark the typos with the familiar red wavy underlines, and attach a C1SpellChecker to the grid editor so the user gets a nice context menu with spelling suggestions while editing the cells.

Here is the code that binds the grid to the table containing the errors and implements the owner-draw logic:

// bind the grid and enable owner-draw to show the errors  
\_flex.DataSource = \_dtErrors;  
_flex.DrawMode = C1.Win.C1FlexGrid.DrawModeEnum.OwnerDraw;  
\_flex.OwnerDrawCell  = \_flex_OwnerDrawCell;  

// draw wavy read lines under spelling errors  
void \_flex\_OwnerDrawCell(object sender, C1.Win.C1FlexGrid.OwnerDrawCellEventArgs e)  
{  
  // spell-check editable cells (unless we're just measuring)  
  if (!e.Measuring &&   
      _flex.Cols[e.Col].AllowEditing &&   
      e.Row >= _flex.Rows.Fixed)  
  {  
    var text = _flex.GetDataDisplay(e.Row, e.Col);  
    var errors = _spell.CheckText(text);  

    if (errors.Count > 0)  
    {  
      // draw cell as usual  
      e.Style = _errorStyle;  
      e.DrawCell();  

      // build list with error ranges  
      var ranges = new CharacterRange[errors.Count];  
      for (int i = 0; i < errors.Count; i  )  
      {  
        ranges[i] = new CharacterRange(  
          errors[i].Start, errors[i].Length);  
      }  

      // create StringFormat to locate the error ranges  
      var sf = new StringFormat(e.Style.StringFormat);  
      try  
      {  
        sf.SetMeasurableCharacterRanges(ranges);  
      }  
      catch { }  

      // locate the error ranges  
      var rc = e.Style.GetTextRectangle(e.Bounds, null);  
      var rgns = e.Graphics.MeasureCharacterRanges(  
        text, e.Style.Font, rc, sf);  

      // draw wavy red underline for each range  
      foreach (var rgn in rgns)  
      {  
        rc = Rectangle.Truncate(rgn.GetBounds(e.Graphics));  
        for (Point pt = new Point(rc.X, rc.Bottom);   
          pt.X   2 < rc.Right;   
          pt.X  = 4)  
        {  
          e.Graphics.DrawLines(Pens.Red, new Point[]   
          {   
            new Point(pt.X, pt.Y),   
            new Point(pt.X   2, pt.Y - 2),   
            new Point(pt.X   4, pt.Y)   
          });  
        }  
      }  
    }  
  }  
}

And here is the code that connects a C1SpellChecker with the grid editor so the user can see the underlines while editing the cell and get a nice context menu with spelling suggestions and commands:

// connect event handler to customize the grid editor  
\_flex.SetupEditor  = \_flex_SetupEditor;  

// enable spell checker in cell editor  
void \_flex\_SetupEditor(object sender, RowColEventArgs e)  
{  
    var tb = _flex.Editor as TextBox;  
    if (tb != null)  
    {  
        _spell.SetActiveSpellChecking(tb, true);  
    }  
}

The form also contains a button that performs a modal (dialog-based) spell-check over the entire grid. That can be more convenient and comfortable than searching for the errors by inspecting each grid cell visually.

Implementing the modal check option is easy:

// perform modal check on the grid  
private void \_btnSpell\_Click(object sender, EventArgs e)  
{  
  var cols = new List();  
  foreach (Column col in _flex.Cols)  
  {  
    if (col.AllowEditing)  
    {  
      cols.Add(col.Name);  
    }  
  }  
  var speller = new FlexGridSpellChecker(_flex, cols.ToArray());  
  _spell.CheckControl(speller);  
}

The code starts by building a list with the columns that are to be spell-checked. Then is uses the list to build a FlexGridSpellChecker object that implements the spell-checking interface required by the C1SpellChecker control on behalf of the grid. The FlexGridSpellChecker implementation is not especially interesting, so we won't reproduce it here. Please refer to the source code if you are interested in the details.

Save changes back to the database

Once the user has reviewed and corrected the errors, the only task remaining is saving the changes back into the database. To do this, we need an OleDbDataAdapter object with select and update commands. The select command is used to retrieve the data, and the update command is used to write changes back into the database.

Here is the code that creates the OleDbDataAdapter and writes the changes back into the database:

// save changes in the data table  
void SaveChanges()  
{  
  try  
  {  
    // build DataAdapter with select and update commands  
    var sql = _schema.GetSelectStatement(table);  
    var da = new OleDbDataAdapter(sql, ConnectionString);  
    var cmdBuilder = new OleDbCommandBuilder(da);  
    da.UpdateCommand = cmdBuilder.GetUpdateCommand();  

    // save the changes  
    da.Update(_dtErrors);  

    // mark table as clean  
    _dtErrors.AcceptChanges();  

    // disable save changes button until there are more changes  
    _btnSaveChanges.Enabled = false;  
  }  
  catch (Exception x)  
  {  
    // something went wrong? inform the user  
    var dr = MessageBox.Show(  
      this,  
      string.Format(Properties.Resources.FailedToSave, x.Message),  
      Application.ProductName,  
      MessageBoxButtons.OK,  
      MessageBoxIcon.Error);  
  }  
}

This concludes the last part of the application. We walked over the steps required to select the database, spell-check its content, display the errors, allow users to fix them, and save the corrections back to the database. It was a long and fun journey!

Conclusion

We developed the C1DBSpellChecker application with two objectives in mind:

To create a useful tool that fills a real need. Many sites and applications are driven by databases that contain spelling errors, and fixing them without a proper tool is difficult.
To show some useful techniques required to implement real applications in .NET. The techniques covered include database management, owner-draw controls, application settings and resources, and more.

We hope both objectives have been achieved and you find them useful. If you have requests or suggestions for improving this document or the application, please post on our site. Thanks in advance.

Download the full project with source code: C1DBSpellChecker.zip

MESCIUS inc.