Functions for cleaning imported text
The following functions were used to clean text imported from an Excel
Workbooks into Access. In spite of the fact that Excel has a notional limit on
the amount of text a user can put in a cell, when users develop their own database
in Excel there is no limit to the ingenuity to avoid or work round the limits that
the software tries to impose on them. Although these were written specifically
for one application their application in any situation where it is necessary to
clean a text string will be fairly obvious and the changes needed to implement
in Access rather than in Excel are straight forward.
I cannot claim credit for writing these but the person who wrote them
knows who he is and has given me permission to publish them. We worked together
for about 8 months and by the end of that time we were writing code as if it
had been written by the same person. The only way we could tell who had written
what was by virtue of how we had initially split the system in two, but as time
progressed we had each amended each others routines.
The RemoveHardReturns Function removes the hard
carriage returns from a cell string.
Function RemoveHardReturns(strCurrCell as String) As String
On Error GoTo Err_RemoveHardReturns
Dim intLC as
Integer
Dim strCurrChar
as String
For intLC = 1 To Len(strCurrCell)
strCurrCell
= Mid(strCurrCell, intLC,
1)
If strCurrChar
= Chr(10)
or strCurrChr = Chr(13)
Then
strCurrCell = Left(strCurrCell,
intLC -1) & Right(strCurrCell,
Len(strCurrCell) intLC)
End If
Next intLC
RemoveHardReturns
= strCurrCell
Exit_RemoveHardReturns:
Exit Function
Err_RemoveHardReturns:
MsgBox Err.Description
Resume Exit_RemoveHardReturns
End Function
The RemoveDoubleSpaces will find Double Spaces in a string and
remove them.
Function RemoveDoubleSpaces(strCurCell as String) As String
On Error GoTo Err_RemoveDoubleSpaces
Dim intLC as
Integer
Dim strCurrChar
as String
Dim strPrevChar
as String
strCurrChar
=
strPrevChar
=
For intLC = 1 To Len(strCurrCell)
strCurrChar
= Mid(strCurrCell, intLC,
1)
If strCurrChar
= And strPrevChar =
Then
strCurrCell = Left(strCurrCell,
intLC 1) & Right(strCurrCell,
Len(strCurrCell) intLC)
intLC = intLC 1
End If
strPrevChar
= strCurrChar
Next intLC
RemoveDoubleSpaces
= strCurrCell
Exit_RemoveDoubleSpaces:
Exit Function
Err_RemoveDoubleSpaces:
MsgBox Err.Description
Resume Exit_RemoveDoubleSpaces
End Function
The InsertSpaceAfterPeriod Function is designed to make the
text in a cell look like a sentence. This may cause a problem where a user has
inserted text with legal numbering in to refer to clauses of an agreement so
should be treated with due caution.
Function InsertSpaceAfterPeriod(strCurrCell as String) as String
Dim intLC as
Integer
Dim strCurrChar
as String
Dim strPrevChar
as String
strCurrChar
=
strPrevChar
=
For intLC = 1 To Len(strCurrCell)
strCurrChar
= Mid(strCurrCell, intLC,
1)
If strCurrChar
<> And Not IsNumeric(strCurrChar) And strPrevChar = .
Then
strCurrCell = Left(strCurrCell,
intLC 1) & & Right(strCurrCell,
Len(strCurrCell) intLC +
1)
End If
strPrevChar
= strCurrChar
Next intLC
InsertSpaceAfterPeriod
= strCurrCell
Exit_ InsertSpaceAfterPeriod:
Exit Function
Err_ InsertSpaceAfterPeriod:
MsgBox Err.Description
Resume Exit_ MsgBox
Err.Description
End Function
If you then
write a Procedure that loops through a range of cells applying as many of the
above Functions as your application requires you should end up with reasonably
clean text to import into your database of choice. In our case it was Access
and we encountered a number of other issues to do with numbers masquerading as
text and dates doing all sorts of strange impersonations.