Dealing with Files and Filesystems
By: O'Reilly Media
    2006-12-28

    Table of Contents:
  • Dealing with Files and Filesystems
  • HACK#14: Get the Most Out of grep
  • HACK#15: Manipulate Files with sed
  • HACK#16: Format Text at the Command Line
  • HACK#17: Delimiter Dilemma


    Dealing with Files and Filesystems - HACK#17: Delimiter Dilemma

    Deal with double quotation marks in delimited files.

    Importing data from a delimited text file into an application is usually painless. Even if you need to change the delimiter from one character to another (from a comma to a colon, for example), you can choose from many tools that perform simple character substitution with great ease.

    However, one common situation is not solved as easily: many business applications export data into a space- or comma-delimited file, enclosing individual fields in double quotation marks. These fields often contain the delimiter character itself. Importing such a file into an application that processes only one delimiter (PostgreSQL, for example) may result in an incorrect interpretation of the data. This is one of those situations where the user should feel lucky if the process fails outright, rather than silently importing misaligned fields.
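    To see the problem concretely, here is a small sketch that uses a made-up sample record (hypothetical data, not taken from any real export) and Python's plain str.split, which knows nothing about quotation marks. The quoted name is broken into two fields:

```python
# Hypothetical space-delimited record; the second field is quoted
# because it contains the delimiter (a space).
record = '42 "John Smith" Toronto'

# A naive split treats every space as a field boundary,
# including the one inside the quoted field.
fields = record.split(' ')
print(fields)
# → ['42', '"John', 'Smith"', 'Toronto']
print(len(fields))  # 4 fields instead of the intended 3
```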

    One solution is to write a script that tracks the use of double quotes to determine whether it is working within a text field. This is doable by creating a variable that acts as a text/nontext switch for the character substitution process. The script should change the delimiter to a more appropriate character, leave the delimiters that were enclosed in double quotes unchanged, and remove the double quotes. Rather than make the changes to the original datafile, it's safer to write the edited data to a new file.

    Attacking the Problem

    The following algorithm meets our needs:

    1. Create the switch variable and assign it the value of 1, meaning "nontext". We'll declare the variable tswitch and define it as tswitch = 1.
    2. Create a variable for the delimiter and define it. We'll use the variable delim with a space as the delimiter, so delim = ' '.
    3. Decide on a better delimiter. We'll use the tab character, so new_delim = '\t'.
    4. Open the datafile for reading. 
    5. Open a new file for writing.


    Now, for every character in the datafile:

    1. Read a character from the datafile.
    2. If the character is a double quotation mark, tswitch = tswitch * -1.
    3. If the character equals the character in delim and tswitch equals 1, write new_delim to the new file.
    4. If the character equals that in delim and tswitch equals -1, write the value of delim to the new file.
    5. If the character is anything else, write the character to the new file.
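    The per-character steps above can also be sketched as a small function that works on a string instead of on files, which makes the switch logic easy to try out interactively. This is an illustration, not part of the original redelim.py: the function name redelim_text is hypothetical, and, like the algorithm, it assumes fields never contain a literal double quotation mark.

```python
def redelim_text(text, delim=' ', new_delim='\t'):
    """Replace unquoted delimiters and drop double quotes (hypothetical helper)."""
    tswitch = 1  # 1 = outside quotes ("nontext"), -1 = inside quotes
    out = []
    for charn in text:
        if charn == '"':
            # Step 2: flip the switch; the quotation mark is not copied.
            tswitch = tswitch * -1
        elif charn == delim and tswitch == 1:
            # Step 3: delimiter outside quotes becomes the new delimiter.
            out.append(new_delim)
        else:
            # Steps 4 and 5: quoted delimiters and all other characters are copied.
            out.append(charn)
    return ''.join(out)


print(repr(redelim_text('42 "John Smith" Toronto')))
# → '42\tJohn Smith\tToronto'
```

    Passing delim=',' and new_delim=':' applies the same logic to a comma-delimited line.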

    The Code

    The Python script redelim.py implements the preceding algorithm. It prompts the user for the original datafile and a name for the new datafile. The delim and new_delim variables are hardcoded, but those are easily changed within the script.

    This script copies a space-delimited text file with text values in double quotes to a new, tab-delimited file without the double quotes. The advantage of using this script is that it leaves spaces that were within double quotes unchanged.

    There are no command-line arguments for this script. The script will prompt the user for source and destination file information.

    You can redefine the variables for the original and new delimiters, delim and new_delim, in the script as needed.

      #!/usr/bin/env python3
      # redelim.py: copy a delimited text file to a new file, replacing the
      # delimiter outside double-quoted fields and removing the quotes.

      print("Change text file delimiters.")

      # Ask user for source and target files.
      sourcefile = input('Please enter the path and name of the source file: ')
      targetfile = input('Please enter the path and name of the target file: ')

      # The variable 'tswitch' acts as a text/non-text switch that reminds
      # Python whether it is working within a text or non-text data field:
      # 1 means outside double quotes, -1 means inside them.
      tswitch = 1

      # If the source delimiter that you want to change is not a space,
      # redefine the variable delim in the next line.
      delim = ' '

      # If the new delimiter that you want to use is not a tab,
      # redefine the variable new_delim in the next line.
      new_delim = '\t'

      # Open files for reading and writing; 'with' closes them when done.
      with open(sourcefile, 'r') as source, open(targetfile, 'w') as dest:
          for charn in source.read():
              if charn == '"':
                  # Flip the switch; the quotation mark itself is not copied.
                  tswitch = tswitch * -1
              elif charn == delim and tswitch == 1:
                  # Delimiter outside quotes: write the new delimiter.
                  dest.write(new_delim)
              else:
                  # Everything else, including delimiters inside quotes,
                  # is copied unchanged.
                  dest.write(charn)

    Use of redelim.py assumes that you have installed Python, which is available through the ports collection or as a binary package. The script needs nothing beyond a standard Python installation.

    Hacking the Hack

    If you prefer working with Perl, DBD::AnyData is another good solution to this problem.
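    If you prefer to stay in Python, the standard library's csv module offers a comparable shortcut (a different technique from the character-switch approach above): its reader already understands quoted fields, so quotes do not have to be tracked by hand. The following is a minimal sketch, assuming a space-delimited source that uses " as the quote character; the function name convert and the file names in the comment are hypothetical placeholders. Two behaviors differ from redelim.py: csv.reader treats a doubled quote ("") inside a quoted field as a literal quotation mark, and csv.writer re-quotes any output field that contains special characters such as the new delimiter or a quotation mark.

```python
import csv
import io


def convert(source, dest, delim=' ', new_delim='\t'):
    """Copy delimited rows from source to dest, swapping the delimiter.

    csv.reader strips the double quotes around fields and keeps any
    delimiters inside them; csv.writer joins fields with new_delim.
    """
    reader = csv.reader(source, delimiter=delim, quotechar='"')
    # lineterminator='\n' gives Unix line endings (the default is '\r\n').
    writer = csv.writer(dest, delimiter=new_delim, lineterminator='\n')
    writer.writerows(reader)


# Demonstration with in-memory files. For real files, open them with
# newline='' as the csv documentation recommends, for example:
#   with open('data.txt', newline='') as src, open('data.tab', 'w', newline='') as dst:
#       convert(src, dst)
src = io.StringIO('42 "John Smith" Toronto\n')
dst = io.StringIO()
convert(src, dst)
print(repr(dst.getvalue()))
# → '42\tJohn Smith\tToronto\n'
```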

    See Also

    • The Python home page (http://www.python.org/)




    This article is excerpted from chapter two of the book BSD Hacks, written by Dru Lavigne (O'Reilly, 2005; ISBN: 0596006799).

       
