XML Data Repository
Protein Sequence Database
Integrated collection of functionally annotated protein sequences.
from Georgetown Protein Information Resource | Nov 9 2001
filename | DTD | Description | Download | elements | attributes | max-depth | avg-depth |
psd7003.xml | dtd | Protein Sequence Database | .xml (683 MB) .gz (103 MB) .xmi (70 MB) | 21305818 | 1290647 | 7 | 5.15147 |
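For a file as large as psd7003.xml (683 MB uncompressed), it may be more practical to read the .gz download as a stream than to unpack it and load the whole tree. The following is a minimal sketch under stated assumptions: it uses only the Python standard library, the helper names `count_tags` and `count_tags_gz` are hypothetical, and the tiny in-memory document stands in for a real download.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(fileobj):
    """Count element tags in an XML stream without keeping the whole tree."""
    counts = Counter()
    for _, elem in ET.iterparse(fileobj, events=("end",)):
        counts[elem.tag] += 1
        elem.clear()  # drop text and children already processed
    return counts


def count_tags_gz(path):
    """Same as count_tags, reading from a gzip-compressed file on disk.

    `path` is a hypothetical local path to a downloaded .gz file.
    """
    with gzip.open(path, "rb") as f:
        return count_tags(f)


# Stand-in data (not taken from the repository): a tiny gzip-compressed document.
demo = gzip.compress(b"<db><entry><id>1</id></entry><entry><id>2</id></entry></db>")
demo_counts = count_tags(gzip.GzipFile(fileobj=io.BytesIO(demo), mode="rb"))
print(demo_counts)
```

The total of the per-tag counts should correspond to the "elements" column, assuming that column counts every element including the root.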
SwissProt
SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy, and a high level of integration with other databases.
from ExPASy - SWISS-PROT and TrEMBL | 1998
filename | DTD | Description | Download | elements | attributes | max-depth | avg-depth |
SwissProt.xml | NA | SwissProt database | .xml (109 MB) .gz (13 MB) .xmi (7 MB) | 2977031 | 2189859 | 5 | 3.55671 |
Auction Data
Auction data converted to XML from web sources.
from Anhai Doan | 2001
filename | DTD | Description | Download | elements | attributes | max-depth | avg-depth |
321gone.xml | dtd | | .xml (23 KB) .gz (6 KB) .xmi (6 KB) | 311 | 0 | 5 | 3.76527 |
ebay.xml | dtd | EBay auction data | .xml (34 KB) .gz (10 KB) .xmi (10 KB) | 156 | 0 | 5 | 3.75641 |
ubid.xml | dtd | UBid auction data | .xml (19 KB) .gz (3 KB) .xmi (3 KB) | 342 | 0 | 5 | 3.76608 |
yahoo.xml | dtd | Yahoo auction data | .xml (24 KB) .gz (6 KB) .xmi (5 KB) | 342 | 0 | 5 | 3.76608 |
DBLP Computer Science Bibliography
The DBLP server provides bibliographic information on major computer science journals and proceedings. DBLP stands for Digital Bibliography Library Project.
from DBLP Homepage | Oct 2002
filename | DTD | Description | Download | elements | attributes | max-depth | avg-depth |
dblp.xml | dtd | DBLP Bibliography | .xml (127 MB) .gz (23 MB) .xmi (19 MB) | 3332130 | 404276 | 6 | 2.90228 |
University Courses
Course data derived from university websites.
from Anhai Doan | 1999
filename | DTD | Description | Download | elements | attributes | max-depth | avg-depth |
reed.xml | dtd | Courses from Reed College | .xml (277 KB) .gz (18 KB) .xmi (12 KB) | 10546 | 0 | 4 | 3.19979 |
uwm.xml | NA | Courses from UWM | .xml (2 MB) .gz (157 KB) .xmi (102 KB) | 66729 | 6 | 5 | 3.95243 |
wsu.xml | NA | Courses from WSU | .xml (1 MB) .gz (99 KB) .xmi (61 KB) | 74557 | 0 | 4 | 3.15787 |
NASA
Datasets converted from legacy flat-file format into XML and made available to the public.
from GSFC/NASA XML Project | 2001
filename | DTD | Description | Download | elements | attributes | max-depth | avg-depth |
nasa.xml | NA | Astronomical Data | .xml (23 MB) .gz (3 MB) .xmi (2 MB) | 476646 | 56317 | 8 | 5.58314 |
SIGMOD Record
Index of articles from SIGMOD Record.
from ACM SIGMOD Record in XML | 2001
filename | DTD | Description | Download | elements | attributes | max-depth | avg-depth |
SigmodRecord.xml | dtd | SIGMOD Record in XML | .xml (467 KB) .gz (79 KB) .xmi (56 KB) | 11526 | 3737 | 6 | 5.14107 |
TPC-H Relational Database Benchmark
TPC-H Benchmark, 10 MB version, in XML form. Converted to XML by Zack Ives.
from Transaction Processing Performance Council (TPC) | 2002
filename | DTD | Description | Download | elements | attributes | max-depth | avg-depth |
part.xml | NA | Parts | .xml (603 KB) .gz (70 KB) .xmi (45 KB) | 20001 | 1 | 3 | 2.8999 |
lineitem.xml | NA | Line items | .xml (30 MB) .gz (2 MB) .xmi (1 MB) | 1022976 | 1 | 3 | 2.94117 |
partsupp.xml | NA | Part/Supplier relationship | .xml (2 MB) .gz (311 KB) .xmi (236 KB) | 48001 | 1 | 3 | 2.8333 |
supplier.xml | NA | Supplier | .xml (28 KB) .gz (6 KB) .xmi (5 KB) | 801 | 1 | 3 | 2.87266 |
orders.xml | NA | Orders | .xml (5 MB) .gz (556 KB) .xmi (358 KB) | 150001 | 1 | 3 | 2.89999 |
nation.xml | NA | Nations | .xml (4 KB) .gz (1 KB) .xmi (1 KB) | 126 | 1 | 3 | 2.78571 |
region.xml | NA | Regions | .xml (787 B) .gz (373 B) .xmi (370 B) | 21 | 1 | 3 | 2.66667 |
customer.xml | NA | Customers | .xml (503 KB) .gz (101 KB) .xmi (76 KB) | 13501 | 1 | 3 | 2.88875 |
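The elements, attributes, max-depth, and avg-depth columns can be reproduced with a single streaming pass over a file. The sketch below assumes that the root element sits at depth 1 and that avg-depth is the mean depth over all elements; this page does not state that definition, but it agrees with the region.xml row (1 element at depth 1, 5 at depth 2, and 15 at depth 3 give 56/21 ≈ 2.66667). The function name `xml_stats` and the sample document are hypothetical; the sample only mimics that shape and is not the real file.

```python
import io
import xml.etree.ElementTree as ET


def xml_stats(source):
    """Return (elements, attributes, max_depth, avg_depth) for an XML source.

    Assumed convention: the root element is at depth 1, and avg_depth is
    the mean depth taken over all elements.
    """
    elements = attributes = max_depth = depth_sum = 0
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            elements += 1
            attributes += len(elem.attrib)  # attributes are known at "start"
            depth_sum += depth
            max_depth = max(max_depth, depth)
        else:
            depth -= 1
            elem.clear()  # release content that has already been counted
    avg_depth = depth_sum / elements if elements else 0.0
    return elements, attributes, max_depth, avg_depth


# Hypothetical stand-in: one root with one attribute, 5 rows, 3 fields per row.
sample = (
    b'<table name="region">'
    + b"<T><k>0</k><n>x</n><c>y</c></T>" * 5
    + b"</table>"
)
stats = xml_stats(io.BytesIO(sample))
print(stats)  # 21 elements, 1 attribute, max depth 3, average depth 56/21
```

`xml_stats` also accepts a filename, so it can be pointed at any of the downloaded .xml files; memory use stays modest because each element is cleared once it has been counted.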
Treebank (partially encrypted)
English sentences, tagged with parts of speech. The text nodes have been encrypted because they are copyrighted text from the Wall Street Journal. Nevertheless, the deep recursive structure of this data makes it an interesting case for experiments.
from University of Pennsylvania Treebank Project | added Nov 2002
filename | DTD | Description | Download | elements | attributes | max-depth | avg-depth |
treebank_e.xml | NA | Partially-encrypted treebank | .xml (82 MB) .gz (30 MB) .xmi (24 MB) | 2437666 | 1 | 36 | 7.87279 |
Mondial
World geographic database integrated from the CIA World Factbook, the International Atlas, and the TERRA database, among other sources.
from Florid-Mondial Case Study | 2002
filename | DTD | Description | Download | elements | attributes | max-depth | avg-depth |
mondial-3.0.xml | dtd | DTD is available, but data is not valid. | .xml (1 MB) .gz (167 KB) .xmi (95 B) | 22423 | 47423 | 5 | 3.59274 |