internetarchive-ts - v0.1.0

    Class WaybackMachineBeta

    This class provides access to the Wayback Machine using the Wayback CDX API.

    Index

    Constructors

    Properties

    accessKey?: string
    config: IaAuthConfig = {}
    cookies: IaCookieJar
    host: string
    protocol: string
    secretKey?: string
    url: string

    Methods

    • Beta

      Get most recent match, or match closest to specified timestamp

      Parameters

      • url: string

        URL to get the match for

      • Optional timestamp: string | Date

        Timestamp to find the closest match to

      Returns Promise<WaybackAvailabilityData | undefined>

      The closest match to the current date if no timestamp is provided; the closest match to the timestamp if one is provided; undefined if no match is found.
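
      The lookup described above can be sketched against the public Wayback availability endpoint. This is a minimal sketch, not this library's implementation: it assumes the method is backed by https://archive.org/wayback/available, and the names buildAvailabilityUrl, pickClosest, AvailabilitySnapshot and AvailabilityResponse are hypothetical. The response fields shown are a reduced view and may not match WaybackAvailabilityData.

```typescript
// Reduced response shape of the availability endpoint (assumed field names).
interface AvailabilitySnapshot {
  available: boolean;
  url: string;
  timestamp: string; // 14-digit archive timestamp, e.g. "20200101123456"
  status: string;
}

interface AvailabilityResponse {
  archived_snapshots: { closest?: AvailabilitySnapshot };
}

// Build the request URL; the timestamp is only sent when the caller provides one.
function buildAvailabilityUrl(url: string, timestamp?: string): string {
  const params = new URLSearchParams({ url });
  if (timestamp !== undefined) {
    params.set("timestamp", timestamp);
  }
  return `https://archive.org/wayback/available?${params.toString()}`;
}

// Map "no closest snapshot" to undefined, mirroring the documented return value.
function pickClosest(response: AvailabilityResponse): AvailabilitySnapshot | undefined {
  return response.archived_snapshots.closest;
}
```

      A caller would fetch the built URL, parse the JSON body, and pass it to pickClosest; an empty archived_snapshots object then yields undefined.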

    • Beta

      Get snapshot matches for given url

      Type Parameters

      Parameters

      • url: string

        URL to search snapshots for. Can contain * wildcards.

      • Optional options: {
            collapse?: WaybackCdxCollapseOption | WaybackCdxCollapseOption[];
            fastLatest?: true;
            filter?:
                | `length:${string}`
                | `original:${string}`
                | `urlkey:${string}`
                | `timestamp:${string}`
                | `mimetype:${string}`
                | `statuscode:${string}`
                | `digest:${string}`
                | `!length:${string}`
                | `!original:${string}`
                | `!urlkey:${string}`
                | `!timestamp:${string}`
                | `!mimetype:${string}`
                | `!statuscode:${string}`
                | `!digest:${string}`
                | (
                    | `length:${string}`
                    | `original:${string}`
                    | `urlkey:${string}`
                    | `timestamp:${string}`
                    | `mimetype:${string}`
                    | `statuscode:${string}`
                    | `digest:${string}`
                    | `!length:${string}`
                    | `!original:${string}`
                    | `!urlkey:${string}`
                    | `!timestamp:${string}`
                    | `!mimetype:${string}`
                    | `!statuscode:${string}`
                    | `!digest:${string}`
                )[];
            fl?: T;
            from?: string | Date;
            lastSkipTimestamp?: true;
            limit?: number;
            matchType?: WaybackCdxMatchType;
            offset?: number;
            page?: number;
            pageSize?: number;
            resumeKey?: string;
            showDupeCount?: true;
            showNumPages?: boolean;
            showPagedIndex?: true;
            showResumeKey?: false;
            showSkipCount?: true;
            sort?: "reverse" | "regular" | "closest";
            to?: string | Date;
        }

        Options

        • Optional collapse?: WaybackCdxCollapseOption | WaybackCdxCollapseOption[]

          Collapses the results based on the field contents or a substring of the field contents. Add one or more collapse items by specifying either field or field:N, where N is a positive integer giving the number of leading characters of the field value used for collapsing.

        • Optional fastLatest?: true

          Advanced option: return some number of the latest results for an exact match. This option is faster than the standard last-results search. The number of results is at least 1, so limit=-1 implies this setting. The number of results may be greater than 1 when a secondary index format (such as ZipNum) is used, but more than 1 result is not guaranteed. Combining this setting with limit= ensures that no more than the last N results are returned.

        • Optional filter?:
              | `length:${string}`
              | `original:${string}`
              | `urlkey:${string}`
              | `timestamp:${string}`
              | `mimetype:${string}`
              | `statuscode:${string}`
              | `digest:${string}`
              | `!length:${string}`
              | `!original:${string}`
              | `!urlkey:${string}`
              | `!timestamp:${string}`
              | `!mimetype:${string}`
              | `!statuscode:${string}`
              | `!digest:${string}`
              | (
                  | `length:${string}`
                  | `original:${string}`
                  | `urlkey:${string}`
                  | `timestamp:${string}`
                  | `mimetype:${string}`
                  | `statuscode:${string}`
                  | `digest:${string}`
                  | `!length:${string}`
                  | `!original:${string}`
                  | `!urlkey:${string}`
                  | `!timestamp:${string}`
                  | `!mimetype:${string}`
                  | `!statuscode:${string}`
                  | `!digest:${string}`
              )[]

          It is possible to filter on a specific field or the entire CDX line (which is space delimited). Filtering by a specific field is often simpler. Any number of filter params of the following form may be specified: filter=[!]field:regex. Optional: ! before the query inverts the match, that is, it will return results that do NOT match the regex.

          "!statuscode:200" // Returns all items with a statuscode that is not 200
          
        • Optional fl?: T

          List of fields to return, must be of type WaybackCdxField

          If not defined, all fields are returned.

        • Optional from?: string | Date

          Filter by date. If defined, the query will return items from the specified date and later. Can be either a Date object or a 14-digit archive timestamp in the format yyyyMMddhhmmss.

        • Optional lastSkipTimestamp?: true
        • Optional limit?: number

          Return the first N results. Use -N to return the last N results.

          10 // return the first 10 results.
          
          -10 // return the last 10 results. The query may be slow as it begins reading from the beginning of the search space and skips all but last N results.
          
        • Optional matchType?: WaybackCdxMatchType
          "exact"
          
        • Optional offset?: number

          The offset=M param can be used in conjunction with limit to skip the first M records. This allows for a simple way to scroll through the results. However, the offset/limit model does not scale well to large queries: the CDX server begins reading at the beginning every time and must read and skip through the number of results specified by offset.

        • Optional page?: number

          Page number. If pagination is not supported, the CDX server will return a 400.

        • Optional pageSize?: number

          It is possible to adjust the page size to a smaller value than the default by setting pageSize=P, where 1 <= P <= default page size.

        • Optional resumeKey?: string

          Resume key

        • Optional showDupeCount?: true

          It is possible to track how many CDX lines were skipped due to filtering and collapsing by adding the special skipcount counter with showSkipCount=true.

          An optional endtimestamp count can also be used to print the timestamp of the last capture by adding lastSkipTimestamp=true.

        • Optional showNumPages?: boolean

          To determine the number of pages, add the showNumPages=true param. This is a special query that will return a single number indicating the number of pages.

        • Optional showPagedIndex?: true

          It is also possible to have the CDX server return the raw secondary index, by specifying showPagedIndex=true. This query returns the secondary index instead of the CDX results and may be subject to access restrictions.

        • Optional showResumeKey?: false

          Show resume key

        • Optional showSkipCount?: true
        • Optional sort?: "reverse" | "regular" | "closest"

          Sort type

        • Optional to?: string | Date

          Filter by date. If defined, the query will return items up until the specified date. Can be either a Date object or a 14-digit archive timestamp in the format yyyyMMddhhmmss.
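
          Both from and to accept a Date or a 14-digit archive timestamp. The conversion between the two can be sketched as follows. This is an illustration only: it assumes the timestamp is read in UTC, and toArchiveTimestamp and normalizeTimestamp are hypothetical helper names, not part of this library.

```typescript
// Convert a Date to a 14-digit archive timestamp (yyyyMMddhhmmss), read in UTC.
function toArchiveTimestamp(date: Date): string {
  // toISOString() yields e.g. "2020-01-01T00:00:00.000Z" for four-digit years;
  // dropping "-", ":" and "T" and keeping 14 characters leaves the digits only.
  return date.toISOString().replace(/[-:T]/g, "").slice(0, 14);
}

// Accept either form allowed by from/to and always return a timestamp string.
function normalizeTimestamp(value: string | Date): string {
  return value instanceof Date ? toArchiveTimestamp(value) : value;
}
```

          Strings are passed through unchanged in this sketch, so validating their format is left to the caller.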

      Returns Promise<
          {
              [K in string
              | number
              | symbol]: (
                  (
                      T extends WaybackCdxField[]
                          ? Pick<WaybackSnapshotInfo, T<T>[number]>
                          : WaybackSnapshotInfo
                  ) & {}
              )[K]
          }[],
      >

      Results as an array of result objects

      // Get matches from 2020-01-01 to 2020-01-11, collapsed to one match per day
      const matches = await waybackMachine.getSnapshotMatches("https://twitter.com/internetarchive/", {
        collapse: "timestamp:8", // Return one match per day
        from: new Date("2020-01-01"),
        to: new Date("2020-01-11"),
      });
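
      To show how options such as filter, collapse and limit might translate into CDX query parameters, here is a minimal sketch. It is not this library's implementation: buildCdxQuery, asList and SketchCdxOptions are hypothetical names, only a few options are covered, and the endpoint https://web.archive.org/cdx/search/cdx is assumed to be the target.

```typescript
type CdxField = "urlkey" | "timestamp" | "original" | "mimetype" | "statuscode" | "digest" | "length";
type CdxFilter = `${"" | "!"}${CdxField}:${string}`;

// Hypothetical, reduced option set for this sketch.
interface SketchCdxOptions {
  filter?: CdxFilter | CdxFilter[];
  collapse?: string | string[];
  limit?: number;
}

// Normalize "one value or many" into a list.
function asList(value?: string | string[]): string[] {
  if (value === undefined) return [];
  return typeof value === "string" ? [value] : value;
}

function buildCdxQuery(url: string, options: SketchCdxOptions = {}): string {
  const params = new URLSearchParams();
  params.set("url", url);
  // Each filter and collapse item becomes its own repeated query parameter.
  for (const item of asList(options.filter)) params.append("filter", item);
  for (const item of asList(options.collapse)) params.append("collapse", item);
  if (options.limit !== undefined) params.set("limit", String(options.limit));
  return `https://web.archive.org/cdx/search/cdx?${params.toString()}`;
}

// Last 10 HTML captures that did not return status 200, at most one per day.
const query = buildCdxQuery("example.com/*", {
  filter: ["!statuscode:200", "mimetype:text/html"],
  collapse: "timestamp:8",
  limit: -10,
});
```

      The template literal type CdxFilter rejects filters on unknown fields at compile time, which is the same idea as the filter union listed above.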
    • Beta

      Get snapshot matches for given url

      Type Parameters

      Parameters

      • url: string

        URL to search snapshots for. Can contain * wildcards.

      • Optional options: {
            collapse?: WaybackCdxCollapseOption | WaybackCdxCollapseOption[];
            fastLatest?: true;
            filter?:
                | `length:${string}`
                | `original:${string}`
                | `urlkey:${string}`
                | `timestamp:${string}`
                | `mimetype:${string}`
                | `statuscode:${string}`
                | `digest:${string}`
                | `!length:${string}`
                | `!original:${string}`
                | `!urlkey:${string}`
                | `!timestamp:${string}`
                | `!mimetype:${string}`
                | `!statuscode:${string}`
                | `!digest:${string}`
                | (
                    | `length:${string}`
                    | `original:${string}`
                    | `urlkey:${string}`
                    | `timestamp:${string}`
                    | `mimetype:${string}`
                    | `statuscode:${string}`
                    | `digest:${string}`
                    | `!length:${string}`
                    | `!original:${string}`
                    | `!urlkey:${string}`
                    | `!timestamp:${string}`
                    | `!mimetype:${string}`
                    | `!statuscode:${string}`
                    | `!digest:${string}`
                )[];
            fl?: T;
            from?: string | Date;
            lastSkipTimestamp?: true;
            limit?: number;
            matchType?: WaybackCdxMatchType;
            offset?: number;
            page?: number;
            pageSize?: number;
            resumeKey?: string;
            showDupeCount?: true;
            showNumPages?: boolean;
            showPagedIndex?: true;
            showResumeKey: true;
            showSkipCount?: true;
            sort?: "reverse" | "regular" | "closest";
            to?: string | Date;
        }

        Options

        • Optional collapse?: WaybackCdxCollapseOption | WaybackCdxCollapseOption[]

          Collapses the results based on the field contents or a substring of the field contents. Add one or more collapse items by specifying either field or field:N, where N is a positive integer giving the number of leading characters of the field value used for collapsing.

        • Optional fastLatest?: true

          Advanced option: return some number of the latest results for an exact match. This option is faster than the standard last-results search. The number of results is at least 1, so limit=-1 implies this setting. The number of results may be greater than 1 when a secondary index format (such as ZipNum) is used, but more than 1 result is not guaranteed. Combining this setting with limit= ensures that no more than the last N results are returned.

        • Optional filter?:
              | `length:${string}`
              | `original:${string}`
              | `urlkey:${string}`
              | `timestamp:${string}`
              | `mimetype:${string}`
              | `statuscode:${string}`
              | `digest:${string}`
              | `!length:${string}`
              | `!original:${string}`
              | `!urlkey:${string}`
              | `!timestamp:${string}`
              | `!mimetype:${string}`
              | `!statuscode:${string}`
              | `!digest:${string}`
              | (
                  | `length:${string}`
                  | `original:${string}`
                  | `urlkey:${string}`
                  | `timestamp:${string}`
                  | `mimetype:${string}`
                  | `statuscode:${string}`
                  | `digest:${string}`
                  | `!length:${string}`
                  | `!original:${string}`
                  | `!urlkey:${string}`
                  | `!timestamp:${string}`
                  | `!mimetype:${string}`
                  | `!statuscode:${string}`
                  | `!digest:${string}`
              )[]

          It is possible to filter on a specific field or the entire CDX line (which is space delimited). Filtering by a specific field is often simpler. Any number of filter params of the following form may be specified: filter=[!]field:regex. Optional: ! before the query inverts the match, that is, it will return results that do NOT match the regex.

          "!statuscode:200" // Returns all items with a statuscode that is not 200
          
        • Optional fl?: T

          List of fields to return, must be of type WaybackCdxField

          If not defined, all fields are returned.

        • Optional from?: string | Date

          Filter by date. If defined, the query will return items from the specified date and later. Can be either a Date object or a 14-digit archive timestamp in the format yyyyMMddhhmmss.

        • Optional lastSkipTimestamp?: true
        • Optional limit?: number

          Return the first N results. Use -N to return the last N results.

          10 // return the first 10 results.
          
          -10 // return the last 10 results. The query may be slow as it begins reading from the beginning of the search space and skips all but last N results.
          
        • Optional matchType?: WaybackCdxMatchType
          "exact"
          
        • Optional offset?: number

          The offset=M param can be used in conjunction with limit to skip the first M records. This allows for a simple way to scroll through the results. However, the offset/limit model does not scale well to large queries: the CDX server begins reading at the beginning every time and must read and skip through the number of results specified by offset.

        • Optional page?: number

          Page number. If pagination is not supported, the CDX server will return a 400.

        • Optional pageSize?: number

          It is possible to adjust the page size to a smaller value than the default by setting pageSize=P, where 1 <= P <= default page size.

        • Optional resumeKey?: string

          Resume key

        • Optional showDupeCount?: true

          It is possible to track how many CDX lines were skipped due to filtering and collapsing by adding the special skipcount counter with showSkipCount=true.

          An optional endtimestamp count can also be used to print the timestamp of the last capture by adding lastSkipTimestamp=true.

        • Optional showNumPages?: boolean

          To determine the number of pages, add the showNumPages=true param. This is a special query that will return a single number indicating the number of pages.

        • Optional showPagedIndex?: true

          It is also possible to have the CDX server return the raw secondary index, by specifying showPagedIndex=true. This query returns the secondary index instead of the CDX results and may be subject to access restrictions.

        • showResumeKey: true

          Show resume key

        • Optional showSkipCount?: true
        • Optional sort?: "reverse" | "regular" | "closest"

          Sort type

        • Optional to?: string | Date

          Filter by date. If defined, the query will return items up until the specified date. Can be either a Date object or a 14-digit archive timestamp in the format yyyyMMddhhmmss.
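
          The offset/limit scrolling described under offset can be sketched as a simple loop. This is an illustration under stated assumptions: FetchWindow is a hypothetical callback standing in for one CDX request with the given offset and limit, and collectWithOffset is not part of this library.

```typescript
// Hypothetical callback: performs one request for `limit` records starting at `offset`.
type FetchWindow<R> = (offset: number, limit: number) => Promise<R[]>;

// Scroll through all results by advancing offset in steps of limit.
async function collectWithOffset<R>(fetchWindow: FetchWindow<R>, limit: number): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const all: R[] = [];
  for (let offset = 0; ; offset += limit) {
    const batch = await fetchWindow(offset, limit);
    all.push(...batch);
    // A short (or empty) batch means the end of the results was reached.
    if (batch.length < limit) return all;
  }
}
```

          Each iteration makes the server skip everything before offset again, which is why the description above warns that this model does not scale to large queries.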

      Returns Promise<
          {
              result: {
                  [K in string
                  | number
                  | symbol]: (
                      (
                          T extends WaybackCdxField[]
                              ? Pick<WaybackSnapshotInfo, T<T>[number]>
                              : WaybackSnapshotInfo
                      ) & {}
                  )[K]
              }[];
              resumeKey?: string;
          },
      >

      An object containing the results as an array of result objects (result) and, if present, a resumeKey

      // Get matches from 2020-01-01 to 2020-01-11, collapsed to one match per day
      const matches = await waybackMachine.getSnapshotMatches("https://twitter.com/internetarchive/", {
        collapse: "timestamp:8", // Return one match per day
        from: new Date("2020-01-01"),
        to: new Date("2020-01-11"),
      });
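
      The return type above narrows each result object to the fields listed in fl. A minimal sketch of that idea follows. It assumes rows arrive as string arrays in fl order, and SnapshotInfo, SnapshotField and rowsToObjects are hypothetical stand-ins rather than this library's WaybackSnapshotInfo or its parsing code.

```typescript
// Stand-in for the full snapshot record (assumed to hold string values only).
interface SnapshotInfo {
  urlkey: string;
  timestamp: string;
  original: string;
  mimetype: string;
  statuscode: string;
  digest: string;
  length: string;
}
type SnapshotField = keyof SnapshotInfo;

// Turn positional rows into objects typed with only the requested fields.
function rowsToObjects<F extends readonly SnapshotField[]>(
  fields: F,
  rows: readonly (readonly string[])[],
): Pick<SnapshotInfo, F[number]>[] {
  return rows.map((row) => {
    // Only the requested fields are filled in; the return type hides the rest.
    const entry = {} as SnapshotInfo;
    fields.forEach((field, index) => {
      entry[field] = row[index] ?? "";
    });
    return entry;
  });
}

const picked = rowsToObjects(["timestamp", "original"] as const, [
  ["20200101000000", "https://example.com/"],
]);
const firstTimestamp: string = picked[0].timestamp;
// picked[0].digest would be a compile-time error: "digest" was not requested.
```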
    • Beta

      Returns an AsyncGenerator that yields single snapshot matches. The generator will go through all available matches. This function makes use of the resumeKey.

      Type Parameters

      Parameters

      • url: string

        URL to search snapshots for. Can contain * wildcards.

      • Optional options: WaybackCdxOptions<T>

        Options

      Returns AsyncGenerator<
          {
              [K in string
              | number
              | symbol]: (
                  (
                      T extends WaybackCdxField[]
                          ? Pick<WaybackSnapshotInfo, T<T>[number]>
                          : WaybackSnapshotInfo
                  ) & {}
              )[K]
          },
      >
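
      How a generator might walk through all matches with the resumeKey can be sketched as below. This is not this library's implementation: FetchPage is a hypothetical callback standing in for one request made with showResumeKey enabled, and CdxPage mirrors the { result, resumeKey } shape documented for that overload.

```typescript
interface CdxPage<R> {
  result: R[];
  resumeKey?: string;
}

// Hypothetical callback: performs one request, optionally continuing from a resume key.
type FetchPage<R> = (resumeKey?: string) => Promise<CdxPage<R>>;

// Yield matches one by one, requesting the next page only when it is needed.
async function* iterateMatches<R>(fetchPage: FetchPage<R>): AsyncGenerator<R> {
  let resumeKey: string | undefined;
  do {
    const page = await fetchPage(resumeKey);
    for (const match of page.result) {
      yield match;
    }
    // A missing resume key signals that the last page was reached.
    resumeKey = page.resumeKey;
  } while (resumeKey !== undefined);
}
```

      Consumers can stop early with break inside a for await loop, in which case no further pages are requested.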